magland / figurl-franklab-views


Implements a static view of the 1d decode as a linear-position-vs-time plot. #5

Closed jsoules closed 1 year ago

jsoules commented 1 year ago

This PR comprises code to transmit, represent, downsample, and draw linearized (1d) decoded position data, with an optional overlay for observed animal position, on a 2d linearized-position-vs-time plot (DecodedLinearPositionPlotView).

A current version of this code running against a large data set (~55MB in compressed form, originally >4GB) can be seen at: https://www.figurl.org/f?v=gs://figurl/franklab-views-dev1f&d=sha1://aa4ce75d420bdedb3e74a3f70378f7747bafa14e&label=03f1-causal-posterior&s={}

The base data files for this recording can be found at /mnt/home/jsoules/ceph/ceph-bridge/acomrie-1d-samples/; the specific data set visualized above is the causal posterior field for the "1 valid times_1D_03f1e...nc" file (and corresponding csv).

This version reflects many attempts to improve performance for large data source files. These have been successful enough to make the view usable, but further work is needed. To that end, I've left in a lot of commented-out code, including alternate approaches, TODOs, and timing code, that I would normally remove before opening a PR; I expect it will be useful in any ongoing work on this.

The main point of this exercise is to represent the data in a sufficiently compressed way that it can be manageably displayed by the browser, without losing too much resolution for the user. We can represent the data in compressed format because the part we care about is actually quite sparse. Unfortunately, if we were to represent these data as a native bitmap--including drawing in full resolution on a Canvas--all that compression would go away: it takes just as many bits to represent a 0 as any other value. Instead we have to do some workarounds.

The data are transmitted over the wire as three numpy arrays--representing the number of observations per time point, the values of each observation, and the corresponding track position of each observation. However, mapping this back into a 2D space for display is expensive (we have to iterate over several arrays in parallel, with varying step sizes). Because the model's predictions at each time point tend to have central peaks with slopes to either side, the data lend themselves to being represented as a set of overlapping lines or rectangles--so that a model prediction of e.g. "12 at point 50, 12 at point 51, 12 at point 52, 13 at point 53, 56 at point 54, 12 at point 55" can be represented as "12 from 50-55, 13 from 53-54, 56 at 54-54" -- where each "run" begins the first time its value appears and ends at the last consecutive position with an equal or greater value. Again, because the model tends to output fairly repetitive continuous predictions, this representation is in some cases even more compressed than the base three-array representation; and additionally, it maps nicely onto graphics drawing instructions.
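As a rough sketch of the runs idea (the names here are illustrative, not the actual code in this PR), a single stack-based pass over one time point's positions and values can produce the overlapping runs, closing each run at the last consecutive position whose value is equal or greater:

```typescript
// Hypothetical sketch of the runs conversion described above; the identifiers
// are illustrative, not the real ones in DecodedLinearPositionRepresentations.

interface Run {
  value: number; // quantized value shared by the run
  start: number; // first track position of the run
  end: number;   // last track position (inclusive)
}

// Convert one time point's (position, value) observations into overlapping runs.
function toOverlappingRuns(positions: number[], values: number[]): Run[] {
  const runs: Run[] = [];
  const open: { value: number; start: number }[] = []; // stack of open runs
  let prevPos = Number.NaN;
  for (let i = 0; i < positions.length; i++) {
    const p = positions[i];
    const v = values[i];
    // A gap in track positions closes every open run at the previous position.
    if (!Number.isNaN(prevPos) && p !== prevPos + 1) {
      while (open.length > 0) {
        const r = open.pop()!;
        runs.push({ value: r.value, start: r.start, end: prevPos });
      }
    }
    // Close runs whose value exceeds the current one.
    while (open.length > 0 && open[open.length - 1].value > v) {
      const r = open.pop()!;
      runs.push({ value: r.value, start: r.start, end: prevPos });
    }
    // Open a new run unless this value is already on top of the stack.
    if (open.length === 0 || open[open.length - 1].value < v) {
      open.push({ value: v, start: p });
    }
    prevPos = p;
  }
  while (open.length > 0) {
    const r = open.pop()!;
    runs.push({ value: r.value, start: r.start, end: prevPos });
  }
  return runs;
}
```

Running this on the worked example above (values 12, 12, 12, 13, 56, 12 at positions 50 through 55) yields the three runs 12 from 50-55, 13 from 53-54, and 56 at 54-54.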

Ideally we would be able to simply use this "overlapping-runs representation" as the base for an SVG, which could be scaled as needed to fit the user's screen. Unfortunately, this hasn't worked out well in practice, largely because I haven't found a way to render an offscreen in-memory SVG into a Canvas element, and the base drawing components in FigURL (like TimeScrollView) use Canvas.

We cannot draw the data in full resolution, as discussed above, so we need to do downsampling (for large viewing windows) and windowing (for when the view window is small enough to show full resolution). Downsampling and representing the data as rectangles are expensive but tractable operations; however, rendering windows into the appropriately-sampled data turned out to perform poorly, so I have also added an offscreen canvas cache with a maximal width (~20k pixels or time points) that can be panned and updated. When updating this cache in response to user panning, we reuse as much of the offscreen canvas as possible by copying its contents forward into the new window.
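A minimal sketch of the cache-reuse arithmetic, assuming a cache addressed in time-point units (the real offscreen-canvas code tracks more state than this, and these names are hypothetical): when the cache window shifts, the overlap between the old and new windows can be copied forward and only the uncovered ranges need fresh drawing.

```typescript
// Hypothetical sketch: plan how much of the old offscreen cache survives a pan.

interface CacheShift {
  copyFrom: number;   // offset into the old cache where reusable content starts
  copyTo: number;     // offset into the new cache where it should land
  copyLength: number; // number of time points that can be copied forward
  redraw: Array<[number, number]>; // [start, end) ranges (new-cache offsets) to draw fresh
}

function planCacheShift(oldStart: number, newStart: number, width: number): CacheShift {
  const overlapStart = Math.max(oldStart, newStart);
  const overlapEnd = Math.min(oldStart + width, newStart + width);
  if (overlapEnd <= overlapStart) {
    // No overlap: the whole cache must be redrawn.
    return { copyFrom: 0, copyTo: 0, copyLength: 0, redraw: [[0, width]] };
  }
  const copyFrom = overlapStart - oldStart;
  const copyTo = overlapStart - newStart;
  const copyLength = overlapEnd - overlapStart;
  const redraw: Array<[number, number]> = [];
  if (copyTo > 0) redraw.push([0, copyTo]); // panned left: fill the front
  if (copyTo + copyLength < width) redraw.push([copyTo + copyLength, width]); // panned right: fill the back
  return { copyFrom, copyTo, copyLength, redraw };
}
```

The copy itself would then be a single drawImage-style blit from the old cache into the new one at the computed offsets.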

It is possible we could skip the overlapping-rectangles representation in the current version, since re-representing the data is expensive and we ultimately convert to a bitmap anyway. However, it's not clear that we would be better off: the expense comes mostly from iterating over the three arrays in parallel, and if we drew directly from that representation, we would probably incur that expense on every drawing operation, rather than once when we convert to rectangles. Additionally, because the number of observations varies from one time point to the next, random access into the three-array data to match the user's selected time window would be problematic.
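To illustrate the random-access point: with varying counts per time point, locating a time window in the flat arrays requires something like a prefix-sum index over the counts (a hypothetical sketch, not the PR's code):

```typescript
// Hypothetical sketch: build an index so a time window can be located in the
// flat values/positions arrays despite varying observation counts.

function buildOffsets(countsPerTimepoint: number[]): number[] {
  // offsets[t] = index into the flat arrays where time point t begins
  const offsets = new Array<number>(countsPerTimepoint.length + 1);
  offsets[0] = 0;
  for (let t = 0; t < countsPerTimepoint.length; t++) {
    offsets[t + 1] = offsets[t] + countsPerTimepoint[t];
  }
  return offsets;
}

// Slice out the observations for time points [firstT, lastT] (inclusive).
function sliceWindow<T>(flat: T[], offsets: number[], firstT: number, lastT: number): T[] {
  return flat.slice(offsets[firstT], offsets[lastT + 1]);
}
```

Without such an index (or the rectangles representation), every window lookup would have to walk the counts array from the beginning.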


The main files of interest are in the view-decoded-linear-position-plot directory. (The logic for this task has been split over several files to make it easier to track and manage.) They are:

DecodedLinearPositionPlotViewData provides only the data description and the translation interface for the data received over the wire.

DecodedLinearPositionPlotView is an interface to the standard TimeScrollView component. It receives the data, computes some basic properties, then calls functions to downsample the decoded and observed positions as needed, set up the offscreen canvas (if it does not already exist), and ask the canvas to provide drawing locations for the data that should be displayed in the user's window.

DecodedLinearPositionRepresentations handles the conversion of the data from the three-array representation into the overlapping-lines representation.

DecodedLinearPositionDownsampling handles downsampling for the decoded and observed positions (averaging over the downsampling range in the former case, and taking the median over the range in the latter case).
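A toy sketch of the two strategies on a plain 1D series (the actual code operates on the decoded and observed position structures; these helper names are hypothetical):

```typescript
// Hypothetical sketch of the two downsampling strategies described above.

// Average consecutive groups of `factor` samples (the decoded-values strategy).
function downsampleByMean(series: number[], factor: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < series.length; i += factor) {
    const chunk = series.slice(i, i + factor);
    out.push(chunk.reduce((a, b) => a + b, 0) / chunk.length);
  }
  return out;
}

// Take the median of each group (the observed-positions strategy).
function downsampleByMedian(series: number[], factor: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < series.length; i += factor) {
    const chunk = series.slice(i, i + factor).sort((a, b) => a - b);
    out.push(chunk[Math.floor(chunk.length / 2)]);
  }
  return out;
}
```

Note that unlike the mean, the median always returns a value that actually occurred in the group.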

DecodedLinearPositionDrawing contains the logic to render the rectangles representation onto the offscreen canvas; it also tracks the contents of the offscreen canvas in time units and communicates the appropriate coordinates back to the main view.

For further details, see the commentary in the corresponding files.