Hi Eric, Do you have an example dataset where this is the case? Just looking for an instance to work against. Thanks!
I think you could take any dataset you have and just take the first 100 time bins. In this case, the full path the animal has traversed will not be displayed (because it will be inferred from the first 100 time bins only).
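For example, assuming spyglass-style inputs like the `position_info` dataframe and `results` xarray Dataset described later in this thread, the truncation is a couple of lines:

```python
# Take only the first 100 time bins from an existing dataset.
position_info_small = position_info.iloc[:100]       # pandas dataframe of observed positions
results_small = results.isel(time=slice(0, 100))     # xarray Dataset of posteriors
```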
Here's the current state of 2D track animation processing. (Haven't yet dug into the situation for 1D.)
Front-end:
The relevant components live in this repository, as `test-gui/src/package/view-track-position-animation/*`. The view consumes `data`, a `TrackAnimationStaticData` object (which includes fields describing the width, height, and upper-left corner of each bin making up the track).
Back-end:
It appears that the processing code is no longer maintained on our side: I can't find the preprocessing scripts that would turn position/decoding source data into a `TrackAnimationStaticData` JSON object in FI-side repositories. Instead, it looks like this now lives within `spyglass`, specifically `LorenFrankLab/spyglass/src/spyglass/decoding/visualization.py` and `.../decoding/visualization_2D_view.py`.
The entry point is the `create_interactive_2D_decoding_figurl()` function in `visualization.py`. This is currently parameterized by:

- `position_info`, a pandas dataframe of position information (including the time and observed positions)
- `results`, an xarray Dataset holding the posterior distribution per timepoint
- `bin_size`, a measure of the size of each bin in real-world/on-track units (e.g. 1.0 cm)

It delegates to the `create_2D_decode_view` function in `visualization_2D_view.py`.
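As a concrete reference, a call might look roughly like the following. The positional signature is an assumption based on the parameter list above, and the real function in `spyglass` may take additional arguments; the column names in `position_info` are placeholders:

```python
import numpy as np
import pandas as pd
import xarray as xr
from spyglass.decoding.visualization import create_interactive_2D_decoding_figurl

n_time = 100
# Observed positions over time (column names here are placeholders).
position_info = pd.DataFrame(
    {
        "position_x": np.random.uniform(0, 60, n_time),
        "position_y": np.random.uniform(0, 60, n_time),
    },
    index=pd.Index(np.arange(n_time) / 30.0, name="time"),
)

# Posterior distribution per timepoint over a 30x30 grid of position bins.
posterior = np.random.rand(n_time, 30, 30)
posterior /= posterior.sum(axis=(1, 2), keepdims=True)  # normalize per timepoint
results = xr.Dataset(
    {"posterior": (("time", "x_position", "y_position"), posterior)},
    coords={"time": position_info.index.to_numpy()},
)

url = create_interactive_2D_decoding_figurl(position_info, results, bin_size=1.0)
```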
Within `visualization_2D_view.py`, there are a couple of layers of processing:

- Track geometry is built by the `make_track` function, which largely delegates to the `get_grid` and `get_track_interior` functions imported from `replay_trajectory_classification.environments`, the same code that defines the `Environment` class (upstream here).
- The `create_static_track_animation()` function does minimal data reformatting and then creates the track animation object (used to display the track background and observed positions, as well as head position if included in the dataset).

In short, the track geometry is currently inferred from the observed positions via `Environment` and its associated functions.
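To make the inference concrete, here is a plain-numpy sketch of what the `get_grid`/`get_track_interior` path does conceptually; the real implementations in `replay_trajectory_classification.environments` differ in detail (e.g. they also clean up the inferred interior):

```python
import numpy as np

def infer_track_interior(position: np.ndarray, bin_size: float = 1.0):
    """Bin 2D positions into a grid and mark visited bins as on-track.

    position: (n_time, 2) array of observed (x, y) positions.
    Returns (edges, is_interior), where is_interior[i, j] is True for any
    bin the animal actually occupied.
    """
    edges = [
        np.arange(position[:, dim].min(),
                  position[:, dim].max() + bin_size,
                  bin_size)
        for dim in range(2)
    ]
    counts, _ = np.histogramdd(position, bins=edges)
    return edges, counts > 0

# Because the interior is inferred from the observed positions alone, decoding
# only a short segment (e.g. the first 100 time bins) leaves the unvisited
# parts of the track marked as off-track -- the problem this issue describes.
```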
At present, it appears that the minimally invasive changes required would be to modify the processing code in two areas:
- The track geometry is currently built by running inference through the `Environment` class, on the assumption that inference is required; we should take an additional `Environment` parameter (or a formatted `ndarray`, etc.) and just use that.
- The `process_decoded_data` function should take an additional parameter representing the linearized positions from the observed-position geometry, and a linearization function that converts the decoded positions to the closest track bin; see the sketch below. (Note that the x- and y-coordinates of the decoded position data come from the discrete index values that appear in the `posterior` xarray data set.)

For implementing the first part, I'll need additional input from @edeno on how the `Environment` class represents track geometries, as well as an example populated `Environment` for the Chimi 2020 data (I do have one pickled, but it doesn't have the track geometry populated already, so if we can run through the appropriate steps to populate it, that should be sufficient).
For implementing the second part, I'll need to confirm some assumptions and then see how good the match is between my example decoded data and observed data; we may need to have a conversation about what to do in the event the scales don't match or the given decoded bins don't align readily with the observed-position bins.
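As a starting point, the decoded-to-observed bin mapping could look something like the following; the function name and the KD-tree approach are placeholders of mine, not the eventual spyglass implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_track_bin(decoded_xy: np.ndarray, track_bin_centers: np.ndarray) -> np.ndarray:
    """Map each decoded (x, y) position to the index of the nearest on-track bin.

    decoded_xy: (n_time, 2) decoded positions (posterior grid indices converted
                to real-world coordinates).
    track_bin_centers: (n_bins, 2) centers of the valid bins from the
                observed-position geometry.
    """
    distances, indices = cKDTree(track_bin_centers).query(decoded_xy)
    # Unusually large distances would flag the scale/alignment mismatch
    # discussed above, so they're worth checking rather than ignoring.
    return indices
```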
This is the minimally invasive solution; as best I can tell, the front-end code does represent (potentially separate) sets of position buckets for decoded vs. actual track position. If we are certain that these can never differ in size or location, we could refactor to avoid this possibility; it would simplify the ultimate implementation, but it would remove flexibility for the future and would mean a larger set of changes (which would also be breaking for currently-existing figurls).
2D case addressed by Spyglass PR #642 (https://github.com/LorenFrankLab/spyglass/pull/642)
Talked with Eric--I don't think there actually is a parallel issue for the 1D case, so I think this issue is resolved by Spyglass PR #642 (https://github.com/LorenFrankLab/spyglass/pull/642)
It would be nice to have the valid on-track positions come from the `Environment` class in my code (or, rather, be pre-specifiable as an array) rather than be inferred from the positions given.
The reason for this is that sometimes the dataset for the decoding model will be smaller than the dataset for the encoding model, so not all the valid positions will exist in the data.
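Concretely, the request amounts to something like the following; the `track_interior` parameter and this `make_track` signature are illustrative assumptions, not the actual spyglass code:

```python
from typing import Optional

import numpy as np

def make_track(positions: np.ndarray, bin_size: float = 1.0,
               track_interior: Optional[np.ndarray] = None) -> np.ndarray:
    """Return a boolean grid of valid on-track bins.

    If track_interior is given (e.g. derived from an Environment fit on the
    full encoding dataset), use it as-is; otherwise fall back to inferring it
    from the positions, as the current code does.
    """
    if track_interior is not None:
        return track_interior
    # Fallback: infer occupancy from the observed positions.
    edges = [np.arange(positions[:, d].min(),
                       positions[:, d].max() + bin_size, bin_size)
             for d in range(2)]
    counts, _ = np.histogramdd(positions, bins=edges)
    return counts > 0
```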