
Visualize tractography-generated streamlines in Neuroglancer #555

Open kabilar opened 6 months ago

kabilar commented 6 months ago

Use case

We are looking to visualize tractography-generated streamlines from a Track File (.trk) in Neuroglancer. See Track File docs and an example Track File. We have a current implementation (see Current implementation section below), but are looking for advice on the optimal implementation (see Open questions section below). Thank you.

Requirements

  1. Add support in Neuroglancer to natively read and visualize a public Track File (.trk). (To prevent data duplication, we would prefer not to save into another format for visualization.)
    1. Transform the streamlines from voxel space to world space.
  2. To improve rendering performance, add front-end component(s) to filter the streamlines.
    1. Add a slider to subselect a random number of streamlines (with a fixed seed for reproducible states; see the subsampling sketch after this list).
    2. Add a slider to subselect the number of points per streamline.
  3. Add a front-end component to optionally display a colormap overlay on the streamlines. The colormap reflects the orientations of the streamlines at each vertex.
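
For Requirement 2, both sliders boil down to reproducible subsampling. A minimal sketch, assuming the streamlines are already in memory as a list of (num_points, 3) arrays (function and parameter names are illustrative, not part of any existing API):

```python
import numpy as np

def subsample_streamlines(streamlines, max_streamlines, max_points, seed=0):
    """Reproducibly pick a random subset of streamlines and decimate their points."""
    rng = np.random.default_rng(seed)  # fixed seed -> same subset across viewer states
    n_keep = min(max_streamlines, len(streamlines))
    keep = rng.choice(len(streamlines), size=n_keep, replace=False)
    subset = []
    for i in sorted(keep):
        sl = streamlines[i]
        if len(sl) > max_points:
            # keep evenly spaced vertices, always retaining both endpoints
            idx = np.round(np.linspace(0, len(sl) - 1, max_points)).astype(int)
            sl = sl[idx]
        subset.append(sl)
    return subset
```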

Implementation details

  1. Track File (.trk) documentation
  2. Track Files store streamlines in voxel space. The vox_to_ras matrix is stored in the header and can convert the points from voxel space to world space.
  3. A publicly available trk file can be found in Dandiset 000724. (Files can be much larger than this example.)
  4. The nibabel Python library can read trk files (see the loading sketch after this list).
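
A minimal loading sketch with nibabel (the file name is a placeholder; as far as I understand, nibabel applies the header's vox_to_ras affine on load, so the returned streamlines are already in world/RAS mm space):

```python
import nibabel as nib

trk = nib.streamlines.load("example.trk")      # placeholder path to a local .trk file
streamlines = trk.streamlines                  # ArraySequence of (num_points, 3) arrays, in RAS mm
vox_to_ras = trk.header["voxel_to_rasmm"]      # 4x4 affine from the .trk header (key name per nibabel)
print(len(streamlines), "streamlines;", len(streamlines[0]), "points in the first one")
```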

Current implementation

My colleague (@balbasty) has created the ngtools Python package with the accompanying example notebook. The tract.py module performs the following operations:

  1. Reads the streamlines into memory.
  2. Selects a subset of streamlines (to improve performance).
  3. For the subset of streamlines, creates concatenated arrays of the vertices, edges, and orientations. (Orientations are used for a colormap overlay.)
  4. Converts the streamlines into a single, local skeleton layer by passing the above arrays into neuroglancer.skeleton.Skeleton(). A previous version of this package converted the streamlines into a precomputed skeleton format.

After discussion with @balbasty, a downside to this approach is that a user must locally load all the streamlines into memory and perform this (minor) compute operation to convert the streamlines into a format that can be visualized. Thus the state of the Neuroglancer viewer with a Track File cannot readily be passed between users.
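
For readers who have not opened tract.py, a much-simplified sketch of what steps 2-4 amount to, assuming the Skeleton(vertex_positions, edges, vertex_attributes) constructor from the Python API (the orientation attribute and tangent computation here are illustrative, not the ngtools code itself):

```python
import numpy as np
import neuroglancer.skeleton

def streamlines_to_skeleton(streamlines):
    """Fuse a list of (num_points, 3) streamline arrays into a single Skeleton."""
    vertices, edges, orientations, offset = [], [], [], 0
    for sl in streamlines:
        n = len(sl)
        vertices.append(sl)
        # connect consecutive points within this streamline only
        edges.append(np.stack([np.arange(n - 1), np.arange(1, n)], axis=1) + offset)
        # per-vertex orientation = normalized local tangent (used for the colormap overlay)
        tangent = np.gradient(sl, axis=0)
        tangent /= np.linalg.norm(tangent, axis=1, keepdims=True) + 1e-12
        orientations.append(tangent)
        offset += n
    return neuroglancer.skeleton.Skeleton(
        vertex_positions=np.concatenate(vertices).astype(np.float32),
        edges=np.concatenate(edges).astype(np.uint32),
        vertex_attributes={"orientation": np.concatenate(orientations).astype(np.float32)},
    )
```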

Open questions

  1. Is there a reference implementation for visualizing tractography-generated streamlines (from any file format) in Neuroglancer?
  2. If there is not a reference implementation, what is the optimal way of displaying streamlines (Requirement 1)?
  3. Would we need to display only a subset of streamlines to improve rendering performance at multiple scales (Requirement 2)?

cc @balbasty @aaronkanzer @mikeschutzman @satra @ayendiki

fcollman commented 6 months ago

To make this work as you desire you would need to add trk files as a new kind of data source to neuroglancer, which would mean writing a typescript module capable of reading trk files that conforms to the data source api.

This would then let you add layers with sources like trk://https://myserver.com/data.

Relying on Python means you need to provide a server with the data, eliminating, as you point out, the shareability of the solution.

fcollman commented 6 months ago

Neuroglancer is used to display billions of objects; usually this is done by having a concept of segment IDs that represent individual objects, and then ways of selecting subsets of them. For very large numbers of streamlines, I would suggest giving each streamline its own segment ID and attaching sufficient metadata to make selecting the ones you want easy.

There is already one format that implements such a metadata loading system, with tags, numerical properties, and labels, which probably scales to the hundreds of thousands of segment IDs. Doing this for more than that would likely require a different approach.

ayendiki commented 6 months ago

Hundreds of thousands would work, thanks @fcollman! Even when a dataset has millions of streamlines, we typically display < 1% of them at a time, otherwise the display becomes too cluttered.

kabilar commented 6 months ago

Thank you, @fcollman. This is very helpful.

aaronkanzer commented 6 months ago

There is already one format that implements such a metadata loading system, with tags, numerical properties, and labels, which probably scales to the hundreds of thousands of segment IDs. Doing this for more than that would likely require a different approach.

Thanks as well, @fcollman -- just had one follow-up question for clarification -- when you say "there is already one format that implements such a metadata loading system" -- what format are you referencing?

fcollman commented 6 months ago

The precomputed segment properties info format (neuroglancer/src/datasource/precomputed/segment_properties.md at master · google/neuroglancer), which I think you might be able to add as an extra source on a layer with a different format as an interim solution. But better would be to implement the API that this source uses to display the properties you already have in your format.
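
For anyone following along, the segment properties source is just a small JSON info file; a hedged sketch of writing one with a per-streamline label and length (segment ids, property names, and values below are made up; see segment_properties.md for the authoritative schema):

```python
import json
import os

ids = ["1", "2", "3"]                               # one segment id per streamline, as base-10 strings
labels = ["CST left", "CST right", "AF left"]       # illustrative anatomical labels
lengths_mm = [112.4, 108.9, 96.2]                   # illustrative per-streamline lengths

info = {
    "@type": "neuroglancer_segment_properties",
    "inline": {
        "ids": ids,
        "properties": [
            {"id": "label", "type": "label", "values": labels},
            {"id": "length_mm", "type": "number", "data_type": "float32", "values": lengths_mm},
        ],
    },
}

os.makedirs("segment_properties", exist_ok=True)
with open("segment_properties/info", "w") as f:     # served alongside the rest of the precomputed data
    json.dump(info, f)
```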


balbasty commented 6 months ago

Thanks for the pointers @fcollman!

Our current implementation does use the precomputed skeleton format. It can choose between:

  1. Converting our file format to a precomputed skeleton "virtually" and "on the fly" using a local fileserver. This allows our file to be displayed in any instance of neuroglancer, but we need to run a fileserver.
  2. Converting our file format to a local Skeleton class and loading it using the Python API. This forces us to run our own instance of neuroglancer.

We initially gave a different ID to each streamline, but this made the display way too slow (even on a local instance accessing a local file), since each streamline gets queried by its own GET request. This is why we ended up fusing all the streamlines (or rather, all the displayed streamlines) into a single skeleton.

Another issue with the skeleton format (I think) is that our streamlines typically span very long distances, so whatever the current field of view, there are always 1-10k streamlines passing through it, and all of them must be queried and displayed. This is very different from the skeletons of cells in EM volumes, which seem to be much more localized. One solution could be to chunk our streamlines into smaller segments so that they don't have to be loaded entirely. But it doesn't solve anything at coarse resolutions, where all segments must be shown.

This is why we feel that having built-in routines for streamlines could be beneficial. And we think that our users would appreciate the ability to change the density of displayed streamlines at different resolutions.

fcollman commented 6 months ago

I think this points in the direction of developing a streamline/skeleton format that is similar in capability to the multi-resolution mesh format that is already in precomputed. Adding a skeleton format that has spatial indexing and multiple resolutions would be a straightforward project and would benefit a lot of projects, not just streamlines. It does, however, run afoul of the goal of not rewriting the .trk file format, which from my initial reading has a) no concept of a spatial index and b) no concept of multiple resolutions. It also has insufficient indexing to even read a subset of the streamlines effectively.

Without such indexes, you are pushing the computation of such an index and multiple resolutions to the typescript code within neuroglancer, which isn't the pattern that has generally been followed... though I think it probably could be done.

Most of neuroglancer is written with an underlying assumption that the data is likely too large to download all of it, and so datasources should ideally have indexing that enables efficient downloading of the appropriate parts of the data for a particular display setting.
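
To make the indexing point concrete: the .trk body is a flat sequence of variable-length records (an int32 point count, then the points and any per-point scalars, then per-streamline properties), so reading streamline k means scanning past the k-1 records before it. A rough sketch of building a byte-offset index in one pass, which would then allow range reads of arbitrary subsets (header offsets and little-endian layout follow my reading of the TrackVis spec, so treat them as unverified):

```python
import struct
import numpy as np

def build_trk_offset_index(path):
    """One sequential pass over a .trk file, recording the byte offset of every streamline."""
    with open(path, "rb") as f:
        header = f.read(1000)                                    # fixed 1000-byte header
        n_scalars = struct.unpack_from("<h", header, 36)[0]      # int16 n_scalars (offset per spec)
        n_properties = struct.unpack_from("<h", header, 238)[0]  # int16 n_properties (offset per spec)
        offsets = []
        while True:
            pos = f.tell()
            raw = f.read(4)
            if len(raw) < 4:
                break
            n_points = struct.unpack("<i", raw)[0]
            offsets.append(pos)
            # skip this streamline's points (x, y, z + scalars) and its trailing properties
            f.seek(n_points * (3 + n_scalars) * 4 + n_properties * 4, 1)
    return np.asarray(offsets, dtype=np.int64)
```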


jbms commented 6 months ago

As Forrest mentioned, to support this format natively in Neuroglancer you would need to implement a Neuroglancer datasource for it in JavaScript/TypeScript. Additionally, though, the datasource implementation is responsible for presenting the loaded data source as a collection of one or more "subsources" of various types --- the current subsource types include volumetric data, single and multi-resolution object meshes, segment properties, segment agglomeration graph, skeletons, annotations, and (not commonly used) individual meshes.

Of the existing subsource types, the only plausible ones are skeletons and annotations, plus segment properties could be used to indicate per-track properties if each track is represented as a separate segment id.

The annotation source type has the advantage of already supporting spatial indexing, unlike skeleton sources, and it also already does a form of subsampling when using the spatial index. You could see if it could be made to work. (Skeleton sources might ideally be unified with annotation sources in the future, and in general you can do everything with an annotation source that you can do with a skeleton source, except that a skeleton source can be displayed by a segmentation layer along with volumetric segmentation data, while an annotation source currently can only be displayed on an annotation layer.)

To represent the tracts as an annotation source, each pair of adjacent points would be added as a line annotation. Any per-point properties would be included, and any per-tract properties would also be duplicated to every line annotation, if you wish for those properties to be usable for rendering. You could also specify the tract id as a "related segment".
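
As a rough sketch of that representation with the Python viewer-state API (local annotations only, so no spatial index; a real deployment would generate a precomputed annotation source instead; the orientation-to-RGB coloring and property name are illustrative, and the commented layer setup assumes the LocalAnnotationLayer/AnnotationPropertySpec signatures from the Python examples):

```python
import neuroglancer

def orientation_color(p0, p1):
    """DEC-style coloring: |dx|, |dy|, |dz| of the segment direction mapped to an RGB hex string."""
    d = [abs(a - b) for a, b in zip(p0, p1)]
    norm = max(sum(x * x for x in d) ** 0.5, 1e-12)
    r, g, b = (int(255 * x / norm) for x in d)
    return f"#{r:02x}{g:02x}{b:02x}"

def tract_to_line_annotations(tract_id, points):
    """One LineAnnotation per pair of adjacent points, colored by local orientation."""
    return [
        neuroglancer.LineAnnotation(
            id=f"{tract_id}-{i}",                                # unique id per line segment
            point_a=points[i],
            point_b=points[i + 1],
            props=[orientation_color(points[i], points[i + 1])],
            segments=[[tract_id]],                               # related segment: the owning tract
        )
        for i in range(len(points) - 1)
    ]

# Hypothetical layer setup, with the orientation color as the single annotation property:
# layer = neuroglancer.LocalAnnotationLayer(
#     dimensions=coordinate_space,
#     annotation_properties=[
#         neuroglancer.AnnotationPropertySpec(id="orientation_color", type="rgb", default="#ffffff")
#     ],
#     annotations=tract_to_line_annotations(1, streamline_points),
#     shader="void main() { setColor(prop_orientation_color()); }",
# )
```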

When using the spatial index, subsampling is automatic. There is a UI control that determines the amount of subsampling. The subsampling is deterministic and is basically just selecting a stopping point along a particular ordering of the annotations. You can influence the subsampling by choosing the ordering when generating the spatial index, e.g. to approximately ensure that entire tracts are sampled together. However, the annotation layer would not currently provide a good way to subsample the points within a tract.

In principle a new source type could also be defined --- then it would also be necessary to implement rendering support for it and any necessary UI interactions. This would be rather costly both in terms of initial implementation effort and ongoing maintenance, though, so it would be much better to add any necessary functionality to annotation sources instead.