emanuele opened this issue 4 years ago
Loading is much faster if we previously save the tractogram as a NumPy array (numpy.save()) and then load it with numpy.load(). But in that case, the streamlines must be resampled to a fixed number of points.
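For illustration, here is a minimal sketch of that fast path; streamlines_fixed is stand-in data, assumed to be already resampled so every streamline has the same number of points:

import numpy as np

# Stand-in for streamlines already resampled to a fixed number of points,
# so they stack into one regular (n_streamlines, n_points, 3) array.
streamlines_fixed = [np.random.rand(64, 3).astype(np.float32) for _ in range(1000)]

resampled = np.stack(streamlines_fixed)
np.save('tractogram_fixed.npy', resampled)

loaded = np.load('tractogram_fixed.npy')  # a single contiguous read, very fast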
The following blog post has a suggestion for how to store ragged arrays (aka jagged arrays), which should fit the case where not all streamlines have the same number of points. It basically involves concatenating all streamlines along the dimension that varies in length and recording the offset of each streamline in a separate index array. These two arrays (concatenated data and offsets) can then be stored in a single .npz file via np.savez. There is a concrete example in the following blog post:
https://tonysyu.github.io/ragged-arrays.html#.XyRsJBF7kr4
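A minimal sketch of that concatenation + offsets scheme (file name and data are illustrative, not any existing nibabel API):

import numpy as np

# Stand-in for streamlines with varying numbers of points.
streamlines = [np.random.rand(n, 3).astype(np.float32) for n in (12, 7, 25)]

# Concatenate along the ragged dimension and record where each streamline starts.
data = np.concatenate(streamlines, axis=0)
offsets = np.cumsum([0] + [len(s) for s in streamlines])  # length = n_streamlines + 1

np.savez('tractogram.npz', data=data, offsets=offsets)

# Later: reload and slice out streamline i without rebuilding the list.
loaded = np.load('tractogram.npz')
i = 1
streamline_i = loaded['data'][loaded['offsets'][i]:loaded['offsets'][i + 1]]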
If the overall memory use becomes a concern, it would be possible to use xarray + Dask to allow indexing into the concatenated array without having to load the whole thing into memory at once.
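A rough sketch of that idea, assuming the concatenated points and offsets were saved as separate .npy files (file names hypothetical; .npz archives cannot be memory-mapped, so plain .npy files are used here):

import numpy as np
import dask.array as da

# Memory-map the concatenated (N, 3) array, then wrap it in a Dask array so that
# slicing only reads the chunks that are actually touched.
data = np.load('tractogram_data.npy', mmap_mode='r')
offsets = np.load('tractogram_offsets.npy')

lazy = da.from_array(data, chunks=(1_000_000, 3))
i = 42
streamline_i = lazy[offsets[i]:offsets[i + 1]].compute()  # loads only this slice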
We currently have the ArrayProxy construct, which just records the offset, dtype, and shape, and pulls data when requested. Following a similar principle, we could have an ArrayProxySequence, which just pins multiple ArrayProxys to different offsets in the file, or an ArraySequenceProxy, which indexes the entire file and acts like an ArraySequence.
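To make the idea concrete, here is a hedged sketch of what such a proxy might look like over a raw file of concatenated float32 points; this is not nibabel's actual ArrayProxy API, and the class name and file layout are purely illustrative:

import numpy as np

class ArraySequenceProxySketch:
    """Illustrative only: index the file once, pull streamlines on demand."""

    def __init__(self, filename, offsets):
        # offsets[i]:offsets[i + 1] delimits streamline i in the concatenated array.
        self._points = np.memmap(filename, dtype=np.float32, mode='r').reshape(-1, 3)
        self._offsets = offsets

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, i):
        # Only the pages backing this slice are actually read from disk.
        return np.asarray(self._points[self._offsets[i]:self._offsets[i + 1]])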
I am also seeing very slow streamline loading performance for TCK files. It looks like this part of the code is severely CPU-bound:
https://github.com/nipy/nibabel/blob/master/nibabel/streamlines/array_sequence.py#L32
Since I was asked a couple of times, I added the link to download the test file/tractogram to the GitHub repo of our fast TRK loader (https://github.com/emanuele/load_trk.git): https://nilab.cimec.unitn.it/people/olivetti/data/sub-100206_var-FNAL_tract.trk . It may be useful for benchmarking. And here is another, much larger, test tractogram (10M streamlines): https://nilab.cimec.unitn.it/people/olivetti/data/sub-599469_var-10M_tract.trk
I finally got around to finishing my testing. I can't reproduce the timing you originally posted, @emanuele. Here's the simple benchmark script I'm using:
import nibabel as nib
# filename = 'sub-100206_var-FNAL_tract.trk'
filename = 'sub-599469_var-10M_tract.trk'
S = nib.streamlines.load(filename)
print("Nb. of streamlines:", len(S.streamlines))
print("Checksum:", S.streamlines._data.sum())
Here's my timing for the 10M-streamline file you shared (4:25 min, RAM peak: 3.41 GB):
/usr/bin/time -v python bench_trk.py
Nb. of streamlines: 10000000
Checksum: -1326157800.0
User time (seconds): 260.52
System time (seconds): 4.31
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:25.10
Maximum resident set size (kbytes): 3576648
And with my new patch (see #1000), I get (1:40 min, RAM peak: 5.72 GB):
/usr/bin/time -v python bench_trk.py
Nb. of streamlines: 10000000
Checksum: -1326157800.0
User time (seconds): 99.87
System time (seconds): 3.56
Percent of CPU this job got: 102%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:40.66
Maximum resident set size (kbytes): 6001400
I'm not sure why the RAM peak is higher with the patch, since it is supposed to apply the affine transformation in place :-/. However, we can easily chunk it to avoid RAM usage surges.
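For example, the chunking could look roughly like this (a sketch only; apply_affine_chunked and the chunk size are illustrative, not the actual PR #1000 code):

import numpy as np

def apply_affine_chunked(points, affine, chunk_size=1_000_000):
    # points: (N, 3) array of concatenated streamline coordinates, modified in place.
    # Working one chunk at a time keeps the temporaries small, so the peak memory
    # stays close to the size of the points array itself.
    rot = affine[:3, :3].T
    trans = affine[:3, 3]
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        chunk[:] = chunk @ rot + trans  # result written back into the same buffer
    return points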
I do like your two-pass approach that first maps out the streamlines, then loads only the streamlines needed. I could see this being integrated into LazyTractogram.
@frheault is there any way that @emanuele's optimization could be made into a pull request for nibabel? I know there is work on a new format, but TRK remains popular. Beyond performance, the code also appears to handle the TRK format better.
@neurolabusc @emanuele's code relies on NiBabel to read the TRK's header. Regarding speed, with PR #1000, it should be comparable.
@MarcCote may I suggest you contact Steven Jay Granger, who reported that he was unable to load a TRK file created by DSI Studio using DIPY, though he was able to load it with @emanuele's patch. Even if the latest release has improved speed, providing full compatibility with this popular format is important.
@neurolabusc yes, I replied on the Discourse thread.
At least for tractograms in TRK (TrackVis) format with a large number of streamlines (>1 million), loading streamlines with dipy.io.streamline.load_tractogram() is extremely slow and requires an unusually large amount of memory. Example: for a tractogram of 10 million streamlines, loading takes: