scil-vital / TrackToLearn

Public release of Track-to-Learn: A general framework for tractography with deep reinforcement learning
GNU General Public License v3.0

ttl_track.py KILLED #27

Closed mdesco closed 1 year ago

mdesco commented 1 year ago

When testing the script on 7T HCP data (1 mm isotropic), it crashes with a KILLED message.

The environment is loaded and the SH are converted, but then the script crashes. It looks like a memory issue.

braindata/processedData/max/track2learn/7T/100610 ttl_track.py FODF_Metrics/100610__fodf.nii.gz Local_Seeding_Mask/100610__local_seeding_mask.nii.gz Local_Tracking_Mask/100610__local_tracking_mask.nii.gz track2learn_npv10.trk --min_length 20 --max_length 200 --npv 10 --compress 0.1

AntoineTheb commented 1 year ago

Hmmm, sounds like the OOM killer. It might be a memory issue while converting the SH. I'll try running the tracking on this data to see.

AntoineTheb commented 1 year ago

I'm almost certain ttl_track.py gets killed because it takes too much memory when converting SH -> SF -> SH to go from order 8 to order 6. For example, my machine (with a puny 16 GB of RAM) straight up refuses to try it:

ttl_track.py FODF_Metrics/100610__fodf.nii.gz Local_Seeding_Mask/100610__local_seeding_mask.nii.gz Local_Tracking_Mask/100610__local_tracking_mask.nii.gz ~/workspace/TrackToLearn/max_test.trk --min_length 20 --max_length 200 --npv 10 --compress 0.1 -f --n_actor 50000
Loading environment.
SH coefficients are of order 8, converting them to order 6.
Traceback (most recent call last):
  File "/home/thea1603/workspace/TrackToLearn/.env/bin/ttl_track.py", line 11, in <module>
    load_entry_point('Track-to-Learn', 'console_scripts', 'ttl_track.py')()
  File "/home/thea1603/workspace/TrackToLearn/TrackToLearn/runners/ttl_track.py", line 348, in main
    experiment.run()
  File "/home/thea1603/workspace/TrackToLearn/TrackToLearn/runners/ttl_track.py", line 139, in run
    back_env, env = self.get_tracking_envs()
  File "/home/thea1603/workspace/TrackToLearn/TrackToLearn/experiment/ttl.py", line 248, in get_tracking_envs
    env = class_dict['tracker'].from_files(env_dto)
  File "/home/thea1603/workspace/TrackToLearn/TrackToLearn/environments/env.py", line 238, in from_files
    input_volume, tracking_mask, seeding_mask = BaseEnv._load_files(
  File "/home/thea1603/workspace/TrackToLearn/TrackToLearn/environments/env.py", line 317, in _load_files
    data = set_sh_order_basis(signal.get_fdata(dtype=np.float32),
  File "/home/thea1603/workspace/TrackToLearn/TrackToLearn/datasets/utils.py", line 194, in set_sh_order_basis
    sf = sh_to_sf(
  File "/home/thea1603/workspace/TrackToLearn/.env/lib/python3.8/site-packages/dipy/reconst/shm.py", line 1224, in sh_to_sf
    sf = np.dot(sh, B.T)
  File "<__array_function__ internals>", line 5, in dot
numpy.core._exceptions.MemoryError: Unable to allocate 18.1 GiB for an array with shape (137, 184, 133, 724) and data type float64
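
The figure in the error message can be checked by hand: a dense array needs the product of its dimensions times the item size. A minimal sketch using only the shape and dtype from the traceback (the helper name `array_gib` is hypothetical, and the float32 line is an illustration, not something measured in this thread):

```python
import math

def array_gib(shape, itemsize):
    """Memory needed for a dense array, in GiB (2**30 bytes)."""
    return math.prod(shape) * itemsize / 2**30

# Shape reported in the MemoryError: volume dimensions x 724 sphere directions.
sf_shape = (137, 184, 133, 724)

print(f"float64: {array_gib(sf_shape, 8):.1f} GiB")  # the 18.1 GiB from the error
print(f"float32: {array_gib(sf_shape, 4):.1f} GiB")  # half the float64 footprint
```

The last dimension is the number of sphere directions, so the footprint scales linearly with the sphere resolution.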

I'm using a repulsion724 sphere to convert to SF, which might be too much, but I'm not sure how a "lower resolution" sphere would affect the resulting ODFs and tracking. I'll ask around and test and report back.
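
One generic way to keep an SH -> SF -> SH round trip within memory, independent of the sphere resolution, is to process voxels in batches so that only a small slice of the SF array exists at any time. The sketch below illustrates that idea with NumPy only. It is not the fix from the repository, and `convert_in_chunks`, `sh_to_sf_mat`, `sf_to_sh_mat` and `chunk_size` are hypothetical names. The two matrices are random placeholders standing in for the real basis matrices that a library such as dipy would build.

```python
import numpy as np

def convert_in_chunks(sh, sh_to_sf_mat, sf_to_sh_mat, chunk_size=100_000):
    """Apply SH -> SF -> SH to a 4D volume without building the full SF array.

    sh           : (X, Y, Z, n_in) input coefficients
    sh_to_sf_mat : (n_in, n_dirs) placeholder for the SH-to-SF basis matrix
    sf_to_sh_mat : (n_dirs, n_out) placeholder for the SF-to-SH fitting matrix
    """
    n_in, n_out = sh.shape[-1], sf_to_sh_mat.shape[1]
    flat = sh.reshape(-1, n_in)  # one row per voxel
    out = np.empty((flat.shape[0], n_out), dtype=sh.dtype)
    for start in range(0, flat.shape[0], chunk_size):
        stop = start + chunk_size
        # Intermediate SF slice: at most (chunk_size, n_dirs) values.
        sf = flat[start:stop] @ sh_to_sf_mat
        out[start:stop] = sf @ sf_to_sh_mat
    return out.reshape(sh.shape[:-1] + (n_out,))

# Toy sizes: 45 and 28 are the coefficient counts of symmetric SH bases of
# order 8 and 6; 724 is the number of sphere directions in the traceback.
rng = np.random.default_rng(0)
sh = rng.standard_normal((6, 7, 5, 45))
to_sf = rng.standard_normal((45, 724))
to_sh = rng.standard_normal((724, 28))

chunked = convert_in_chunks(sh, to_sf, to_sh, chunk_size=64)
direct = (sh.reshape(-1, 45) @ to_sf @ to_sh).reshape(6, 7, 5, 28)
print(chunked.shape, np.allclose(chunked, direct))
```

With this layout the peak size of the intermediate is `chunk_size` x 724 values rather than one value per voxel per direction, at the cost of a Python-level loop over batches.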

AntoineTheb commented 1 year ago

Are you able to pull #28, @mdesco? It probably solves your problem.