royerlab / ultrack

Cell tracking and segmentation software
https://royerlab.github.io/ultrack
BSD 3-Clause "New" or "Revised" License

Adding nodes to database takes a long time #71

Open tischi opened 4 months ago

tischi commented 4 months ago

Hi,

If "adding the notes to the database" takes several minutes, did I do something wrong or could that be correct?

input image shape: (16, 63, 2, 512, 512)
cell channel image shape: (16, 63, 512, 512)
/g/cba/exchange/erk-signalling-dynamics/code/python/ultrack_tracking.py:49: RuntimeWarning: invalid value encountered in divide
  dist = dist / dist.max(axis=(1, 2, 3), keepdims=True)
computed edges
saved edges
Adding nodes to database:  62%|████████████████████████████████████████▋                        | 10/16 [07:58<10:39, 106.53s/it
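(Side note on the RuntimeWarning above: `dist.max(...)` is 0 for frames that contain no foreground, so the division produces NaNs. A guarded version of that normalization, as a sketch assuming `dist` is a NumPy float array shaped (T, Z, Y, X), could look like this:)

```python
import numpy as np

def normalize_per_frame(dist: np.ndarray) -> np.ndarray:
    """Normalize each (Z, Y, X) frame by its maximum, leaving all-zero frames at 0."""
    max_per_frame = dist.max(axis=(1, 2, 3), keepdims=True)
    return np.divide(
        dist,
        max_per_frame,
        out=np.zeros_like(dist),
        where=max_per_frame > 0,  # skips 0 / 0, which would produce NaNs
    )

# tiny synthetic example: the second frame is all zeros and stays all zeros
dist = np.zeros((2, 3, 4, 4), dtype=np.float32)
dist[0, 1, 2, 2] = 5.0
print(normalize_per_frame(dist).max(axis=(1, 2, 3)))  # [1. 0.]
```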
tischi commented 4 months ago

In fact it threw an error now:

Linking nodes.:   0%|                                                                                     | 0/15 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/g/cba/exchange/erk-signalling-dynamics/code/python/ultrack_tracking.py", line 164, in <module>
    fire.Fire(cli)
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/g/cba/exchange/erk-signalling-dynamics/code/python/ultrack_tracking.py", line 128, in track
    link(config)
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/ultrack/core/linking/processing.py", line 230, in link
    multiprocessing_apply(
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/ultrack/utils/multiprocessing.py", line 56, in multiprocessing_apply
    return [func(t) for t in tqdm(sequence, desc=desc)]
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/ultrack/utils/multiprocessing.py", line 56, in <listcomp>
    return [func(t) for t in tqdm(sequence, desc=desc)]
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/toolz/functoolz.py", line 304, in __call__
    return self._partial(*args, **kwargs)
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/ultrack/core/linking/processing.py", line 109, in _process
    current_kdtree = KDTree(current_pos)
  File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/scipy/spatial/_kdtree.py", line 360, in __init__
    super().__init__(data, leafsize, compact_nodes, copy_data,
  File "_ckdtree.pyx", line 558, in scipy.spatial._ckdtree.cKDTree.__init__
ValueError: data must be 2 dimensions
Linking nodes.:   0%|                    
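(For context, scipy's KDTree raises that error whenever the positions array is not 2-D, which can happen, for example, if a time point ends up with no nodes; a minimal, hypothetical reproduction:)

```python
import numpy as np
from scipy.spatial import KDTree

empty_positions = np.asarray([])  # e.g. a frame with no detected nodes -> shape (0,), 1-D
try:
    KDTree(empty_positions)
except ValueError as err:
    print(err)  # data must be 2 dimensions

positions = np.asarray([[10.0, 20.0, 5.0], [11.0, 22.0, 6.0]])  # (n_points, ndim)
tree = KDTree(positions)  # a proper 2-D coordinate array works
```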
JoOkuma commented 4 months ago

Are you using remote storage like ESS, NFS, or Lustre? Because of the higher latency of remote storage, the multiprocessing can get into a deadlock. I recommend reducing the number of workers.

tischi commented 4 months ago

Yes, data is on NFS. How can I reduce the number of workers?

tischi commented 4 months ago

Relatedly, I run this on a compute node of a Slurm cluster, e.g.

srun --nodes=1 --cpus-per-task=4 --mem-per-cpu=16000 --time=01:00:00 --pty /bin/bash

  1. What would you recommend I ask for in terms of resources?
  2. How can I tell Python how many workers (CPUs) it should actually use? In my experience, Python multiprocessing does not respect what Slurm actually allocates for it (see the snippet below)...
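(For reference, a quick way to see the mismatch on a Linux compute node; whether the affinity set actually reflects the Slurm allocation depends on how the cluster's task/cgroup plugins are configured:)

```python
import os

print("os.cpu_count():", os.cpu_count())  # all logical cores on the node
print("sched_getaffinity:", len(os.sched_getaffinity(0)))  # CPUs this process may run on
print("SLURM_CPUS_PER_TASK:", os.environ.get("SLURM_CPUS_PER_TASK"))  # what was requested via srun
```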
JoOkuma commented 4 months ago

Hey @tischi,

> How can I reduce the number of workers?
>
> How can I tell Python how many workers (CPUs) it should actually use? In my experience, Python multiprocessing does not respect what Slurm actually allocates for it...

Via the n_workers parameter of the configuration; the configuration docs are here.
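For example, a minimal sketch of lowering the worker count programmatically; this assumes ultrack's `MainConfig` with `segmentation_config` and `linking_config` sub-configurations exposing an `n_workers` field, as described in the configuration docs:

```python
from ultrack.config import MainConfig

config = MainConfig()
# fewer workers means less concurrent access to the (NFS-hosted) sqlite database
config.segmentation_config.n_workers = 1
config.linking_config.n_workers = 1
```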

> What would you recommend I ask for in terms of resources?

It depends on the size of your data and how long you can wait for the processing. When using the sqlite backend (the default), I don't go above 8 workers. With Postgres and distributed computation, I usually scale to 100 or more nodes, each with a single worker (n_workers=1), but that requires more setup and is only worth it for TB-scale datasets.