Closed by annshress 7 months ago
The BRT workflow currently fails when handling too many files (30 or more .mrc files of about 1 GB each).
slurm-log/0ae30525-93c1-4c6d-944b-eda31b6ed801/dask-worker-1122178.err
15:57:06.557 | INFO | Task run 'gen_ng_metadata-9' - Instantiating HWZarrImages /gs1/Scratch/hedwig_dev_scratch/tmp9z5qokp4/2013-1220-dA30_5-BSC-1_19_rec.zarr
15:57:06.560 | INFO | Task run 'gen_ng_metadata-9' - Accessing first HWZarrImage
15:57:06.563 | INFO | Task run 'gen_ng_metadata-9' - Creating ng metadata
15:57:06.563 | INFO | Task run 'gen_ng_metadata-9' - ... getting shader type
15:57:06.564 | INFO | Task run 'gen_ng_metadata-9' - ... getting dims
15:57:06.565 | INFO | Task run 'gen_ng_metadata-9' - ... getting shader params
15:57:06.565 | DEBUG | pytools.HedwigZarrImage - path: 0/0
15:57:06.567 | INFO | pytools.utils.histogram - ZARR array needs converting to native byteorder.
... job-extra': None, 'job-extra-directives': [], 'job-directives-skip': [], 'log-directory': None, 'scheduler-options': {}}}}
15:57:06.570 | INFO | pytools.HedwigZarrImage - Building histogram for "/gs1/Scratch/hedwig_dev_scratch/tmp9z5qokp4/2013-1220-dA30_5-BSC-1_19_rec.zarr/0"...
The job never returns.
Concurrency is set to 1.
This issue has been avoided by allocating enough cores to the workflow.
The attached image shows that some of the tasks have already completed while others are still running. However, completed tasks could pass their results to the next tasks instead of waiting for their sibling mapped tasks to complete.
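To illustrate the difference between the two behaviours, here is a minimal sketch using only the Python standard library. It does not use the workflow's actual orchestration framework; `stage_one`, `stage_two`, `run_with_barrier`, and `run_pipelined` are hypothetical names, and the sleep durations are made-up stand-ins for fast and slow files.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def make_recorder():
    """Return a shared event log and a thread-safe function that appends to it."""
    log, lock = [], threading.Lock()

    def record(stage, item):
        with lock:
            log.append((stage, item))

    return log, record


def stage_one(name, seconds, record):
    # Hypothetical first mapped task; the sleep simulates per-file work.
    time.sleep(seconds)
    record("stage_one done", name)
    return name


def stage_two(name, record):
    # Hypothetical downstream task for the same file.
    record("stage_two start", name)
    return name.upper()


def run_with_barrier(work):
    """Every stage_one call must finish before any stage_two call starts."""
    log, record = make_recorder()
    with ThreadPoolExecutor(max_workers=len(work)) as pool:
        firsts = list(pool.map(lambda kv: stage_one(kv[0], kv[1], record), work.items()))
        results = list(pool.map(lambda name: stage_two(name, record), firsts))
    return results, log


def run_pipelined(work):
    """Each item moves on to stage_two as soon as its own stage_one is done."""
    log, record = make_recorder()

    def chain(kv):
        return stage_two(stage_one(kv[0], kv[1], record), record)

    with ThreadPoolExecutor(max_workers=len(work)) as pool:
        results = list(pool.map(chain, work.items()))
    return results, log


if __name__ == "__main__":
    work = {"fast": 0.01, "slow": 0.3}  # item name -> simulated seconds
    for runner in (run_with_barrier, run_pipelined):
        results, log = runner(work)
        print(runner.__name__, results, log)
```

In the pipelined variant the fast item reaches `stage_two` while the slow item is still in `stage_one`, which is the behaviour requested above. Whether the real workflow can be restructured this way depends on how its mapped tasks are defined.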