Closed SaileshAI closed 1 month ago
With newer MONAI versions, the API changed and they now use MetaTensor. Try downgrading MONAI to a version from before MetaTensor was introduced. Maybe this helps.
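For anyone who would rather not downgrade: the `KeyError: 't2_meta_dict'` in this thread looks consistent with that API change, because the old `<key>_meta_dict` entry is no longer in the batch and the metadata sits on the tensor itself as a `.meta` attribute. Below is a minimal sketch of a lookup that accepts both layouts. The helper names `get_meta` and `get_meta_batch` are made up for illustration and are not part of prostate158 or MONAI, and `FakeMetaTensor` is only a stand-in so the snippet runs without MONAI installed.

```python
def get_meta(item, key):
    """Return the metadata dict for `key` from one batch item.

    Tries the legacy '<key>_meta_dict' entry first, then falls back to a
    `.meta` attribute on the tensor (the MetaTensor-style layout).
    """
    legacy_key = f"{key}_meta_dict"
    if legacy_key in item:
        return item[legacy_key]
    meta = getattr(item[key], "meta", None)
    if meta is None:
        raise KeyError(f"no metadata found for {key!r}")
    return meta


def get_meta_batch(batch, key):
    """Hypothetical replacement for a `[item[key] for item in batch]` lookup."""
    return [get_meta(item, key) for item in batch]


class FakeMetaTensor:
    """Stand-in for a tensor that carries its metadata in `.meta`."""

    def __init__(self, meta):
        self.meta = meta


if __name__ == "__main__":
    old_style = {"t2": object(), "t2_meta_dict": {"filename_or_obj": "a.nii.gz"}}
    new_style = {"t2": FakeMetaTensor({"filename_or_obj": "b.nii.gz"})}
    print(get_meta_batch([old_style, new_style], "t2"))
```

Whether this is enough depends on what else in the training script expects the old dictionary layout, so treat it as a starting point rather than a full fix.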
Hi @kbressem, yes, thanks for this. I was able to start the training script. However, after 2 epochs, in epoch 3, I ran into the following issue:
Epoch [3/500]: [24/60] 40%|████████████████████████████████████████████████████████████▊ , loss=1.8 [00:26<00:30]
Current run is terminating due to exception: received 0 items of ancdata
Exception: received 0 items of ancdata
Traceback (most recent call last):
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/ignite/engine/engine.py", line 1032, in _run_once_on_dataset_as_gen
self.state.batch = next(self._dataloader_iter)
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in next
data = self._next_data()
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1329, in _next_data
idx, data = self._get_data()
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1295, in _get_data
success, data = self._try_get_data()
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
return _ForkingPickler.loads(res)
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 495, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/usr/lib/python3.10/multiprocessing/reduction.py", line 189, in recv_handle
return recvfds(s, 1)[0]
File "/usr/lib/python3.10/multiprocessing/reduction.py", line 164, in recvfds
raise RuntimeError('received %d items of ancdata' %
RuntimeError: received 0 items of ancdata
Engine run is terminating due to exception: received 0 items of ancdata
Is this a version issue with another dependency, or something else?
This means a worker died in the dataloader. This is a PyTorch issue. Try to reduce the number of workers in the data loader.
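Some background on why fewer workers helps: `received 0 items of ancdata` is commonly reported when the process runs out of open file descriptors, since PyTorch's default tensor sharing between DataLoader workers passes file descriptors over sockets. If lowering `num_workers` is not an option, two other workarounds are often suggested: raising the soft open-file limit, or switching the sharing strategy to `file_system`. The sketch below shows both. The function names and the `4096` target are arbitrary choices for illustration, and it assumes a Unix-like system, because the `resource` module is not available on Windows.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft limit on open files towards `target`.

    Never goes above the hard limit. Returns the (soft, hard) pair that is
    in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # the platform refused the new value; keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def use_file_system_sharing():
    """Switch PyTorch to the 'file_system' sharing strategy if torch is installed."""
    try:
        import torch.multiprocessing as mp
    except ImportError:
        return False
    mp.set_sharing_strategy("file_system")
    return True


if __name__ == "__main__":
    print("open-file limits:", raise_open_file_limit())
    print("file_system sharing enabled:", use_file_system_sharing())
```

Either call would need to run at the top of the training script, before the first DataLoader is created. Reducing `num_workers` stays the simplest option and is what worked in this thread.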
I see, and yes, setting num_workers to a lower number worked. Any idea how I can run inference with the trained models? Do I run it on an image or on a 'nii' file? (I am fairly new to radiology files in NII format.) I am asking because I could not find a script for running inference on a single image/nii sample.
You can add the image to the test dataset and then infer over it. This would be the most straightforward way with this library. The README shows the code at the bottom.
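As a rough illustration of "adding the image to the test dataset": if the test set is described by a CSV file with one row per case and one column per sequence, a new case can be appended like this. That layout is an assumption here, and so are the column names `t2`, `adc`, `dwi`, the file name `test.csv`, and the helper `add_case_to_test_csv`. Check the data section of your own config and the existing CSV header before relying on any of it.

```python
import csv
import io


def add_case_to_test_csv(csv_path, case):
    """Append one case (a dict of column name -> file path) to an existing CSV.

    The header already in the file decides which columns are allowed, so a
    mistyped column name fails loudly instead of silently adding a column.
    """
    with open(csv_path, newline="") as f:
        text = f.read()
    fieldnames = csv.DictReader(io.StringIO(text)).fieldnames
    if not fieldnames:
        raise ValueError(f"{csv_path} has no header row")
    unknown = set(case) - set(fieldnames)
    if unknown:
        raise ValueError(f"columns not in CSV header: {sorted(unknown)}")
    with open(csv_path, "a", newline="") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # make sure the new row starts on its own line
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="", lineterminator="\n")
        writer.writerow(case)


# Example call with hypothetical paths and column names:
# add_case_to_test_csv("test.csv", {"t2": "new_case/t2.nii.gz",
#                                   "adc": "new_case/adc.nii.gz",
#                                   "dwi": "new_case/dwi.nii.gz"})
```

After that, the inference code from the README should pick up the new row along with the rest of the test set.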
Hi
I am trying to run the training script with the provided dataset, and I am getting the error below:
run is terminating due to exception: 't2_meta_dict' [00:00<?]
2024-05-10 17:04:33,995 - ERROR - Exception: 't2_meta_dict'
Traceback (most recent call last):
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/ignite/engine/engine.py", line 1069, in _run_once_on_dataset_as_gen
self._fire_event(Events.ITERATION_COMPLETED)
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/ignite/engine/engine.py", line 425, in _fire_event
func(first, (event_args + others), **kwargs)
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/prostate/lib/python3.10/site-packages/monai/handlers/metrics_saver.py", line 124, in _get_filenames
meta_data = self.batch_transform(engine.state.batch)
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/repo/prostate158/train.py", line 385, in _get_meta_dict
return [item[key] for item in batch]
File "/home/cognida/Desktop/Lens/Sanskar/prostate158/repo/prostate158/train.py", line 385, in
return [item[key] for item in batch]
KeyError: 't2_meta_dict'
Engine run is terminating due to exception: 't2_meta_dict'
2024-05-10 17:04:34,027 - ERROR - Exception: 't2_meta_dict'
I have not modified any of the scripts. Any idea why it is failing?