Hi, I tried to train with my own data, but I got the following error when executing python train.py -p path_to_your_pretrained_model:
'torchaudio' installed and imported.
Seed set to 114514
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
---------------------------------------------------------------
0 | backbone | UNetBackbone | 101 M
1 | head | Linear | 5.5 K
2 | ph_frame_GHM_loss_fn | GHMLoss | 0
3 | pseudo_label_GHM_loss_fn | MultiLabelGHMLoss | 0
4 | ph_edge_GHM_loss_fn | MultiLabelGHMLoss | 0
5 | EMD_loss_fn | BinaryEMDLoss | 0
6 | ph_edge_diff_GHM_loss_fn | MultiLabelGHMLoss | 0
7 | MSE_loss_fn | MSELoss | 0
8 | CTC_GHM_loss_fn | CTCGHMLoss | 0
---------------------------------------------------------------
101 M Trainable params
0 Non-trainable params
101 M Total params
406.018 Total estimated model params size (MB)
Sanity Checking: | | 0/? [00:00<?, ?it/s]/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
Traceback (most recent call last):
File "/home/ria/SOFA/train.py", line 152, in <module>
main()
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/train.py", line 138, in main
trainer.fit(
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
results = self._run_stage()
^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1031, in _run_stage
self._run_sanity_check()
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1060, in _run_sanity_check
val_loop.run()
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/loops/utilities.py", line 182, in _decorator
return loop_run(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 128, in run
batch, batch_idx, dataloader_idx = next(data_fetcher)
^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/loops/fetchers.py", line 133, in __next__
batch = super().__next__()
^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/loops/fetchers.py", line 60, in __next__
batch = next(self.iterator)
^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/utilities/combined_loader.py", line 341, in __next__
out = next(self._iterator)
^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/lightning/pytorch/utilities/combined_loader.py", line 142, in __next__
out = next(self.iterators[0])
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 629, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 672, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/home/ria/SOFA/dataset.py", line 84, in __getitem__
ph_seq = np.array(item["ph_seq"])
~~~~^^^^^^^^^^
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/ria/SOFA/venv/lib/python3.11/site-packages/h5py/_hl/group.py", line 357, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 241, in h5py.h5o.open
KeyError: "Unable to synchronously open object (object 'ph_seq' doesn't exist)"
Before running this command, I ran python binarize.py, which completed without any problems:
Data compression ratio:
full label data: 45.89 %,
weak label data: 54.11 %,
no label data: 0.00 %.
Successfully binarized train set, total time 3756.69s, saved to data/binary/train.h5py
For additional information: I ran this in WSL2 (Debian 12) with CUDA 12.4 nightly and Python 3.11.2. I don't know if that affects the result.
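Since the KeyError reports that the 'ph_seq' object doesn't exist, it might help to check which datasets each item group in the binarized file actually contains. The sketch below is only a rough diagnostic, not part of SOFA itself: the file path is taken from the binarize.py log above, and it assumes the file stores items as HDF5 groups.

```python
import os
import h5py

def list_item_keys(path):
    """Map each HDF5 group in the file to the sorted list of keys it holds."""
    found = {}
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            # visititems walks the file recursively; record only groups.
            if isinstance(obj, h5py.Group):
                found[name] = sorted(obj.keys())
        f.visititems(visit)
    return found

path = "data/binary/train.h5py"  # path reported by binarize.py
if os.path.exists(path):
    keys = list_item_keys(path)
    missing = [name for name, ks in keys.items() if "ph_seq" not in ks]
    print(f"{len(missing)} of {len(keys)} item group(s) lack a 'ph_seq' entry")
```

If some item groups turn out to be missing 'ph_seq', the problem is likely in the binarized data rather than in the training loop itself.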
SOFA was developed and tested in a Python 3.8 environment, so I am not sure whether Python 3.11 can run SOFA properly. Perhaps you could use conda to create a Python 3.8 virtual environment and then try again.