pengyizhou opened this issue 1 day ago
You might just get lucky if you set `lhotse.set_dill_enabled(True)` somewhere in your code (or `LHOTSE_DILL_ENABLED=1` in the environment). Otherwise you'll have to try and debug where this generator object is created. I don't think I've ever run into this issue with Lhotse, so I suspect it may be somewhere in the user code (typically some `.map` or `.filter` method called on Lhotse objects with a lambda function, etc.).
I used this snippet in the past to find the unpicklable objects: https://gist.github.com/pzelasko/90c1c13acd86f6c9c0aa4a3fa69dadba
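For reference, here is a minimal sketch of the idea behind that gist (illustrative code, not the gist's actual implementation): recursively try to pickle an object and its attributes, and report the attribute paths that fail.

```python
import pickle


def find_unpicklable(obj, path="obj", seen=None):
    """Return attribute paths under `obj` that fail to pickle (sketch)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return []
    seen.add(id(obj))
    bad = []
    try:
        pickle.dumps(obj)
        return bad  # the whole object pickles fine, nothing to report
    except Exception:
        bad.append(path)
    # Descend into instance attributes, if the object has any.
    attrs = vars(obj).items() if hasattr(obj, "__dict__") else []
    for name, attr in attrs:
        bad.extend(find_unpicklable(attr, f"{path}.{name}", seen))
    return bad


class Holder:
    def __init__(self):
        self.ok = [1, 2, 3]
        self.gen = (x for x in range(3))  # generators cannot be pickled


print(find_unpicklable(Holder()))  # → ['obj', 'obj.gen']
```

Running this against the dataset/sampler objects you pass to the DataLoader usually pinpoints which attribute holds the offending generator.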
Thank you very much for your reply!
I saw in the Lhotse codebase that calling `set_dill_enabled(True)` sets the env var `LHOTSE_DILL_ENABLED=1`, and also checks whether the `dill` package is installed. I have not installed the `dill` package.
So I tried to print this variable during training, but the output was `None`, so I believe the error is caused by some other issue.
I will try to debug further.
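One thing worth checking: the env var only reaches DataLoader workers if it is set in the parent process before the workers are spawned. A quick sanity check (the variable name is the one from the thread; the rest is just a sketch):

```python
import os

# Set the flag early, before any DataLoader workers are spawned,
# so that spawned/forked workers inherit it from the parent process.
os.environ["LHOTSE_DILL_ENABLED"] = "1"

# Later (e.g. inside the dataset __getitem__ or a worker_init_fn),
# verify that it actually propagated:
print(os.environ.get("LHOTSE_DILL_ENABLED"))  # → 1
```

Seeing `None` in the worker while the parent has set it would point to the flag being set too late (after worker startup) rather than not at all.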
Hi! Recently, we have been training a large-scale dataset (> 50M audio segments) on the K2 platform. To reduce IO operations through NFS, we decided to use the Lhotse Shar format.
I followed the instructions from https://github.com/lhotse-speech/lhotse/blob/master/examples/04-lhotse-shar.ipynb. It worked well when I used one GPU with one or multiple workers. However, I got an unexpected error when training the model on multiple GPUs.
Here is the error information:
It looks like a generator from `train_dl` is used in the DDP processes. I tried to debug it, and the only way I found to make it work was setting `num_workers=0`.
I am using the Shar format with the fields `cuts` and `features`, and 4 GPUs. Has anyone had similar issues?
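For context on why `num_workers=0` works around it: with `num_workers > 0`, the DataLoader has to serialize the dataset/sampler state to hand it to worker processes, and the stdlib `pickle` cannot serialize live generator objects. A minimal reproduction of the underlying failure, with no Lhotse or PyTorch involved:

```python
import pickle

# Generators carry live frame state, which stdlib pickle cannot serialize.
gen = (x * x for x in range(10))
try:
    pickle.dumps(gen)
except TypeError as e:
    print(e)  # e.g. "cannot pickle 'generator' object"
```

With `num_workers=0` everything runs in the main process, so nothing is pickled and the generator is never a problem; the multi-GPU failure suggests a generator is being captured somewhere in the objects handed to the workers.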