lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

PR #1332 breaks many operations #1342

Closed: JinZr closed this issue 1 month ago

JinZr commented 1 month ago

Hi, I just found that this PR breaks many operations, including lhotse subset and CutSet().to_jsonl().

These are the logs I got:

>>> devset.to_jsonl("lmsys_cuts_dev.jsonl.gz")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/nvme_share/jinzr/miniconda3/envs/icl/lib/python3.10/site-packages/lhotse-1.24.0.dev0+git.bbb3fccd.clean-py3.10.egg/lhotse/serialization.py", line 300, in to_jsonl
  File "/mnt/nvme_share/jinzr/miniconda3/envs/icl/lib/python3.10/site-packages/lhotse-1.24.0.dev0+git.bbb3fccd.clean-py3.10.egg/lhotse/serialization.py", line 125, in save_to_jsonl
  File "/mnt/nvme_share/jinzr/miniconda3/envs/icl/lib/python3.10/site-packages/lhotse-1.24.0.dev0+git.bbb3fccd.clean-py3.10.egg/lhotse/cut/set.py", line 688, in <genexpr>
  File "/mnt/nvme_share/jinzr/miniconda3/envs/icl/lib/python3.10/site-packages/lhotse-1.24.0.dev0+git.bbb3fccd.clean-py3.10.egg/lhotse/cut/data.py", line 77, in to_dict
  File "/mnt/nvme_share/jinzr/miniconda3/envs/icl/lib/python3.10/site-packages/lhotse-1.24.0.dev0+git.bbb3fccd.clean-py3.10.egg/lhotse/audio/recording.py", line 339, in to_dict
  File "/mnt/nvme_share/jinzr/miniconda3/envs/icl/lib/python3.10/site-packages/lhotse-1.24.0.dev0+git.bbb3fccd.clean-py3.10.egg/lhotse/audio/recording.py", line 339, in <listcomp>
AttributeError: 'dict' object has no attribute 'to_dict'
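
The last two frames of the traceback are consistent with a list comprehension calling to_dict() on items that are still plain dicts rather than objects. Below is a minimal, self-contained sketch of that failure mode. Source, Recording, and try_serialize are hypothetical stand-ins written for illustration and are not lhotse's code. Whether this is the actual cause in PR #1332 is an assumption.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, List


@dataclass
class Source:
    """Hypothetical stand-in for an audio source entry (not lhotse's class)."""
    type: str
    channels: List[int]
    source: str

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


@dataclass
class Recording:
    """Hypothetical stand-in for a recording manifest (not lhotse's class)."""
    id: str
    sources: List[Any]

    def to_dict(self) -> Dict[str, Any]:
        # Assumes every element of `sources` is an object exposing to_dict().
        return {"id": self.id, "sources": [s.to_dict() for s in self.sources]}


def try_serialize(recording: Recording) -> str:
    """Return "ok" on success, or the error message if AttributeError is raised."""
    try:
        recording.to_dict()
        return "ok"
    except AttributeError as err:
        return str(err)


raw = {"type": "file", "channels": [0], "source": "audio.wav"}

# Sources constructed as objects serialize without error.
good = Recording(id="rec-1", sources=[Source(**raw)])
# Sources left as plain dicts (never converted to objects) trigger the error.
bad = Recording(id="rec-2", sources=[raw])

print(try_serialize(good))
print(try_serialize(bad))
```

In this sketch the second call fails in the same way as the log above, because a dict has no to_dict() method. If the real cause is similar, the nested items would need to be converted to objects when the manifest is loaded, or the serializer would need to tolerate items that are already dicts.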

pzelasko commented 1 month ago

I noticed that subset wasn't tested with lazy manifests, so I added the relevant coverage. There was indeed an issue with FeatureSet, but I can't reproduce any of the other issues you mentioned.

https://github.com/lhotse-speech/lhotse/pull/1345