lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Read separate .jsonl.gz files from fbank, filter them, and combine them into a single CutSet variable. #1371

Open sanjuktasr opened 2 months ago

sanjuktasr commented 2 months ago

Code:

    for items in mf2:
        test_item_cuts = librispeech.test_clean_cuts(items)

        # test_filtered_cuts = test_item_cuts.filter(remove_short_and_long_utt)

        if test_item_cuts is not None:
            print(items)
            test_filtered_cuts = test_item_cuts.filter(remove_short_and_long_utt)
            if count == 0:
                # test_other_cuts = librispeech.test_other_cuts()
                test_clean_cuts = test_filtered_cuts
            else:
                test_clean_cuts = test_clean_cuts + test_filtered_cuts
            count = count + 1
    print(count)

    test_clean_cuts.to_file("manifest.jsonl.gz")
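For reference, the read, filter, merge, and write flow that the loop above aims for can be sketched without lhotse, using only the Python standard library. This is a rough illustration, not lhotse's API: `read_jsonl_gz`, `keep_duration`, and `merge_filtered` are hypothetical helper names, the 1 s and 20 s thresholds are assumed stand-ins for whatever `remove_short_and_long_utt` checks, and each manifest line is assumed to be a JSON object with a `duration` field.

```python
import gzip
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator


def read_jsonl_gz(path: Path) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def keep_duration(cut: dict, min_s: float = 1.0, max_s: float = 20.0) -> bool:
    """Hypothetical stand-in for remove_short_and_long_utt (thresholds assumed)."""
    return min_s <= cut["duration"] <= max_s


def merge_filtered(
    paths: Iterable[Path],
    out_path: Path,
    predicate: Callable[[dict], bool] = keep_duration,
) -> int:
    """Stream all manifests, keep items that pass `predicate`, write one file.

    Returns the number of items written.
    """
    count = 0
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for path in paths:
            for cut in read_jsonl_gz(path):
                if predicate(cut):
                    out.write(json.dumps(cut) + "\n")
                    count += 1
    return count
```

Because the sketch works on raw dictionaries and streams line by line, it never loads a whole manifest into memory and does not go through any object serialization step.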

Error:

  File "/NAS1/sanjukta_repo_falcon2/zipformer_gitlab_local/zipformer_v1/icefall/egs/librispeech/ASR/./zipformer/decode.py", line 1224, in <module>
    main()
  File "/usr/local/lib/python3.9/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/NAS1/sanjukta_repo_falcon2/zipformer_gitlab_local/zipformer_v1/icefall/egs/librispeech/ASR/./zipformer/decode.py", line 1162, in main
    test_clean_cuts.to_file("manifest.jsonl.gz")
  File "/workspace/lhotse/lhotse/serialization.py", line 579, in to_file
    store_manifest(self, path)
  File "/workspace/lhotse/lhotse/serialization.py", line 564, in store_manifest
    manifest.to_jsonl(path)
  File "/workspace/lhotse/lhotse/serialization.py", line 346, in to_jsonl
    save_to_jsonl(self.to_dicts(), path)
  File "/workspace/lhotse/lhotse/serialization.py", line 171, in save_to_jsonl
    for item in data:
  File "/workspace/lhotse/lhotse/cut/set.py", line 688, in <genexpr>
    return (cut.to_dict() for cut in self)
  File "/workspace/lhotse/lhotse/cut/data.py", line 77, in to_dict
    d["recording"] = self.recording.to_dict()
  File "/workspace/lhotse/lhotse/audio/recording.py", line 339, in to_dict
    d["transforms"] = [t.to_dict() for t in self.transforms]
  File "/workspace/lhotse/lhotse/audio/recording.py", line 339, in <listcomp>
    d["transforms"] = [t.to_dict() for t in self.transforms]
AttributeError: 'dict' object has no attribute 'to_dict'

pzelasko commented 2 months ago

Update to the latest lhotse version (ideally from the master branch)

sanjuktasr commented 2 months ago

Can you please elaborate on how updating to the latest version will help in forming a single CutSet (<class 'lhotse.lazy.LazyManifestIterator'>) from a list of CutSets (<class 'lhotse.lazy.LazyManifestIterator'>)?

pzelasko commented 2 months ago

Transform serialization broke after a refactoring, and I fixed it fairly recently (I think 1.24.2 already has those fixes). That is the failure shown in the last lines of your traceback:

d["transforms"] = [t.to_dict() for t in self.transforms]
AttributeError: 'dict' object has no attribute 'to_dict'
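To illustrate the failure mode in that line: the comprehension calls `.to_dict()` on every element, so it raises as soon as the list holds an entry that is already a plain dict. The sketch below reproduces this with a hypothetical `ToyTransform` class and two hypothetical helper functions. It is not lhotse's code, and the tolerant variant is only one possible way to handle mixed lists, not necessarily how the fix in lhotse works.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class ToyTransform:
    """Hypothetical transform object; stands in for a real transform class."""

    name: str

    def to_dict(self) -> Dict[str, Any]:
        return {"name": self.name}


TransformLike = Union[ToyTransform, Dict[str, Any]]


def serialize_strict(transforms: List[TransformLike]) -> List[Dict[str, Any]]:
    """Same shape as the failing line: assumes every item has .to_dict()."""
    return [t.to_dict() for t in transforms]


def serialize_tolerant(transforms: List[TransformLike]) -> List[Dict[str, Any]]:
    """Pass through items that are already dicts; convert the rest."""
    return [t if isinstance(t, dict) else t.to_dict() for t in transforms]


# A list mixing an object with an already-serialized dict.
mixed: List[TransformLike] = [ToyTransform("Speed"), {"name": "Resample"}]

try:
    serialize_strict(mixed)
    error_message = ""
except AttributeError as exc:
    # Raised on the dict entry, which has no to_dict method.
    error_message = str(exc)

tolerant_result = serialize_tolerant(mixed)
```

If the transforms in your manifests were loaded as dicts by an affected lhotse version, upgrading as suggested above is the supported route; the tolerant helper is only meant to show why the strict comprehension fails.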