k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Lhotse processing error in librispeech recipe #918

Open ngoel17 opened 1 year ago

ngoel17 commented 1 year ago

We noticed this error while running icefall training on a different dataset. After a fresh install, we ran the librispeech recipe and reproduced the same error, which seems to be triggered when Lhotse handles the data. The log file is attached: libri.log

pzelasko commented 1 year ago

These lines are the key:

  File "/mnt/dsk1/22feb/lhotse/lhotse/features/io.py", line 765, in <listcomp>
    decompressed_chunks = [lilcom.decompress(data) for data in chunk_data]
  File "~/anaconda3/envs/k2_feb23/lib/python3.9/site-packages/lilcom/lilcom_interface.py", line 110, in decompress
    raise ValueError("Something went wrong in decompression (likely bad data): "
ValueError: Something went wrong in decompression (likely bad data): decompress_float returned 7

I think you may have corrupted data. Did all the feature extraction jobs/scripts complete successfully?
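One way to answer that question directly is to try loading every cut's features and record which ones fail, rather than stopping at the first exception during training. The sketch below is a minimal, library-agnostic helper; the name `find_unreadable` and the manifest path in the comment are hypothetical, and the Lhotse usage assumes `CutSet.from_file` and `Cut.load_features()` as found in recent Lhotse releases.

```python
from typing import Callable, Iterable, List, Tuple


def find_unreadable(
    items: Iterable[Tuple[str, Callable[[], object]]],
) -> List[Tuple[str, str]]:
    """Call each loader and collect (item_id, error message) for those that fail.

    `items` yields (item_id, loader) pairs, where a loader is any zero-argument
    callable, e.g. the bound method `cut.load_features` of a Lhotse cut.
    """
    failures: List[Tuple[str, str]] = []
    for item_id, loader in items:
        try:
            loader()
        # The traceback in this thread shows lilcom raising ValueError on bad
        # data; OSError is also caught to flag truncated or unreadable files.
        except (ValueError, OSError) as exc:
            failures.append((item_id, str(exc)))
    return failures


if __name__ == "__main__":
    # Hypothetical Lhotse usage (the path is an assumption; adjust to your recipe):
    #   from lhotse import CutSet
    #   cuts = CutSet.from_file("data/fbank/librispeech_cuts_train-clean-100.jsonl.gz")
    #   bad = find_unreadable((c.id, c.load_features) for c in cuts)
    #
    # Self-contained demo with fake loaders standing in for real cuts.
    def ok() -> list:
        return [0.0]

    def broken() -> list:
        raise ValueError("decompress_float returned 7")

    demo = [("cut-a", ok), ("cut-b", broken), ("cut-c", ok)]
    print(find_unreadable(demo))
```

If the list comes back empty on the same manifest that fails during training, that would point away from the stored files and toward the loading path.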

ngoel17 commented 1 year ago

Yes. The feature extraction scripts ran to completion without throwing any errors. However, we get exactly the same messages on two other datasets as well.


pzelasko commented 1 year ago

Hmmm, I am not sure what happened then. Here are a few long shots; maybe one of them will work:

ngoel17 commented 1 year ago

Yeah. Do you have a preference for a decompression method? There is also this protobuf-related environment variable that helps some people but probably hurt us.

I will try to find more pointers on the three suggestions. As far as I know, no updates were applied to one system but not the other. At the moment we are not 100% sure whether the problem is bad data written at feature-extraction time or a loading problem, and whether it is really in the data or in the code.
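To separate "the data differs" from "the code or environment differs", one option is to checksum the feature storage files on each machine and compare the results: identical checksums with different outcomes would point at the software stack, while mismatches would point at the files themselves. This is a minimal stdlib sketch; the helper names are hypothetical, and the default `*.lca` glob assumes the file extension written by Lhotse's `LilcomChunkyWriter`, so adjust it to whatever your feature directory actually contains.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of_file(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large archives are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def checksum_tree(root: str, pattern: str = "*.lca") -> Dict[str, str]:
    """Map each file matching `pattern` under `root` (relative path) to its SHA-256."""
    base = Path(root)
    return {
        str(p.relative_to(base)): sha256_of_file(p)
        for p in sorted(base.rglob(pattern))
        if p.is_file()
    }


def diff_checksums(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """Report files present on only one side, or whose contents differ."""
    report: Dict[str, str] = {}
    for name in sorted(set(a) | set(b)):
        if name not in a:
            report[name] = "only in second"
        elif name not in b:
            report[name] = "only in first"
        elif a[name] != b[name]:
            report[name] = "content differs"
    return report
```

In practice one would run `checksum_tree` on the feature directory of each machine, save the dictionaries (for example with `json.dump`), and feed both into `diff_checksums`; an empty report means the stored bytes are identical.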

csukuangfj commented 1 year ago

By the way, did you restart the feature extraction at some point because of some error?

s-mousmita commented 1 year ago


We didn't. We ran librispeech/ASR/prepare.sh without any modification, and it completed all stages in one go.