Closed desh2608 closed 2 years ago
Is it possible your job failed during extraction and left the file corrupted? Can you retry and see?
Ok, I'll try again.
Did it help?
Sorry I didn't get time to get back to this, but it must have been file corruption, as you mentioned. You can close this issue for now. If I run into the error again, I'll reopen it.
I have run into this issue as well.
(k2-python) luomingshuang@de-74279-k2-train-2-0602201035-5fb6d86964-mclm7:~/codes/icefall-pruned-rnnt5-aishell4/egs/aishell4/ASR$ CUDA_VISIBLE_DEVICES='4' python pruned_transducer_stateless5/train.py --max-duration 200
2022-06-07 15:47:16,465 INFO [train.py:880] Training started
2022-06-07 15:47:16.660707: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /ceph-sh1/fangjun/software/cuda-10.2.89/lib:/ceph-sh1/fangjun/software/cuda-10.2.89/lib64:
2022-06-07 15:47:16.660748: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-06-07 15:47:18,838 INFO [train.py:890] Device: cuda:0
2022-06-07 15:47:18,901 INFO [lexicon.py:176] Loading pre-compiled data/lang_char/Linv.pt
2022-06-07 15:47:18,911 INFO [train.py:901] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'model_warm_step': 3000, 'env_info': {'k2-version': '1.15.1', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'f8d2dba06c000ffee36aab5b66f24e7c9809f116', 'k2-git-date': 'Thu Apr 21 12:20:34 2022', 'lhotse-version': '1.3.0.dev+git.5dbc5fb.dirty', 'torch-version': '1.11.0', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'icefall-pruned-rnnt5-aishell4', 'icefall-git-sha1': 'b4b3a84-dirty', 'icefall-git-date': 'Tue Jun 7 12:20:12 2022', 'icefall-path': '/ceph-meixu/luomingshuang/icefall', 'k2-path': '/ceph-ms/luomingshuang/k2_latest/k2/python/k2/__init__.py', 'lhotse-path': '/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-0602201035-5fb6d86964-mclm7', 'IP address': '10.177.74.202'}, 'world_size': 1, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('pruned_transducer_stateless5/exp'), 'lang_dir': 'data/lang_char', 'initial_lr': 0.003, 'lr_batches': 5000, 'lr_epochs': 6, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 100, 'use_fp16': False, 'num_encoder_layers': 24, 'dim_feedforward': 1536, 'nhead': 8, 'encoder_dim': 384, 'decoder_dim': 512, 'joiner_dim': 512, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200, 'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 
'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'training_subset': 'L', 'blank_id': 0, 'vocab_size': 3284}
2022-06-07 15:47:18,912 INFO [train.py:903] About to create model
2022-06-07 15:47:19,386 INFO [train.py:907] Number of model parameters: 94337552
2022-06-07 15:47:22,629 INFO [asr_datamodule.py:429] About to get train cuts
2022-06-07 15:47:22,631 INFO [asr_datamodule.py:231] About to get Musan cuts
2022-06-07 15:47:22,632 INFO [asr_datamodule.py:238] Enable MUSAN
2022-06-07 15:47:22,725 INFO [asr_datamodule.py:263] Enable SpecAugment
2022-06-07 15:47:22,726 INFO [asr_datamodule.py:264] Time warp factor: 80
2022-06-07 15:47:22,726 INFO [asr_datamodule.py:276] Num frame mask: 10
2022-06-07 15:47:22,726 INFO [asr_datamodule.py:289] About to create train dataset
2022-06-07 15:47:22,726 INFO [asr_datamodule.py:318] Using DynamicBucketingSampler.
2022-06-07 15:47:26,453 INFO [asr_datamodule.py:334] About to create train dataloader
2022-06-07 15:47:26,454 INFO [asr_datamodule.py:437] About to get dev cuts
2022-06-07 15:47:26,456 INFO [asr_datamodule.py:365] About to create dev dataset
2022-06-07 15:47:27,138 INFO [asr_datamodule.py:384] About to create dev dataloader
2022-06-07 15:47:27,139 INFO [train.py:1056] Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2022-06-07 15:51:56,549 INFO [train.py:818] Epoch 1, batch 0, loss[loss=1.001, simple_loss=2.002, pruned_loss=9.132, over 4887.00 frames.], tot_loss[loss=1.001, simple_loss=2.002, pruned_loss=9.132, over 4887.00 frames.], batch size: 21, lr: 3.00e-03
Traceback (most recent call last):
File "pruned_transducer_stateless5/train.py", line 1108, in <module>
main()
File "pruned_transducer_stateless5/train.py", line 1101, in main
run(rank=0, world_size=1, args=args)
File "pruned_transducer_stateless5/train.py", line 1010, in run
train_one_epoch(
File "pruned_transducer_stateless5/train.py", line 750, in train_one_epoch
for batch_idx, batch in enumerate(train_dl):
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1204, in _next_data
return self._process_data(data)
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/utils.py", line 668, in wrapper
return fn(*args, **kwargs)
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/cut.py", line 1012, in load_features
feats = self.features.load(start=self.start, duration=self.duration)
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/features/base.py", line 476, in load
return storage.read(
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/caching.py", line 70, in wrapper
return m(*args, **kwargs)
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/features/io.py", line 771, in read
decompressed_chunks = [lilcom.decompress(data) for data in chunk_data]
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/features/io.py", line 771, in <listcomp>
decompressed_chunks = [lilcom.decompress(data) for data in chunk_data]
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lilcom-1.1.1-py3.8-linux-x86_64.egg/lilcom/lilcom_interface.py", line 110, in decompress
raise ValueError("Something went wrong in decompression (likely bad data): "
ValueError: Something went wrong in decompression (likely bad data): decompress_float returned 7
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/utils.py", line 668, in wrapper
return fn(*args, **kwargs)
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/cut.py", line 2872, in load_features
base_feats=first_cut.load_features(),
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/utils.py", line 670, in wrapper
raise type(e)(
ValueError: Something went wrong in decompression (likely bad data): decompress_float returned 7
[extra info] When calling: MonoCut.load_features(args=(MonoCut(id='69666566-1e79-46b5-ae86-c73d26eda59d', start=1939.8653125, duration=3.535, channel=0, supervisions=[SupervisionSegment(id='20200707_L_R001S07C01-SPK0177-143', recording_id='20200707_L_R001S07C01', start=0.0, duration=3.535, channel=0, text='对这也可以在咱们设计宣传页中体现出', language='Chinese', speaker='SPK0177', gender=None, custom=None, alignment=None)], features=Features(type='kaldi-fbank', num_frames=223855, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=2238.554, storage_type='lilcom_chunky', storage_path='data/fbank/aishell4_feats_train_L/feats-5.lca', storage_key='797975384,46595,46375,46457,46408,46563,46683,46719,46616,45869,44638,45440,45598,45592,44962,45612,46175,44952,44938,44666,45372,45229,44912,44773,45222,44842,44967,45484,45312,45762,45720,46269,45907,46186,46074,46283,46580,45918,44998,45119,44809,44815,44615,44822,44484,44668,44980,45307,45602,45636,45669,45597,45419,45595,45568,45688,45555,46001,45410,45187,44985,46054,45766,44680,45160,45437,45731,46178,45777,46002,45430,45774,45508,45607,45505,45738,44946,44732,45281,45890,46158,45194,45484,45316,45576,44806,45299,45430,45505,45932,45477,44793,45140,44698,45305,45468,46065,45868,45849,45683,45864,45407,45634,46099,45804,46225,45586,45589,45845,45658,45585,44877,45860,45552,44743,45230,45123,46095,45979,46122,45948,45807,45357,45773,45241,44594,44507,45265,45872,45457,45102,45390,45317,45873,45779,45660,45487,44590,45481,46056,45564,45164,45000,45162,44766,44628,45742,45932,45514,45767,45822,45545,44907,44240,45270,45092,45141,45922,44126,45229,45343,44974,44018,44664,45718,44832,45345,44689,45270,45688,45906,44212,45336,45651,45752,44795,45129,44917,45324,45434,45298,44926,45122,44794,45252,45854,45871,44660,45290,44580,46171,44381,44333,44655,44384,45193,44942,44984,45365,45147,45831,45381,45753,44592,44494,45098,45170,45721,44971,44686,44869,45455,45011,46233,44870,45506,45455,45066,45717,44937,45364,45233,45239,4
5215,44724,44604,45446,44499,45024,44799,44636,44436,44524,44373,45059,44609,45225,44817,44926,44324,44811,44160,44394,45847,45564,45961,45409,45629,45762,45759,45483,45613,45045,44888,45412,45932,45129,46027,45235,44754,43787,44224,45244,45222,44772,45550,45506,45002,44973,44726,44237,44234,44513,44593,44835,44201,44281,43984,44225,44515,44884,44253,44481,43522,44864,45413,45525,45401,45233,45005,45100,44748,44313,44330,44643,44545,44631,44501,44515,43939,44776,45289,44528,44402,44203,44072,44230,44891,44757,43897,43421,43903,43765,44196,44277,44728,45135,45103,45543,45341,44757,44321,44378,43670,44565,44375,44968,45008,44743,44379,45684,45651,45533,44468,44779,44644,45846,44885,45056,45142,44884,44252,44656,45742,44937,45197,44820,45036,44552,44769,44576,44603,45867,45366,45072,45020,44681,44796,44957,44571,44825,44523,44798,45431,45534,45005,45764,44081,43906,43892,43376,44185,44806,44602,44452,44907,45514,45405,44902,45389,45084,44749,44436,45262,44642,45378,45248,45310,44934,45465,45322,45478,45482,45449,45105,45319,44841,44184,44618,44921,44671,45026,44705,44812,45221,45197,44454,44560,44411,45337,44555,45846,44964,44370,45302,45229,45607,44252,45233,44530,45164,44898,44741,44868,45233,45783,45827,45681,45903,45427,45427,45102,45116,45119,45126,44763,45568,45445,45235,44944,45834,45269,44512,45209,45243,45052,44781,44552,32881', recording_id='None', channels=0), recording=Recording(id='20200707_L_R001S07C01', sources=[AudioSource(type='file', channels=[0, 1, 2, 3, 4, 5, 6, 7], source='/ceph-ms/luomingshuang/codes/icefall-pruned-rnnt5-aishell4/egs/aishell4/ASR/download/aishell4/train_L/wav/20200707_L_R001S07C01.flac')], sampling_rate=16000, num_samples=35816864, duration=2238.554, transforms=None), custom=None),) kwargs={})
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = self.dataset[possibly_batched_index]
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/dataset/speech_recognition.py", line 113, in __getitem__
input_tpl = self.input_strategy(cuts)
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/dataset/input_strategies.py", line 120, in __call__
return collate_features(
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/dataset/collation.py", line 138, in collate_features
features[idx] = _read_features(cut)
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/dataset/collation.py", line 514, in _read_features
return torch.from_numpy(cut.load_features())
File "/ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/lhotse-1.3.0.dev0+git.5dbc5fb.dirty-py3.8.egg/lhotse/utils.py", line 670, in wrapper
raise type(e)(
ValueError: Something went wrong in decompression (likely bad data): decompress_float returned 7
[extra info] When calling: MonoCut.load_features(args=(MonoCut(id='69666566-1e79-46b5-ae86-c73d26eda59d', start=1939.8653125, duration=3.535, channel=0, supervisions=[SupervisionSegment(id='20200707_L_R001S07C01-SPK0177-143', recording_id='20200707_L_R001S07C01', start=0.0, duration=3.535, channel=0, text='对这也可以在咱们设计宣传页中体现出', language='Chinese', speaker='SPK0177', gender=None, custom=None, alignment=None)], features=Features(type='kaldi-fbank', num_frames=223855, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=2238.554, storage_type='lilcom_chunky', storage_path='data/fbank/aishell4_feats_train_L/feats-5.lca', storage_key='797975384,46595,46375,46457,46408,46563,46683,46719,46616,45869,44638,45440,45598,45592,44962,45612,46175,44952,44938,44666,45372,45229,44912,44773,45222,44842,44967,45484,45312,45762,45720,46269,45907,46186,46074,46283,46580,45918,44998,45119,44809,44815,44615,44822,44484,44668,44980,45307,45602,45636,45669,45597,45419,45595,45568,45688,45555,46001,45410,45187,44985,46054,45766,44680,45160,45437,45731,46178,45777,46002,45430,45774,45508,45607,45505,45738,44946,44732,45281,45890,46158,45194,45484,45316,45576,44806,45299,45430,45505,45932,45477,44793,45140,44698,45305,45468,46065,45868,45849,45683,45864,45407,45634,46099,45804,46225,45586,45589,45845,45658,45585,44877,45860,45552,44743,45230,45123,46095,45979,46122,45948,45807,45357,45773,45241,44594,44507,45265,45872,45457,45102,45390,45317,45873,45779,45660,45487,44590,45481,46056,45564,45164,45000,45162,44766,44628,45742,45932,45514,45767,45822,45545,44907,44240,45270,45092,45141,45922,44126,45229,45343,44974,44018,44664,45718,44832,45345,44689,45270,45688,45906,44212,45336,45651,45752,44795,45129,44917,45324,45434,45298,44926,45122,44794,45252,45854,45871,44660,45290,44580,46171,44381,44333,44655,44384,45193,44942,44984,45365,45147,45831,45381,45753,44592,44494,45098,45170,45721,44971,44686,44869,45455,45011,46233,44870,45506,45455,45066,45717,44937,45364,45233,45239,4
5215,44724,44604,45446,44499,45024,44799,44636,44436,44524,44373,45059,44609,45225,44817,44926,44324,44811,44160,44394,45847,45564,45961,45409,45629,45762,45759,45483,45613,45045,44888,45412,45932,45129,46027,45235,44754,43787,44224,45244,45222,44772,45550,45506,45002,44973,44726,44237,44234,44513,44593,44835,44201,44281,43984,44225,44515,44884,44253,44481,43522,44864,45413,45525,45401,45233,45005,45100,44748,44313,44330,44643,44545,44631,44501,44515,43939,44776,45289,44528,44402,44203,44072,44230,44891,44757,43897,43421,43903,43765,44196,44277,44728,45135,45103,45543,45341,44757,44321,44378,43670,44565,44375,44968,45008,44743,44379,45684,45651,45533,44468,44779,44644,45846,44885,45056,45142,44884,44252,44656,45742,44937,45197,44820,45036,44552,44769,44576,44603,45867,45366,45072,45020,44681,44796,44957,44571,44825,44523,44798,45431,45534,45005,45764,44081,43906,43892,43376,44185,44806,44602,44452,44907,45514,45405,44902,45389,45084,44749,44436,45262,44642,45378,45248,45310,44934,45465,45322,45478,45482,45449,45105,45319,44841,44184,44618,44921,44671,45026,44705,44812,45221,45197,44454,44560,44411,45337,44555,45846,44964,44370,45302,45229,45607,44252,45233,44530,45164,44898,44741,44868,45233,45783,45827,45681,45903,45427,45427,45102,45116,45119,45126,44763,45568,45445,45235,44944,45834,45269,44512,45209,45243,45052,44781,44552,32881', recording_id='None', channels=0), recording=Recording(id='20200707_L_R001S07C01', sources=[AudioSource(type='file', channels=[0, 1, 2, 3, 4, 5, 6, 7], source='/ceph-ms/luomingshuang/codes/icefall-pruned-rnnt5-aishell4/egs/aishell4/ASR/download/aishell4/train_L/wav/20200707_L_R001S07C01.flac')], sampling_rate=16000, num_samples=35816864, duration=2238.554, transforms=None), custom=None),) kwargs={})
[extra info] When calling: MixedCut.load_features(args=(MixedCut(id='bb49ef5e-27c2-cb69-f539-a085aaf9755b', tracks=[MixTrack(cut=MonoCut(id='69666566-1e79-46b5-ae86-c73d26eda59d', start=1939.8653125, duration=3.535, channel=0, supervisions=[SupervisionSegment(id='20200707_L_R001S07C01-SPK0177-143', recording_id='20200707_L_R001S07C01', start=0.0, duration=3.535, channel=0, text='对这也可以在咱们设计宣传页中体现出', language='Chinese', speaker='SPK0177', gender=None, custom=None, alignment=None)], features=Features(type='kaldi-fbank', num_frames=223855, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=2238.554, storage_type='lilcom_chunky', storage_path='data/fbank/aishell4_feats_train_L/feats-5.lca', storage_key='797975384,46595,46375,46457,46408,46563,46683,46719,46616,45869,44638,45440,45598,45592,44962,45612,46175,44952,44938,44666,45372,45229,44912,44773,45222,44842,44967,45484,45312,45762,45720,46269,45907,46186,46074,46283,46580,45918,44998,45119,44809,44815,44615,44822,44484,44668,44980,45307,45602,45636,45669,45597,45419,45595,45568,45688,45555,46001,45410,45187,44985,46054,45766,44680,45160,45437,45731,46178,45777,46002,45430,45774,45508,45607,45505,45738,44946,44732,45281,45890,46158,45194,45484,45316,45576,44806,45299,45430,45505,45932,45477,44793,45140,44698,45305,45468,46065,45868,45849,45683,45864,45407,45634,46099,45804,46225,45586,45589,45845,45658,45585,44877,45860,45552,44743,45230,45123,46095,45979,46122,45948,45807,45357,45773,45241,44594,44507,45265,45872,45457,45102,45390,45317,45873,45779,45660,45487,44590,45481,46056,45564,45164,45000,45162,44766,44628,45742,45932,45514,45767,45822,45545,44907,44240,45270,45092,45141,45922,44126,45229,45343,44974,44018,44664,45718,44832,45345,44689,45270,45688,45906,44212,45336,45651,45752,44795,45129,44917,45324,45434,45298,44926,45122,44794,45252,45854,45871,44660,45290,44580,46171,44381,44333,44655,44384,45193,44942,44984,45365,45147,45831,45381,45753,44592,44494,45098,45170,45721,44971,44686,44869
,45455,45011,46233,44870,45506,45455,45066,45717,44937,45364,45233,45239,45215,44724,44604,45446,44499,45024,44799,44636,44436,44524,44373,45059,44609,45225,44817,44926,44324,44811,44160,44394,45847,45564,45961,45409,45629,45762,45759,45483,45613,45045,44888,45412,45932,45129,46027,45235,44754,43787,44224,45244,45222,44772,45550,45506,45002,44973,44726,44237,44234,44513,44593,44835,44201,44281,43984,44225,44515,44884,44253,44481,43522,44864,45413,45525,45401,45233,45005,45100,44748,44313,44330,44643,44545,44631,44501,44515,43939,44776,45289,44528,44402,44203,44072,44230,44891,44757,43897,43421,43903,43765,44196,44277,44728,45135,45103,45543,45341,44757,44321,44378,43670,44565,44375,44968,45008,44743,44379,45684,45651,45533,44468,44779,44644,45846,44885,45056,45142,44884,44252,44656,45742,44937,45197,44820,45036,44552,44769,44576,44603,45867,45366,45072,45020,44681,44796,44957,44571,44825,44523,44798,45431,45534,45005,45764,44081,43906,43892,43376,44185,44806,44602,44452,44907,45514,45405,44902,45389,45084,44749,44436,45262,44642,45378,45248,45310,44934,45465,45322,45478,45482,45449,45105,45319,44841,44184,44618,44921,44671,45026,44705,44812,45221,45197,44454,44560,44411,45337,44555,45846,44964,44370,45302,45229,45607,44252,45233,44530,45164,44898,44741,44868,45233,45783,45827,45681,45903,45427,45427,45102,45116,45119,45126,44763,45568,45445,45235,44944,45834,45269,44512,45209,45243,45052,44781,44552,32881', recording_id='None', channels=0), recording=Recording(id='20200707_L_R001S07C01', sources=[AudioSource(type='file', channels=[0, 1, 2, 3, 4, 5, 6, 7], source='/ceph-ms/luomingshuang/codes/icefall-pruned-rnnt5-aishell4/egs/aishell4/ASR/download/aishell4/train_L/wav/20200707_L_R001S07C01.flac')], sampling_rate=16000, num_samples=35816864, duration=2238.554, transforms=None), custom=None), offset=0.0, snr=None), MixTrack(cut=MonoCut(id='012955da-b07f-44d9-8cac-c4ce7cbf2a18', start=230.0, duration=3.57, channel=0, supervisions=[], 
features=Features(type='kaldi-fbank', num_frames=1000, num_features=80, frame_shift=0.01, sampling_rate=16000, start=230.0, duration=10.0, storage_type='lilcom_chunky', storage_path='data/fbank/musan_feats/feats-10.lca', storage_key='217603661,43072,44618', recording_id='None', channels=0), recording=Recording(id='speech-us-gov-0085', sources=[AudioSource(type='file', channels=[0], source='/ceph-ms/luomingshuang/codes/icefall-pruned-rnnt5-aishell4/egs/aishell4/ASR/download/musan/speech/us-gov/speech-us-gov-0085.wav')], sampling_rate=16000, num_samples=9599687, duration=599.9804375, transforms=None), custom=None), offset=0.0, snr=12.635423209346753), MixTrack(cut=PaddingCut(id='6aa0246b-72e3-7714-a912-aee6c560e6e4', duration=0.0, sampling_rate=16000, feat_value=-23.025850929940457, num_frames=0, num_features=80, frame_shift=0.01, num_samples=0, custom=None), offset=3.57, snr=None)]),) kwargs={})
I confirm that the last two lines from the above logs can be executed successfully from an interactive terminal.
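A check like the one described above (loading the failing cuts one at a time, outside the DataLoader) can be sketched as follows. This is only a minimal sketch: `find_unreadable_cuts` is a hypothetical helper, not part of lhotse or icefall, and it assumes nothing about each item beyond an `id` attribute and the `load_features()` method that appears in the traceback.

```python
from typing import Any, Iterable, List, Tuple


def find_unreadable_cuts(cuts: Iterable[Any]) -> List[Tuple[str, str]]:
    """Load every cut's features serially and report the ones that fail.

    Hypothetical helper: each item only needs ``id`` and ``load_features()``.
    """
    failures = []
    for cut in cuts:
        try:
            cut.load_features()
        except Exception as exc:  # broad on purpose: we only want a report
            failures.append((cut.id, f"{type(exc).__name__}: {exc}"))
    return failures
```

If this returns an empty list for the training cuts in a single process while the same cuts fail inside DataLoader workers, that points away from corrupted feature files and towards how the files are read concurrently.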
@luomingshuang if you want just a quick fix for this, I think setting num_workers=0 in your asr_datamodule.py works. It's some kind of threading bug.
Sorry I didn't get time to get back to this,
@desh2608
Did you keep using LilcomChunkyWriter and the issue disappeared, or did you switch to LilcomHdf5Writer and then the issue disappeared?
I have suggested that @luomingshuang try the changes in https://github.com/k2-fsa/icefall/discussions/391#discussioncomment-2885803
That is,
diff --git a/lhotse/features/io.py b/lhotse/features/io.py
index 8ddeed1..b139ae7 100644
--- a/lhotse/features/io.py
+++ b/lhotse/features/io.py
@@ -380,7 +380,8 @@ def lookup_cache_or_open_regular_file(storage_path: str):
     The file handles can be freed at any time by calling ``close_cached_file_handles()``.
     """
     f = open(storage_path, "rb")
-    return f
+    lock = threading.Lock()
+    return f, lock


 @lru_cache(maxsize=None)
@@ -737,8 +738,7 @@ class LilcomChunkyReader(FeaturesReader):

     def __init__(self, storage_path: Pathlike, *args, **kwargs):
         super().__init__()
-        self.file = lookup_cache_or_open_regular_file(storage_path)
-        self.lock = threading.Lock()
+        self.file, self.lock = lookup_cache_or_open_regular_file(storage_path)

     @dynamic_lru_cache
     def read(
It still does not help.
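For reference, the idea behind that patch is to hand out one lock per cached file handle, so that every reader of the same file serializes its seek and read. Below is a minimal standalone sketch of that idea. `open_cached` and `ChunkReader` are hypothetical names, not the lhotse implementation, and since the patch did not resolve the error here, this illustrates the intent only.

```python
import threading
from functools import lru_cache
from typing import BinaryIO, Tuple


@lru_cache(maxsize=None)
def open_cached(path: str) -> Tuple[BinaryIO, threading.Lock]:
    """Return one (file handle, lock) pair per path, shared by all callers."""
    return open(path, "rb"), threading.Lock()


class ChunkReader:
    """Hypothetical reader that stands in for a chunked feature reader."""

    def __init__(self, path: str) -> None:
        # Every reader of the same path gets the same handle and the same lock.
        self.file, self.lock = open_cached(path)

    def read_chunk(self, offset: int, size: int) -> bytes:
        # seek() and read() on a shared handle must not interleave with
        # another thread's seek(), so both happen under the shared lock.
        with self.lock:
            self.file.seek(offset)
            return self.file.read(size)
```

Note that a `threading.Lock` only coordinates threads inside one process; it does not coordinate separate DataLoader worker processes.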
I will use ChunkedLilcomHdf5Writer to compute the fbank features and test with those.
Hmm I think the problem only appears when working with long-recording data, I was trying to repro on mini librispeech so that’s probably why it didn’t work. Let me take another look at it then.
I think I just extracted features again (still using LilcomChunkyWriter) and could get through successfully, so perhaps it was a corruption issue as Piotr had mentioned earlier.
I don't think so. We had the problem here and verified that the features were not corrupted. I think it's some kind of threading bug but the fix we tried didn't work; we must have overlooked something.
I used kaldifeat to extract some features and stored them using the default storage type, which is LilcomChunkyWriter, but it seemed to be throwing some errors at the time of data loading. When I switched to using LilcomHdf5Writer, the data loading was successful.