lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0
902 stars, 204 forks

[shar] cut can't load feature #1320

Closed kamirdin closed 2 months ago

kamirdin commented 2 months ago

When I use shards = train_cuts.to_shar(data_dir, fields={"features": "lilcom"}, shard_size=2000, num_jobs=20) to create the Shar shards,

a cut read back from them looks like this: MonoCut(id='X0000013171_273376986_S00164_sp1.1', start=0, duration=2.4818125, channel=0, supervisions=[SupervisionSegment(id='X0000013171_273376986_S00164_sp1.1', recording_id='X0000013171_273376986_sp1.1', start=0.0, duration=2.4818125, channel=0, text='用/花/里/胡/哨/的/甜/言/蜜/语/维/持/的/婚/姻', language='Chinese', speaker=None, gender=None, custom=None, alignment=None)], features=Features(type='kaldifeat-fbank', num_frames=248, num_features=80, frame_shift=0.01, sampling_rate=16000, start=425.6818125, duration=2.4818125, storage_type='memory_lilcom', storage_path='', storage_key='', recording_id='X0000013171_273376986_sp1.1', channels=0), recording=Recording(id='X0000013171_273376986_sp1.1', sources=[AudioSource(type='file', channels=[0], source='/store52/audio_data/WenetSpeech/audio/audio/train/podcast/B00051/X0000013171_273376986.opus')], sampling_rate=16000, num_samples=10926255, duration=682.8909375, channel_ids=[0], transforms=[{'name': 'Speed', 'kwargs': {'factor': 1.1}}]), custom={'dataloading_info': {'rank': 0, 'world_size': 1, 'worker_id': None}, 'shard_origin': PosixPath('cuts.020360.jsonl.gz'), 'shar_epoch': 0})

Then I try to load the features with cut.load_features()

and it fails with this error: ValueError: Cannot load features for recording X0000013171_273376986_sp1.1 starting from 0s. The available range is (425.6818125, 428.16362499999997) seconds.
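
The mismatch behind this error can be illustrated without Lhotse. In the cut above, the cut has start=0 while its features manifest has start=425.6818125, so a request made on the cut's timeline falls outside the span the features claim to cover. The sketch below is a simplified model of such a range check, not Lhotse's actual implementation: FeatureSpan and check_request are hypothetical names introduced only for illustration, and the numbers are copied from the report.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class FeatureSpan:
    """Hypothetical stand-in for a feature manifest's time span (not a Lhotse class)."""

    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def check_request(span: FeatureSpan, req_start: float, req_duration: float) -> None:
    """Raise ValueError when the requested window falls outside the stored span."""
    req_end = req_start + req_duration
    starts_early = req_start < span.start and not isclose(req_start, span.start)
    ends_late = req_end > span.end and not isclose(req_end, span.end)
    if starts_early or ends_late:
        raise ValueError(
            f"Cannot load features starting from {req_start}s. "
            f"The available range is ({span.start}, {span.end}) seconds."
        )


# Values copied from the features manifest in the report.
span = FeatureSpan(start=425.6818125, duration=2.4818125)

# The cut has start=0, so the request begins before the stored span: rejected.
try:
    check_request(span, req_start=0.0, req_duration=2.4818125)
    rejected = False
    message = ""
except ValueError as err:
    rejected = True
    message = str(err)

# A request expressed on the features' own timeline passes the same check.
check_request(span, req_start=span.start, req_duration=span.duration)
```

In this model the first request is rejected because 0 lies before 425.6818125, while the second succeeds because it uses the offset recorded in the features manifest. This suggests the exported manifest and the cut disagree about the time origin.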

pzelasko commented 2 months ago

Makes sense, there was an old TODO in the code to support exporting features with non-zero start/offset. Can you try this PR and let me know if it helped? https://github.com/lhotse-speech/lhotse/pull/1323