k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0
867 stars, 282 forks

lilcom error when training on KeSpeech: "lilcom: Length of string was too short" #1688

Status: Open. kellkwang opened this issue 1 month ago.

kellkwang commented 1 month ago

ValueError: lilcom: Length of string was too short

[extra info] When calling: MonoCut.load_features(args=(MonoCut(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0-904459_sp1.1', start=0.0, duration=6.0533125, channel=0, supervisions=[SupervisionSegment(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', recording_id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', start=0.0, duration=6.0533125, channel=0, text='由市政府分管领导担任负责人', language='Chinese', speaker='KeSpeech_KeSpeech_000000352', gender=None, custom={'origin': 'aidatatang_200zh'}, alignment=None)], features=Features(type='kaldi-fbank', num_frames=605, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0.0, duration=6.0533125, storage_type='lilcom_chunky', storage_path='data/fbank/kespeech_feats_train/feats-55.lca', storage_key='2370543709,43556,8961', recording_id='None', channels=0), recording=Recording(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', sources=[AudioSource(type='file', channels=[0], source='KeSpeech/KeSpeech_000000352/1011480_9a450cb0.wav')], sampling_rate=16000, num_samples=96853, duration=6.0533125, channel_ids=[0], transforms=[{'name': 'Speed', 'kwargs': {'factor': 1.1}}]), custom={'dataloading_info': {'rank': 6, 'world_size': 8, 'worker_id': None}}),) kwargs={})

[extra info] When calling: MixedCut.load_features(args=(MixedCut(id='44b5ea5b-1e26-427a-9ee7-4721603c8386', tracks=[MixTrack(cut=MonoCut(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0-904459_sp1.1', start=0.0, duration=6.0533125, channel=0, supervisions=[SupervisionSegment(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', recording_id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', start=0.0, duration=6.0533125, channel=0, text='由市政府分管领导担任负责人', language='Chinese', speaker='KeSpeech_KeSpeech_000000352', gender=None, custom={'origin': 'aidatatang_200zh'}, alignment=None)], features=Features(type='kaldi-fbank', num_frames=605, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0.0, duration=6.0533125, storage_type='lilcom_chunky', storage_path='data/fbank/kespeech_feats_train/feats-55.lca', storage_key='2370543709,43556,8961', recording_id='None', channels=0), recording=Recording(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', sources=[AudioSource(type='file', channels=[0], source='KeSpeech/KeSpeech_000000352/1011480_9a450cb0.wav')], sampling_rate=16000, num_samples=96853, duration=6.0533125, channel_ids=[0], transforms=[{'name': 'Speed', 'kwargs': {'factor': 1.1}}]), custom={'dataloading_info': {'rank': 6, 'world_size': 8, 'worker_id': None}}), type='MonoCut', offset=0.0, snr=None), MixTrack(cut=PaddingCut(id='846e29d3-a509-43d1-a114-4d22d52d4be7', duration=0.1866875, sampling_rate=16000, feat_value=-23.025850929940457, num_frames=19, num_features=80, frame_shift=0.01, num_samples=2987, video=None, custom=None), type='PaddingCut', offset=6.0533125, snr=None)], transforms=None),) kwargs={})

I checked the data and found nothing wrong with it. How can I fix this error?
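Since the audio looks fine, the problem may be in the feature archive rather than in the data. A quick check is sketched below. It assumes a storage_key layout for lhotse's lilcom_chunky format that is our reading of lhotse's source, not something confirmed in this thread: the first number is the absolute byte offset of the first chunk in the .lca file, and each following number is the byte length of one compressed chunk of up to 500 frames. Under that assumption, the key '2370543709,43556,8961' describes two chunks (605 frames need 2 chunks) ending at byte 2370596226, so a feats-55.lca smaller than about 2.37 GB would be truncated. The helper names chunk_byte_ranges and check_archive are hypothetical.

```python
import math
import os
from typing import List, Tuple

# Assumption: lilcom_chunky stores up to 500 frames per compressed chunk.
ASSUMED_CHUNK_FRAMES = 500


def chunk_byte_ranges(storage_key: str) -> List[Tuple[int, int]]:
    """Turn a lilcom_chunky storage_key into (start, end) byte ranges.

    Assumes the first value is an absolute file offset and every
    following value is the byte length of one compressed chunk.
    """
    values = [int(v) for v in storage_key.split(",")]
    start, sizes = values[0], values[1:]
    ranges = []
    for size in sizes:
        ranges.append((start, start + size))
        start += size
    return ranges


def check_archive(storage_path: str, storage_key: str, num_frames: int) -> List[str]:
    """Return a list of problems found; an empty list means no obvious issue."""
    problems = []
    ranges = chunk_byte_ranges(storage_key)
    # The number of chunks in the key should match the frame count.
    expected_chunks = math.ceil(num_frames / ASSUMED_CHUNK_FRAMES)
    if len(ranges) != expected_chunks:
        problems.append(
            f"key has {len(ranges)} chunks, expected {expected_chunks} for {num_frames} frames"
        )
    # The file must be long enough to contain the last chunk.
    needed = ranges[-1][1] if ranges else 0
    actual = os.path.getsize(storage_path)
    if actual < needed:
        problems.append(f"file has {actual} bytes but the key needs {needed}")
    return problems
```

For the failing cut, one would call check_archive("data/fbank/kespeech_feats_train/feats-55.lca", "2370543709,43556,8961", num_frames=605). A non-empty result would suggest that feature extraction for that archive was interrupted (for example by a full disk) and should be rerun.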

csukuangfj commented 1 month ago
MonoCut.load_features(args=(MonoCut(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0-904459_sp1.1', start=0.0, duration=6.0533125, channel=0, supervisions=[SupervisionSegment(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', recording_id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', start=0.0, duration=6.0533125, channel=0, text='由市政府分管领导担任负责人', language='Chinese', speaker='KeSpeech_KeSpeech_000000352', gender=None, custom={'origin': 'aidatatang_200zh'}, alignment=None)], features=Features(type='kaldi-fbank', num_frames=605, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0.0, duration=6.0533125, storage_type='lilcom_chunky', storage_path='data/fbank/kespeech_feats_train/feats-55.lca', storage_key='2370543709,43556,8961', recording_id='None', channels=0), recording=Recording(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', sources=[AudioSource(type='file', channels=[0], source='KeSpeech/KeSpeech_000000352/1011480_9a450cb0.wav')], sampling_rate=16000, num_samples=96853, duration=6.0533125, channel_ids=[0], transforms=[{'name': 'Speed', 'kwargs': {'factor': 1.1}}]), custom={'dataloading_info': {'rank': 6, 'world_size': 8, 'worker_id': None}}),) kwargs={})

Does it work when you run this statement in a separate Python script?

Note that you need to import the related Python packages.

kellkwang commented 1 month ago


Running it directly gives SyntaxError: invalid syntax, even though I added the import:

from lhotse.cut import MonoCut
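The SyntaxError is expected: the text after "When calling:" in the error message is a debugging summary of the form f(args=(...) kwargs={}), not a valid Python call, since there is no comma between args=(...) and kwargs={}. A minimal sketch with the standard ast module shows this, using a short placeholder argument (cut) instead of the full cut repr. Writing args as positional arguments (*args) and kwargs as keyword arguments (**kwargs) gives valid syntax.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python code."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Shape of the text printed after "When calling:" (placeholder argument).
message_form = "MonoCut.load_features(args=(cut,) kwargs={})"

# The same call written as real Python: unpack args positionally
# and kwargs as keyword arguments (here there are none).
call_form = "MonoCut.load_features(*(cut,), **{})"

print(is_valid_python(message_form))  # False
print(is_valid_python(call_form))     # True
```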

csukuangfj commented 1 month ago

Could you post a screenshot showing how you run it?

By the way, it is a Python statement.

kellkwang commented 1 month ago


[screenshot]

csukuangfj commented 1 month ago

Could you try the following instead?

MonoCut.load_features(
    MonoCut(
        id="KeSpeech_KeSpeech_000000352_1011480_9a450cb0-904459_sp1.1",
        start=0.0,
        duration=6.0533125,
        channel=0,
        supervisions=[
            SupervisionSegment(
                id="KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1",
                recording_id="KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1",
                start=0.0,
                duration=6.0533125,
                channel=0,
                text="由市政府分管领导担任负责人",
                language="Chinese",
                speaker="KeSpeech_KeSpeech_000000352",
                gender=None,
                custom={"origin": "aidatatang_200zh"},
                alignment=None,
            )
        ],
        features=Features(
            type="kaldi-fbank",
            num_frames=605,
            num_features=80,
            frame_shift=0.01,
            sampling_rate=16000,
            start=0.0,
            duration=6.0533125,
            storage_type="lilcom_chunky",
            storage_path="data/fbank/kespeech_feats_train/feats-55.lca",
            storage_key="2370543709,43556,8961",
            recording_id="None",
            channels=0,
        ),
        recording=Recording(
            id="KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1",
            sources=[
                AudioSource(
                    type="file",
                    channels=[0],
                    source="KeSpeech/KeSpeech_000000352/1011480_9a450cb0.wav",
                )
            ],
            sampling_rate=16000,
            num_samples=96853,
            duration=6.0533125,
            channel_ids=[0],
            transforms=[{"name": "Speed", "kwargs": {"factor": 1.1}}],
        ),
        custom={"dataloading_info": {"rank": 6, "world_size": 8, "worker_id": None}},
    ),
    kwargs={},
)
kellkwang commented 1 month ago


I deleted the line

kwargs={},

because it caused an execution error: TypeError: load_features() got an unexpected keyword argument 'kwargs'

#!/usr/bin/env python3
# coding=UTF-8

from lhotse.cut import MonoCut
from lhotse.supervision import SupervisionSegment
from lhotse.features.base import Features
from lhotse.audio import Recording, AudioSource

MonoCut.load_features(
    MonoCut(
        id="KeSpeech_KeSpeech_000000352_1011480_9a450cb0-904459_sp1.1",
        start=0.0,
        duration=6.0533125,
        channel=0,
        supervisions=[
            SupervisionSegment(
                id="KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1",

            channel_ids=[0],
            transforms=[{"name": "Speed", "kwargs": {"factor": 1.1}}],
        ),
        custom={"dataloading_info": {"rank": 6, "world_size": 8, "worker_id": None}},
    ),
    kwargs={},
)

The execution result is as follows: [screenshot]
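The TypeError mentioned above comes from the same translation issue as the SyntaxError: writing kwargs={} inside the call passes an argument literally named kwargs, and load_features does not accept one. In the error message, kwargs={} only means "no keyword arguments", so the line can simply be dropped (or written as **{}). The sketch below uses a hypothetical stand-in class, FakeCut, whose load_features takes no extra arguments; that signature is an assumption based on the error message, not taken from lhotse.

```python
class FakeCut:
    """Hypothetical stand-in for a cut whose load_features() takes no arguments."""

    def load_features(self):
        return "features"


cut = FakeCut()

# Passing kwargs={} sends a keyword argument literally named "kwargs",
# which load_features() does not accept.
try:
    FakeCut.load_features(cut, kwargs={})
except TypeError as err:
    print(err)

# Unpacking an empty dict passes no keyword arguments, so this works,
# and so does simply omitting it.
print(FakeCut.load_features(cut, **{}))
print(FakeCut.load_features(cut))
```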

csukuangfj commented 1 month ago

Could you use lilcom to read the following file directly?

data/fbank/kespeech_feats_train/feats-55.lca

Please refer to the lilcom API for how to read it.
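A sketch of reading the chunks with lilcom directly, bypassing lhotse. It relies on lilcom.decompress(), which takes the compressed bytes and returns a NumPy array. It also assumes the storage_key layout of lhotse's lilcom_chunky format as we understand it: the first value is an absolute byte offset, followed by one byte length per chunk. read_chunks and decode_chunks are hypothetical helpers. The reading step reports any chunk that comes back shorter than the key says, which is one plausible cause of "Length of string was too short".

```python
from typing import List


def read_chunks(storage_path: str, storage_key: str) -> List[bytes]:
    """Read the compressed chunks referenced by a lilcom_chunky storage_key.

    Assumption: the key is "<absolute offset>,<len chunk 1>,<len chunk 2>,...",
    with the chunks stored back to back in the file.
    """
    values = [int(v) for v in storage_key.split(",")]
    offset, sizes = values[0], values[1:]
    chunks = []
    with open(storage_path, "rb") as f:
        f.seek(offset)
        for i, size in enumerate(sizes):
            data = f.read(size)
            if len(data) != size:
                raise ValueError(
                    f"chunk {i}: expected {size} bytes, got {len(data)} "
                    "(the archive is probably truncated)"
                )
            chunks.append(data)
    return chunks


def decode_chunks(chunks: List[bytes]):
    """Decompress each chunk with lilcom and stack them along the frame axis."""
    import lilcom  # imported lazily so read_chunks works without lilcom
    import numpy as np

    return np.concatenate([lilcom.decompress(c) for c in chunks], axis=0)
```

Usage for the failing cut would be decode_chunks(read_chunks("data/fbank/kespeech_feats_train/feats-55.lca", "2370543709,43556,8961")). If the archive is intact, we would expect an array of shape (605, 80), matching num_frames and num_features in the manifest.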