lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

How to save a cut with the `to_mono` operation? #1417

Closed by pengzhendong 6 days ago

pengzhendong commented 1 week ago

I want to apply `cut.to_mono(mono_downmix=True)` lazily (on the fly), without saving the downmixed recording to a file. However, the cut holding the downmixed recording cannot be saved to the cuts JSONL:

from lhotse import CutSet
monocut = multicut.to_mono(mono_downmix=True)  # multicut: an existing multi-channel cut
CutSet.from_cuts([monocut]).to_jsonl("tmp.jsonl.gz")
File "/usr/local/lib/python3.10/dist-packages/lhotse/serialization.py", line 349, in to_jsonl
  save_to_jsonl(self.to_dicts(), path)
File "/usr/local/lib/python3.10/dist-packages/lhotse/serialization.py", line 175, in save_to_jsonl
  print(json.dumps(item, ensure_ascii=False), file=f)
File "/usr/lib/python3.10/json/__init__.py", line 238, in dumps
  **kw).encode(obj)
File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
  chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
  return _iterencode(o, 0)
File "/usr/lib/python3.10/json/encoder.py", line 179, in default
  raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable

pzelasko commented 1 week ago

Downmixing is an eager operation, unlike most other operations in Lhotse. You'd need to call `.save_audio()` before `.to_jsonl()` to materialize the downmixed audio on disk; otherwise, the JSONL metadata has no way to reference the downmixed audio.
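
To illustrate why the serialization fails and why writing the audio to disk first resolves it, here is a minimal standard-library sketch. It does not use Lhotse: the dictionary layout, the `try_serialize` and `materialize` helpers, and the file name are hypothetical stand-ins chosen for illustration, not Lhotse's actual manifest schema or API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a cut whose audio exists only in memory,
# roughly what an eager downmix leaves behind (NOT Lhotse's real schema).
in_memory_cut = {
    "id": "cut-0",
    "recording": {"source_type": "memory", "source": b"fake-wav-bytes"},
}


def try_serialize(item: dict) -> str:
    """Return the JSON line, or the TypeError message if serialization fails."""
    try:
        return json.dumps(item, ensure_ascii=False)
    except TypeError as exc:
        return f"TypeError: {exc}"


def materialize(item: dict, audio_path: Path) -> dict:
    """Write the in-memory audio bytes to disk and reference the file by path."""
    audio_path.write_bytes(item["recording"]["source"])
    return {
        "id": item["id"],
        "recording": {"source_type": "file", "source": str(audio_path)},
    }


# Raw bytes cannot be encoded by the json module, so this yields an error message.
before = try_serialize(in_memory_cut)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Once the audio lives in a file, the metadata only needs to store a path.
    saved_cut = materialize(in_memory_cut, Path(tmp_dir) / "cut-0.wav")
    after = try_serialize(saved_cut)

print(before)
print(after)
```

In Lhotse terms, the suggestion above would presumably look something like `monocut = monocut.save_audio("downmixed.wav")` followed by `CutSet.from_cuts([monocut]).to_jsonl("tmp.jsonl.gz")`; the output path and the exact arguments are assumptions here, so check the `save_audio` documentation for your installed version.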