Open nikvaessen opened 2 years ago
I've also tried the soundfile backend. Soundfile can read the .flac file correctly from the stream, but it fails when we call info() on the stream before load().
RuntimeError: Failed to open the input "StreamWrapper<<zipfile.ZipExtFile name='19-198-0000.flac' mode='r' compress_type=deflate>>" (Invalid data found when processing input).
Based on the traceback, I think the problem is which input types torchaudio expects. It would help us to understand what tab.load supports. Does it support loading inner file streams from a tar archive? cc: @mthrok
Regarding your comment about seekability: the tar file stream, at least, should be seekable, so I assume this isn't the root cause.
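One way to check the seekability claim is to ask the stream itself. The sketch below uses only the standard library and builds small in-memory archives with placeholder bytes (not a real FLAC file); the member name `example.flac` is made up for illustration. It does not go through torchdata's StreamWrapper, and streams backed by a network file system may behave differently.

```python
import io
import tarfile
import zipfile

# Placeholder payload; these are NOT valid FLAC contents.
payload = b"fLaC" + bytes(64)

# Build a small tar archive in memory.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as archive:
    member = tarfile.TarInfo(name="example.flac")
    member.size = len(payload)
    archive.addfile(member, io.BytesIO(payload))
tar_buffer.seek(0)

# Build a small deflate-compressed zip archive in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("example.flac", payload)
zip_buffer.seek(0)

# Ask each inner stream whether it reports itself as seekable.
with tarfile.open(fileobj=tar_buffer, mode="r") as archive:
    tar_stream = archive.extractfile("example.flac")
    tar_seekable = tar_stream.seekable()

with zipfile.ZipFile(zip_buffer) as archive:
    with archive.open("example.flac") as zip_stream:
        zip_seekable = zip_stream.seekable()

print("tar member seekable:", tar_seekable)
print("zip member seekable:", zip_seekable)
```

If both report seekable in the real pipeline as well, the failure is more likely about the read position or the decoder than about seeking as such.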
As a workaround, could you read the data from the opened file stream directly before sending it to tab.load?
def audio_stream_to_tensor_and_meta(element):
    path, stream = element
    # Read the whole inner file into a single bytes object.
    data = b"".join(stream)
    meta = tab.info(data)
    audio_tensor, sample_rate = tab.load(data)
    return audio_tensor, meta
Thanks for your comment.
As a workaround, could you read data from the opened file stream directly before sending to tab.load?
Your code sample throws the following errors:
(for wav)
Traceback (most recent call last):
File "/home/nik/phd/repo/data_utility/playground/example.py", line 39, in <module>
for x in dp:
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/_typing.py", line 514, in wrap_generator
response = gen.send(None)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 116, in __iter__
yield self._apply_fn(data)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 81, in _apply_fn
return self.fn(data)
File "/home/nik/phd/repo/data_utility/playground/example.py", line 17, in audio_stream_to_tensor
audio_tensor, sample_rate = tab.load(data)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torchaudio/backend/sox_io_backend.py", line 227, in load
return _fallback_load(filepath, frame_offset, num_frames, normalize, channels_first, format)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torchaudio/io/_compat.py", line 97, in load_audio
s = torch.classes.torchaudio.ffmpeg_StreamReader(src, format, None)
RuntimeError
(for flac)
Traceback (most recent call last):
File "/home/nik/phd/repo/data_utility/playground/example.py", line 39, in <module>
for x in dp:
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/_typing.py", line 514, in wrap_generator
response = gen.send(None)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 116, in __iter__
yield self._apply_fn(data)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 81, in _apply_fn
return self.fn(data)
File "/home/nik/phd/repo/data_utility/playground/example.py", line 17, in audio_stream_to_tensor
audio_tensor, sample_rate = tab.load(data)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torchaudio/backend/sox_io_backend.py", line 227, in load
return _fallback_load(filepath, frame_offset, num_frames, normalize, channels_first, format)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torchaudio/io/_compat.py", line 97, in load_audio
s = torch.classes.torchaudio.ffmpeg_StreamReader(src, format, None)
RuntimeError: Failed to open the input "fLaC
This exception is thrown by __iter__ of MapperIterDataPipe(datapipe=TarArchiveLoaderIterDataPipe, fn=audio_stream_to_tensor, input_col=None, output_col=None)
However, simply using stream.seek(0) between tab.info() and tab.load() solves the issue for both TarArchiveLoader and ZipArchiveLoader. Is this something which is worth documenting?
Moreover, loading .flac files remains an issue for the sox_io backend. But I guess that now seems to be an issue related to torchaudio?
However, simply using stream.seek(0) between tab.info() and tab.load() solves the issue for both TarArchiveLoader and ZipArchiveLoader. Is this something which is worth documenting?
info consumes some bytes from the file-like object, so calling load after that fails unless the position of the input file object is reset.
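The effect can be reproduced without torchaudio. In this minimal sketch, `read_magic` and `read_all` are hypothetical stand-ins for info and load (they are not torchaudio functions), and the payload is placeholder bytes rather than a real FLAC file. It only illustrates how a shared read position behaves.

```python
import io


def read_magic(fileobj, size=4):
    """Hypothetical stand-in for info(): reads header bytes, leaving the position advanced."""
    return fileobj.read(size)


def read_all(fileobj):
    """Hypothetical stand-in for load(): reads from the current position to the end."""
    return fileobj.read()


# Placeholder payload: a FLAC-like magic number followed by zero bytes.
stream = io.BytesIO(b"fLaC" + b"\x00" * 16)

magic = read_magic(stream)
# The second reader starts after the header, so it never sees the magic number.
without_rewind = read_all(stream)

# Resetting the position lets the second reader see the file from the start.
stream.seek(0)
with_rewind = read_all(stream)

print(magic, len(without_rewind), len(with_rewind))
```

A decoder that identifies the format from the first bytes cannot do so in the `without_rewind` case, which is consistent with the "Invalid data found when processing input" errors reported above.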
Moreover, loading .flac files remains an issue for the sox_io backend. But I guess that now seems to be an issue related to torchaudio?
There have been recent reports about loading FLAC from file-like objects. I haven't looked into the details yet, but in the meantime I think an ffmpeg-based solution could work. Can you tell me what happens if you replace the load function with torchaudio.io._compat.load_audio_fileobj?
Replacing load with torchaudio.io._compat.load_audio_fileobj results in the flac stream loading correctly.
Similarly, replacing info with torchaudio.io._compat.info_audio_fileobj(stream, format='flac') results in the flac stream info loading.
AudioMetaData(sample_rate=16000, num_frames=0, num_channels=1, bits_per_sample=16, encoding=FLAC)
However, num_frames=0 is incorrect.
Using info(stream, format='flac') runs, but also prints an error (and num_frames=0 is wrong):
def audio_stream_to_tensor_and_meta(element):
    path, stream = element
    meta = torchaudio.info(stream, format='flac')
    # info() advances the read position, so rewind before loading.
    stream.seek(0)
    audio_tensor, sample_rate = torchaudio.io._compat.load_audio_fileobj(stream)
    return audio_tensor, meta
formats: can't open input file `': FLAC ERROR whilst decoding metadata
tensor([[0.0044, 0.0033, 0.0031, ..., 0.0047, 0.0060, 0.0060]])
AudioMetaData(sample_rate=16000, num_frames=0, num_channels=1, bits_per_sample=16, encoding=FLAC)
Using only info(stream), without format="flac":
formats: can't open input file `': FLAC ERROR whilst decoding metadata
Traceback (most recent call last):
File "/home/nik/phd/repo/data_utility/playground/example.py", line 41, in <module>
for x in dp:
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/_typing.py", line 514, in wrap_generator
response = gen.send(None)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 116, in __iter__
yield self._apply_fn(data)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 81, in _apply_fn
return self.fn(data)
File "/home/nik/phd/repo/data_utility/playground/example.py", line 26, in audio_stream_to_tensor_and_meta
meta = torchaudio.info(stream)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torchaudio/backend/sox_io_backend.py", line 99, in info
return _fallback_info_fileobj(filepath, format)
File "/home/nik/phd/repo/data_utility/.venv/lib/python3.10/site-packages/torchaudio/io/_compat.py", line 35, in info_audio_fileobj
s = torchaudio._torchaudio_ffmpeg.StreamReaderFileObj(src, format, None, 4096)
RuntimeError: Failed to open the input "StreamWrapper<<ExFileObject name='./tar/flac.tar'>>" (Invalid data found when processing input).
This exception is thrown by __iter__ of MapperIterDataPipe(datapipe=TarArchiveLoaderIterDataPipe, fn=audio_stream_to_tensor_and_meta, input_col=None, output_col=None)
FFMPEG output of the file:
$ ffmpeg -i playground/file/19-198-0000.flac
...
Input #0, flac, from 'playground/file/19-198-0000.flac':
Duration: 00:00:01.97, start: 0.000000, bitrate: 177 kb/s
Stream #0:0: Audio: flac, 16000 Hz, mono, s16
Reading from the file directly:
torchaudio.info('19-198-0000.flac')
AudioMetaData(sample_rate=16000, num_frames=31440, num_channels=1, bits_per_sample=16, encoding=FLAC)
Maybe you can try stream.file_obj.read() to get the bytes:
def audio_stream_to_tensor_and_meta(element):
    path, stream = element
    stream = stream.file_obj.read()
    ...
    return audio_tensor, meta
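Passing raw bytes to load failed earlier in this thread, so one possible variation is to wrap the bytes in io.BytesIO and get a seekable file-like object that no longer depends on the archive. This is a sketch using only the standard library: `buffer_stream` is a hypothetical helper, the tar archive and its `example.wav` member are built in memory with placeholder bytes, and whether torchaudio accepts the result for a given format is not verified here.

```python
import io
import tarfile


def buffer_stream(stream):
    """Hypothetical helper: copy a file-like object into a seekable in-memory buffer."""
    return io.BytesIO(stream.read())


# Placeholder payload; these are NOT valid WAV contents.
payload = b"RIFF" + bytes(32)

# Build a small tar archive in memory.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as archive:
    member = tarfile.TarInfo(name="example.wav")
    member.size = len(payload)
    archive.addfile(member, io.BytesIO(payload))
tar_buffer.seek(0)

# Copy the member out while the archive is still open.
with tarfile.open(fileobj=tar_buffer, mode="r") as archive:
    buffered = buffer_stream(archive.extractfile("example.wav"))

# The copy can be read, rewound, and read again after the archive is closed.
header = buffered.read(4)
buffered.seek(0)
body = buffered.read()

print(header, len(body))
```

The trade-off is memory: each file is held fully in RAM, which is fine for short utterances but may not suit very large recordings.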
🐛 Describe the bug
I've been playing around with torchdata as a replacement for the webdataset library. My main use-case is reading data from network-attached file systems (such as ceph), which implies streaming from e.g. .tar files, which is something webdataset is designed for.
In the following code I have the following relative file system: data.zip
Where each .zip or .tar archive contains respectively the 19-198-0000.flac or 19-198-0000.wav file taken from the LibriSpeech dataset.
From my reading of the documentation, this seems the easiest way to read from the archive:
This works :)! However, it fails when we try to read the flac.tar
Similarly for ZipArchiveLoader, reading from wav.zip works, while flac.zip returns a similar error:
Moreover, adding torchaudio.info to the map function also leads to the same issue for .wav files:
So I assume that the issues stem from the fact that the stream provided by torchdata is not seekable, or at least the buffer is not large enough?
Versions
PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.4 (main, Apr 20 2022, 11:26:44) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-46-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070
Nvidia driver version: 495.29.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.2
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchdata==0.4.1
[conda] Could not collect