nttcslab / ToyADMOS2-dataset

ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions 🚗 🚃
https://arxiv.org/abs/2106.02369

Reading mp4 vs wav #1

Open gjkunde opened 1 year ago

gjkunde commented 1 year ago

I am attempting to read the new dataset, which ships as mp4 files, using this code snippet from mixer.py:

sig, sr_sig = __audioread_load(filename, offset=0.0, duration=None, dtype=np.float32)

For the ToyADMOS1 wav files this returns an array of 242550 values. For the mp4 files it returns only the sample rate of 48,000; the length of sig is 0, and there is a warning:

/var/folders/mv/qbxkzz3d5zj4dh3wmt30cpfh000r_w/T/ipykernel_55465/1690306295.py:1: FutureWarning: librosa.core.audio.__audioread_load Deprecated as of librosa version 0.10.0. It will be removed in librosa version 1.0.

noboru2000 commented 1 year ago

@gjkunde Thank you for your report.

I'm not sure if this is caused by librosa, but I remember that some versions of the FFmpeg decoder for MPEG-4 ALS had a bug when decoding it.

Could you please try to extract the mp4 file using the official MPEG-4 ALS decoder? You can download the reference software of the MPEG-4 ALS from the following ISO/IEC link: https://standards.iso.org/iso-iec/14496/-26/ed-2/en/confTools.zip

The source code in mp4alsRM25.zip is the reference software for MPEG-4 Audio Lossless Coding. Note that mp4alsRM25sp.zip is for the simple profile, which does not contain the code for 32-bit float support. The reference software can extract mp4 files encoded with MPEG-4 ALS.
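Once a file has been extracted to WAV by whichever decoder is used, a quick check that the output is not empty can be sketched as below. This is only an illustration: `wav_summary` and `looks_decoded` are hypothetical helper names, and the sketch assumes the decoder wrote integer-PCM WAV, since Python's standard `wave` module does not read 32-bit float WAV files.

```python
import wave


def wav_summary(path):
    """Return (num_frames, sample_rate, num_channels) for an integer-PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes(), wf.getframerate(), wf.getnchannels()


def looks_decoded(path):
    """True if the WAV file holds at least one audio frame at a positive sample rate."""
    frames, rate, _channels = wav_summary(path)
    return frames > 0 and rate > 0
```

If the decoder writes 32-bit float WAV instead, a different reader would be needed in place of `wave`.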

daisukelab commented 1 year ago

Hi @gjkunde,

Thank you for your interest. I tried to reproduce the issue and could do so only partially. In short, please try downgrading your librosa to 0.9.2 or older, which could solve your issue.

Thanks! (Of course, you can try what Noboru suggested. It would show more details about the .mp4 encoding.)
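To see whether an installed librosa falls in the "0.9.2 or older" range, a version string can be compared roughly as follows. This is a minimal sketch using only the standard library; `parse_release` and `is_at_or_before` are hypothetical helper names, and suffixes such as `.post2` are simply ignored.

```python
def parse_release(version):
    """Turn '0.10.0.post2' into (0, 10, 0), stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_or_before(version, limit="0.9.2"):
    """True if `version` is no newer than `limit`, comparing numeric parts only."""
    return parse_release(version) <= parse_release(limit)
```

In practice this would be called as `is_at_or_before(librosa.__version__)`. Comparing the raw strings would not work, because "0.10.0" sorts before "0.9.2" alphabetically.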

The following are the logs from my attempts.

>>> import numpy as np
>>> import librosa
>>> librosa.__version__
'0.10.0.post2'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/hdd/datasets/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
<stdin>:1: FutureWarning: librosa.core.audio.__audioread_load
        Deprecated as of librosa version 0.10.0.
        It will be removed in librosa version 1.0.
>>> len(sig)
576000

The older versions are fine.

>>> import numpy as np

>>> import librosa
>>> librosa.__version__
'0.8.1'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/lab/data/toy21/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
>>> len(sig)
576000

>>> import librosa
>>> librosa.__version__
'0.9.2'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/hdd/datasets/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
>>> len(sig)
576000
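As a sanity check on these numbers, the sample count can be converted to a duration: 576000 samples at the 48,000 Hz sample rate reported above is 12 seconds. The sketch below does that arithmetic and flags an empty result; `describe_signal` is a hypothetical helper, and it assumes `sig` is a one-dimensional (mono) sequence.

```python
def describe_signal(sig, sr):
    """Summarize a decoded mono signal: sample count, duration in seconds, emptiness."""
    if sr <= 0:
        raise ValueError("sample rate must be positive")
    num_samples = len(sig)
    return {
        "num_samples": num_samples,
        "duration_s": num_samples / sr,
        "is_empty": num_samples == 0,
    }
```

Passing the `sig` and `sr` values from any of the sessions above would show at a glance whether a given librosa version returned audio or an empty array.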