gjkunde opened 1 year ago
@gjkunde Thank you for your report.
I'm not sure whether this is caused by librosa, but I remember that some versions of FFmpeg's MPEG-4 ALS decoder had a bug decoding such files.
Could you please try decoding the mp4 file with the official MPEG-4 ALS decoder? You can download the MPEG-4 ALS reference software from the following ISO/IEC link: https://standards.iso.org/iso-iec/14496/-26/ed-2/en/confTools.zip
The source code in mp4alsRM25.zip is the reference software for MPEG-4 Audio Lossless Coding. Note that mp4alsRM25sp.zip is for the simple profile, which does not include the code supporting 32-bit float. This reference software can decode mp4 files encoded with MPEG-4 ALS.
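A quicker check that does not need the reference software is to ask FFmpeg's `ffprobe` which codec it sees in the file's first audio stream. The sketch below assumes `ffprobe` is installed and on `PATH`; the helper names are hypothetical, and the expectation that an MPEG-4 ALS stream is reported as `mp4als` is an assumption, not something confirmed in this thread. It only tells you how FFmpeg interprets the file; audioread may use a different backend depending on the platform.

```python
import json
import subprocess
import sys
from typing import List, Optional


def ffprobe_codec_cmd(path: str) -> List[str]:
    """Build an ffprobe command printing codec info for the first audio stream as JSON."""
    return [
        "ffprobe",
        "-v", "error",                 # print errors only
        "-select_streams", "a:0",      # first audio stream
        "-show_entries", "stream=codec_name,sample_rate,channels",
        "-of", "json",                 # machine-readable output
        path,
    ]


def parse_codec_name(ffprobe_json: str) -> Optional[str]:
    """Return codec_name of the first reported stream, or None if no stream was found."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return streams[0].get("codec_name") if streams else None


def audio_codec(path: str) -> Optional[str]:
    """Run ffprobe on `path`; raises CalledProcessError if ffprobe fails."""
    out = subprocess.run(ffprobe_codec_cmd(path), capture_output=True, text=True, check=True)
    return parse_codec_name(out.stdout)


if __name__ == "__main__":
    # Usage: python probe_codec.py TN001-carA1-speed1_mic1_00001.mp4
    print(audio_codec(sys.argv[1]))
```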
Hi @gjkunde,
Thank you for your interest. I tried to reproduce the issue and could partially reproduce it. In short, please try downgrading librosa to 0.9.2 or older, which may solve your issue.
Thanks! (Of course, you can also try what Noboru suggested; it would show more details about how the .mp4 files are encoded.)
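If you pin librosa (for example with `pip install "librosa<=0.9.2"`), a small guard can warn you when a newer version comes back in through a dependency upgrade. This is a hypothetical helper, not part of librosa or the ToyADMOS2 tools, and the 0.9.x cut-off comes only from the observation in this thread.

```python
import re
import warnings
from importlib import metadata
from typing import Tuple

# Assumption from this thread: librosa 0.9.2 and older loaded the mp4 files correctly.
LAST_KNOWN_GOOD = (0, 9)


def major_minor(version: str) -> Tuple[int, int]:
    """Parse e.g. '0.10.0.post2' -> (0, 10), ignoring patch and post/dev suffixes."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def newer_than_known_good(version: str) -> bool:
    """True if `version` is newer than the 0.9.x series."""
    return major_minor(version) > LAST_KNOWN_GOOD


def check_installed_librosa() -> None:
    """Warn if the installed librosa is newer than 0.9.x (raises if librosa is missing)."""
    version = metadata.version("librosa")
    if newer_than_known_good(version):
        warnings.warn(
            f"librosa {version} is newer than 0.9.x; if mp4 files load as empty arrays, "
            "try: pip install 'librosa<=0.9.2'"
        )
```

Calling `check_installed_librosa()` once at the top of the data-loading script is enough; it does not change how files are decoded.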
The following are the logs from my attempts.
```
>>> import numpy as np
>>> import librosa
>>> librosa.__version__
'0.10.0.post2'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/hdd/datasets/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
<stdin>:1: FutureWarning: librosa.core.audio.__audioread_load
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
>>> len(sig)
576000
```
The older versions are fine.
```
>>> import numpy as np
>>> import librosa
>>> librosa.__version__
'0.8.1'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/lab/data/toy21/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
>>> len(sig)
576000
```

```
>>> import librosa
>>> librosa.__version__
'0.9.2'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/hdd/datasets/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
>>> len(sig)
576000
```
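The empty result in the original report is easy to miss because no exception is raised. With the 48 kHz sample rate reported for these mp4 files, the 576000 samples in the logs above correspond to 576000 / 48000 = 12 s of audio. A check like the following turns a zero-length or truncated decode into an explicit error; it is a hypothetical helper, and the 12 s value is an assumption based on the file in these logs, not a documented property of the dataset.

```python
import numpy as np


def check_decoded(sig: np.ndarray, sr: int, expected_seconds: float, tol: float = 0.01) -> float:
    """Return the clip duration in seconds, raising if the decode looks empty or truncated.

    Hypothetical helper: `expected_seconds` is supplied by the caller (e.g. 12.0 for the
    576000-sample, 48 kHz file in the logs above); it is not read from the dataset.
    """
    # Samples are on the last axis for both mono (n,) and multi-channel (channels, n) arrays.
    n_samples = sig.shape[-1] if sig.ndim > 0 else 0
    if n_samples == 0:
        raise RuntimeError("decoder returned 0 samples; check librosa/audioread/FFmpeg versions")
    duration = n_samples / sr
    if abs(duration - expected_seconds) > tol:
        raise RuntimeError(f"decoded {duration:.3f} s, expected about {expected_seconds:.3f} s")
    return duration
```

In mixer.py this would sit directly after the load call, e.g. `check_decoded(sig, sr_sig, expected_seconds=12.0)`.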
I am trying to read the mp4 files of the new dataset. For the ToyADMOS1 wav files, this snippet from mixer.py

```python
sig, sr_sig = __audioread_load(filename, offset=0.0, duration=None, dtype=np.float32)
```

returns an array of length 242550. For the mp4 files, however, it only returns the sample rate (48,000): the length of sig is 0, and there is a warning:
```
/var/folders/mv/qbxkzz3d5zj4dh3wmt30cpfh000r_w/T/ipykernel_55465/1690306295.py:1: FutureWarning: librosa.core.audio.__audioread_load Deprecated as of librosa version 0.10.0. It will be removed in librosa version 1.0.
```
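The warning itself is not the failure: it only says that `__audioread_load`, a private librosa helper, is deprecated and will be removed in librosa 1.0. One way to stop depending on it is to decode with the `ffmpeg` command-line tool and read the raw samples with NumPy. This is a sketch under stated assumptions: `ffmpeg` is on `PATH`, the output is downmixed to mono and resampled to 48 kHz (the rate reported above), and the helper names are hypothetical. It still relies on FFmpeg's MPEG-4 ALS decoder, which, as noted earlier in this thread, has been buggy in some versions.

```python
import subprocess
from typing import List, Tuple

import numpy as np

# Assumption: clips are wanted as mono at 48 kHz (the rate reported in this issue).
TARGET_SR = 48000


def ffmpeg_decode_cmd(path: str, sr: int = TARGET_SR) -> List[str]:
    """Build an ffmpeg command that writes `path` to stdout as mono float32 PCM."""
    return [
        "ffmpeg",
        "-v", "error",           # print errors only, no banner or progress
        "-i", path,
        "-f", "f32le",           # raw little-endian float32, no container
        "-acodec", "pcm_f32le",
        "-ac", "1",              # downmix to one channel (assumption: mono is wanted)
        "-ar", str(sr),          # resample to `sr` so the returned rate is known
        "-",                     # write to stdout
    ]


def pcm_f32le_to_array(raw: bytes) -> np.ndarray:
    """Interpret raw little-endian float32 bytes as a writable float32 array."""
    return np.frombuffer(raw, dtype="<f4").astype(np.float32)


def load_with_ffmpeg(path: str, sr: int = TARGET_SR) -> Tuple[np.ndarray, int]:
    """Decode `path` with the ffmpeg CLI; raises CalledProcessError if ffmpeg fails."""
    proc = subprocess.run(ffmpeg_decode_cmd(path, sr), capture_output=True, check=True)
    return pcm_f32le_to_array(proc.stdout), sr
```

Used as a drop-in for the mixer.py call, `sig, sr_sig = load_with_ffmpeg(filename)` returns a 1-D float32 array; a length of 0 here would point at the FFmpeg decoder rather than librosa.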