Open auroracramer opened 4 years ago
Can repro. I added some stuff to the script to make it a bit more obvious if anyone else is having a go at this issue:
import scaper
import soundfile as sf
import jams
import numpy as np
jams_file = './soundscape.jams'
scaper.generate_from_jams(jams_file, './soundscape.wav', save_isolated_events=True)
jam = jams.load(jams_file)
ann = jam.annotations.search(namespace='scaper')[0]
soundscape_audio, sr = sf.read(ann.sandbox.scaper['soundscape_audio_path'])
isolated_event_audio_paths = ann.sandbox.scaper['isolated_events_audio_path']
isolated_audio = []
for event_spec, event_audio in zip(ann, isolated_event_audio_paths):
# event_spec contains the event description, label, etc
# event_audio contains the path to the actual audio
# make sure the path matches the event description
assert event_spec.value['role'] in event_audio
assert event_spec.value['label'] in event_audio
isolated_audio.append(sf.read(event_audio)[0])
# the sum of the isolated audio should sum to the soundscape
assert np.allclose(sum(isolated_audio), soundscape_audio)
This results in:
Traceback (most recent call last):
File "mwe.py", line 25, in <module>
assert sum(isolated_audio) == soundscape_audio
ValueError: operands could not be broadcast together with shapes (64000,) (64768,)
Looking into this now.
It seems that type of audio file for the Jackhammer is different from all of the others in an adversarial fashion.
before ./soundscape_events/foreground12_jackhammer.wav
samplerate: 16000 Hz
channels: 1
duration: 4.031 s
format: WAV (Microsoft) [WAV]
subtype: Microsoft ADPCM [MS_ADPCM] ./soundscape_events/foreground12_jackhammer.wav (64500,)
after ./soundscape_events/foreground12_jackhammer.wav
samplerate: 16000 Hz
channels: 1
duration: 4.031 s
format: WAV (Microsoft) [WAV]
subtype: Microsoft ADPCM [MS_ADPCM] ./soundscape_events/foreground12_jackhammer.wav (64000,)
readback ./soundscape_events/foreground12_jackhammer.wav
samplerate: 16000 Hz
channels: 1
duration: 4.031 s
format: WAV (Microsoft) [WAV]
subtype: Microsoft ADPCM [MS_ADPCM] ./soundscape_events/foreground12_jackhammer.wav (64768,)
Here I'm just printing out the audio_info
, and shapes inside match_sample_length
. Here's what this same printout looks like for a file that has no anomalies:
before ./soundscape_events/foreground13_gun_shot.wav
samplerate: 16000 Hz
channels: 1
duration: 4.000 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16] ./soundscape_events/foreground13_gun_shot.wav (64000,)
after ./soundscape_events/foreground13_gun_shot.wav
samplerate: 16000 Hz
channels: 1
duration: 4.000 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16] ./soundscape_events/foreground13_gun_shot.wav (64000,)
readback ./soundscape_events/foreground13_gun_shot.wav
samplerate: 16000 Hz
channels: 1
duration: 4.000 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16] ./soundscape_events/foreground13_gun_shot.wav (64000,)
The bug seems to literally be here:
The shape of that audio data array is 64000 when writing to the audio file. But then when we write it to disk using soundfile
it suddenly becomes 64768. Bizarre!
So I literally just tried running this on that specific audio file (I moved it first two directories up and then ffmpeg it back down):
ffmpeg -i ../../88466-7-0-0.wav 88466-7-0-0.wav
I tried just fixing the source audio file prior to feeding it into Scaper using the statement above.. The output for my print statements now looks like this:
before ./soundscape_events/foreground12_jackhammer.wav
samplerate: 16000 Hz
channels: 1
duration: 4.000 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16] ./soundscape_events/foreground12_jackhammer.wav (64000,)
after ./soundscape_events/foreground12_jackhammer.wav
samplerate: 16000 Hz
channels: 1
duration: 4.000 s
format: WAV (Microsoft) [WAV]
subtype: Signed 16 bit PCM [PCM_16] ./soundscape_events/foreground12_jackhammer.wav (64000,)
./soundscape_events/foreground12_jackhammer.wav: adjusted from 64000 to 64000
and the script I pasted above passes for the exact same JAMS file (but with a fixed 88466-7-0-0.wav file).
Also the file size for the WAV file changed from 37kb to 147kb on my machine. I think how Scaper, Sox, and Soundfile interact with all the different types of sound files needs some more investigation.
So my recommendation right now: normalize all of your data via ffmpeg first so that every single audio file you're trying to mix has the same sample rate, is a .wav
file, and has the subtype Signed 16 bit PCM
. We should perhaps find subtypes that don't work and throw a warning.
Tiny demo showing that it's in Soundfile:
import soundfile as sf
import numpy as np
import tempfile
_format = 'WAV'
_subtype = 'MS_ADPCM'
sr = 16000
for l in [1600, 16000, 32000, 64000, 100000, 128000]:
original_shape = (l,)
audio = np.random.randn(*original_shape)
with tempfile.NamedTemporaryFile(suffix='.wav', delete=True) as tmpfile:
sf.write(tmpfile.name, audio, sr, subtype=_subtype, format=_format)
audio, sr = sf.read(tmpfile.name)
print('What I got back from sf.read\t', audio.shape)
print('What I meant to write to disk\t', original_shape)
print()
Output here is:
What I got back from sf.read (2024,)
What I meant to write to disk (1600,)
What I got back from sf.read (16192,)
What I meant to write to disk (16000,)
What I got back from sf.read (32384,)
What I meant to write to disk (32000,)
What I got back from sf.read (64768,)
What I meant to write to disk (64000,)
What I got back from sf.read (100188,)
What I meant to write to disk (100000,)
What I got back from sf.read (128524,)
What I meant to write to disk (128000,)
If you look at the current test for match_sample_length
, you can see the formats and subtypes that we actually test on (it's not all of them):
So while WAV
is in there, MS_ADPCM
is not one of the subtypes we test the function on. So I guess make sure your source audio going into Scaper is in those tested lists for now.
After some offline discussion with @jtcramer, I wanted to note for the record that the anomalous files described here are in UrbanSound8k. So if you are using UrbanSound8k and Scaper for source separation purposes, you should make everything uniform before passing it into Scaper, otherwise you'll have some (small) number of bad files at the moment.
thanks @jtcramer for raising and thanks @pseeth for the great analysis!
If the problem is in soundfile then indeed there's little we can do to actually address, but I agree we should issue a warning. We should also add a paragraph to the source material section of the tutorial (right at the beginning) to recommend that people pre-process their material before feeding it into scaper.
@pseeth u PR, i CR?
Yeah, I'll put in a PR. Can we discuss where the warning should go? Should it go into match_sample_length
? It could be predictive, where we check the subtype and format and see if it's in the accepted_subtypes
and accepted_formats
list (those could move out of the tests and into core
as constants). So that check could go at the top of this function after we get audio_info
:
Or, we could add stuff at the end of the function to read back the "matched" audio file after writing it. So one more line after this would be
readback_audio, sr = sf.read(audio_path)
if readback_audio.shape[0] != duration_in_samples:
warnings.warn(ScaperWarning, "blahblahblah"
This would have the cost of reading back the file every time, though. But it would be format/subtype agnostic.
I wonder if something more complete would be to scan the FG/BG library upon initialization of the scaper object, then issue a warning if there are files that aren't .wav or 16bit etc.
Warning: Scaper has found source audio files whose format is different from Signed 16 bit PCM WAV. In rare occasions this could lead to saved isolated events containing an incorrect number of samples due to a known issue with pysoundfile. The recommended course of action is to pre-process your source audio so that all files are signed 16 bit PCM wav using scaper.preprocess_source_material()
We can implement the preprocess function to make life easy for people, assuming scanning the source library can be done efficiently.
On the other hand if this issue only impacts separated sources, we would end up issuing this warning a lot also for people who won't be affected by it. Not sure what's the best option, minimalism vs completeness.
Should we do this at init or in a separate function? scaper.check_source_material(fg_directory, bg_directory)
.
One issue I see with doing it at init is that when I'm generating soundscapes in parallel, I have a new Scaper object for every scene that I use to generate. Each Scaper object is given a different random seed. So I think that checking at init would cause every Scaper object (often 20k of them) to check the same foreground/background library over and over.
Two options:
Thoughts?
Inside init with the scan disabled sounds like best of both worlds. I think the actual scan should also be a top-level function - not within the Scaper object, but rather at scaper.scan_source_material
and scaper.preprocess_source_material
. The Scaper object can then call the scan function and recommend the user do the preprocess function.
I think the actual scan should also be a top-level function - not within the Scaper object, but rather at scaper.scan_source_material and scaper.preprocess_source_material. The Scaper object can then call the scan function and recommend the user do the preprocess function.
Agreed.
Inside init with the scan disabled sounds like best of both worlds.
Not sure about this thought. If it's hidden inside the Scaper init but disabled by default, it'll most likely get ignored by the majority of users. It might be preferable to have it enabled by default and issue a message when it runs along the lines of:
Scanning source material for audio format problems. You can disable this scan by setting scaper.Scaper(..., scan_source_material=False). You can execute the scan manually via scaper.scan_source_material(fg_path, bg_path).
?
(this way the user always learns about it and can disable in their own code if they don't want it), or, and perhaps this is the better option, we don't run it by default but we explain and encourage users to do so in the tutorial? We'd have to update all examples (including on the repo README).
Occasionally, when generating mixtures using only foreground events (no background) and saving the isolated events to disk, one of audio files for the isolated sources is slightly longer than the target soundscape duration. In this specific case, the target soundscape was to be 4 seconds at 16kHz (i.e. 64,000 samples). The anomalous isolated source file had 64,768 samples. In the original context I found this issue, I generated 20,000 soundscapes and 89 of them exhibited this behavior. In each case, only exactly one isolated source exhibited this issue. These audio files always had exactly 64,768 samples.
I've included a reproducible example in an attached zip.
scaper-debug.zip
Within the (unzipped) directory, the following can be run to replicate the issue:
./soundscape_events/foreground12_jackhammer.wav
should be the anomalous file here.I'm using Python 3.6.7 (on Ubuntu 18.04.4 LTS), and using the scaper version at commit d0431ec7b091d49709dfce25d149f6dfd0e982c8.