The bit depth of the output mixtures should be a setting, like sample rate, that you can enforce in the Scaper object. Right now, the behavior seems to be to take the bit depth of the input source files, which can vary greatly (I believe they vary in UrbanSound8K, for example). When the bit depth of the output varies or is too large, you get really poor performance when loading the mixtures for processing by something like a deep net. I think the fix is simple: somewhere in sox you can force the bit depth to a default like 16.
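To check whether a set of source files or generated mixtures really has mixed bit depths, something like the following could be used. This is only a rough sketch using Python's standard `wave` module, not anything from Scaper itself; the helper names `wav_bitdepth` and `bitdepth_histogram` are made up for illustration, and it assumes the files are plain PCM WAV.

```python
import wave
from collections import Counter


def wav_bitdepth(path):
    """Return the bit depth of a PCM WAV file (hypothetical helper).

    Assumes an uncompressed PCM WAV; the wave module raises wave.Error
    for formats it does not understand (for example IEEE float data).
    """
    with wave.open(path, "rb") as wf:
        # getsampwidth() returns bytes per sample, so multiply by 8.
        return wf.getsampwidth() * 8


def bitdepth_histogram(paths):
    """Count how many files use each bit depth (hypothetical helper)."""
    return Counter(wav_bitdepth(p) for p in paths)
```

Running `bitdepth_histogram` over the source folder and over the generated mixtures would show whether the output simply mirrors the inputs. For the fix itself, I believe pysox's `Transformer.convert` accepts a `bitdepth` argument, which might be the place to enforce the default, but I have not checked how Scaper calls it.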