Molkree opened 3 years ago
Related discussion from Slack
@wetdog:
I see that you're calculating the spectrogram directly on the data that comes from the `wavfile.read` method. I tested, and matplotlib's `specgram` function produces the same spectrogram images, but the scale of the data is different, since `wavfile.read` outputs int16 or int24 data and it's common to scale waveforms to the -1, 1 range.
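A minimal sketch of that scaling, assuming scipy and matplotlib (the filename is hypothetical; any of the converted segments would do):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("live999.wav")  # hypothetical converted segment
if data.ndim > 1:
    data = data[:, 0]  # plot one channel

# Integer WAVs (int16, int24-in-int32) are scaled to [-1, 1] by the dtype's
# max; float WAVs are already in that range, so they pass through unchanged.
if np.issubdtype(data.dtype, np.integer):
    data = data.astype(np.float32) / np.iinfo(data.dtype).max

plt.specgram(data, Fs=rate, NFFT=1024)
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()
```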
@Molkree:
yeah, it's `int16` because ffmpeg converts to `pcm_s16le`:
```
Input #0, mpegts, from 'bush_point/live999.ts':
  Duration: 00:00:09.92, start: 9991.417211, bitrate: 149 kb/s
  Program 1
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
  Stream #0:0[0x100]: Audio: aac (LC) ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 124 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
```
I haven't put much thought into ffmpeg settings when converting :thinking_face: So maybe specifying `f32le` would increase quality, and it would be in the float -1, 1 range like you said. I just assumed that ffmpeg would choose the same/best format as the incoming .ts, but it used the default: "The default for muxing into WAV files is `pcm_s16le`."

I just don't quite understand what the incoming audio bit depth is; it says `fltp` (which stands for planar floating point format, suggesting floating point) and `aac (LC)`/`aac (native)`.

So again, to the question of whether it's even worth using more bits: I don't know the incoming source. I've looked at the .ts file through MPC-HC Properties and it lists this:
```
Audio: PCM 48000Hz stereo 1536kbps [A: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s]
Audio: IEEE Float 48000Hz stereo 3072kbps [A: pcm_f32le, 48000 Hz, stereo, fp32, 3072 kb/s]
Audio: AAC 48000Hz stereo 130kbps [A: aac lc, 48000 Hz, stereo, 130 kb/s]
```
Both `pcm_s16le` and `pcm_f32le`, so I'm confused.

I've tried converting the .ts to a `pcm_f32le` .wav; here's the comparison:

- .ts -- 181 KB
- .wav `pcm_s16le` -- 1877 KB
- .wav `pcm_f32le` -- 3753 KB

int16 spectrogram (image)
float32 spectrogram (image)
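For reference, a sketch of the two conversions compared above, assuming ffmpeg is on PATH (the input path is the segment from the log; the output names are made up):

```python
import subprocess

# Decode the AAC audio in a .ts segment to WAV in two sample formats.
# pcm_s16le is ffmpeg's default codec when muxing WAV; pcm_f32le is float32.
for codec, out in [("pcm_s16le", "live999_s16.wav"), ("pcm_f32le", "live999_f32.wav")]:
    subprocess.run(
        ["ffmpeg", "-y", "-i", "bush_point/live999.ts", "-acodec", codec, out],
        check=True,
    )
```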
@scottveirs:
FWIW each of the current nodes is using the Pisound ADC to sample the hydrophones at 48kHz and 24 bits, usually in stereo, i.e. two channels from two nearby hydrophones. The ffmpeg command running on each node is something close to:
```
ffmpeg -f jack -i ffjack \
  -f segment -segment_list "/tmp/$NODE_NAME/hls/$timestamp/live.m3u8" \
  -segment_list_flags +live -segment_time $SEGMENT_DURATION \
  -segment_format mpegts -ar $STREAM_RATE -ac $CHANNELS \
  -threads 3 -acodec aac "/tmp/$NODE_NAME/hls/$timestamp/live%03d.ts"
```
Note the `-acodec aac` part, which I think suggests we have been using (without much forethought) whatever ffmpeg considers "defaults" for aac encoding -- https://trac.ffmpeg.org/wiki/Encode/AAC
Confirming via remote login to each RPi just now that Orcasound Lab, Bush Point, and Port Townsend all have `STREAM_RATE=48000`. They are all sampling the hydrophone signals at 48,000 samples/second, though Orcasound Lab is running a slightly more recent branch of orcanode using Jack that we know could be pushed up to a sample rate of 192,000 (e.g. for experiments in higher-resolution sampling this summer).
@Molkree:
I can't see any difference at all between int16/float32 in SoX (it was noticeable with the previous method). But since Scott said the hydrophones sample at 24 bits, maybe `s24le` would be enough anyway.
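One reason not to worry about choosing between `s24le` and `f32le` for fidelity: float32 carries a 24-bit significand, so every 24-bit integer sample value is exactly representable in float32. A quick sanity check:

```python
import numpy as np

# Extremes and neighbours of the signed 24-bit range survive an
# int -> float32 -> int round trip, so s24 -> f32 conversion is lossless.
samples = np.array([-(2**23), -1, 0, 1, 2**23 - 1], dtype=np.int64)
assert np.array_equal(samples.astype(np.float32).astype(np.int64), samples)
```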
**Streaming in flac**
Right now aac seems to cut off frequencies above 16-17 kHz:

(spectrogram image)
So flac (or even some tweaked aac?) would provide more data. No nodes stream in flac at the moment, but over the summer we can try that and see if it's significantly better.
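For the "tweaked aac" option, a hedged sketch using ffmpeg's native AAC encoder, which accepts a bitrate (`-b:a`) and a frequency cutoff (`-cutoff`); the values below are illustrative and untested on the nodes:

```python
import subprocess

# Re-encode a segment with a higher AAC bitrate and an explicit cutoff so
# content above ~17 kHz is kept (the streams above were ~124 kb/s).
subprocess.run(
    [
        "ffmpeg", "-y", "-i", "bush_point/live999.ts",
        "-acodec", "aac",
        "-b:a", "256k",
        "-cutoff", "20000",
        "tweaked.ts",
    ],
    check=True,
)
```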