orcasound / orca-action-workflow

GitHub Actions to automate Orcasound tasks.
MIT License

Try different codecs for Orcasound #19

Open Molkree opened 3 years ago

Molkree commented 3 years ago

Streaming in flac

Right now AAC seems to cut off frequencies above 16-17 kHz (image attached).

So FLAC (or maybe a tweaked AAC configuration?) would provide more data. No nodes stream in FLAC at the moment, but over the summer we can try it and see whether it's significantly better.
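To check whether a given codec (AAC, FLAC, or a tweaked AAC) actually keeps the band above 16-17 kHz, one option is to estimate where the spectrum drops off. The sketch below is a crude heuristic, not an existing Orcasound tool: it averages a power spectrum with `scipy.signal.welch` and reports the highest frequency whose level stays within `drop_db` of the median level. The helper name `estimate_cutoff_hz` and the 20 dB default are hypothetical choices.

```python
import numpy as np
from scipy.signal import welch


def estimate_cutoff_hz(x, fs, drop_db=20.0, nperseg=2048):
    """Hypothetical helper: highest frequency whose averaged PSD stays within
    `drop_db` of the median PSD level (a crude codec low-pass detector)."""
    freqs, psd = welch(x, fs=fs, nperseg=nperseg)  # Hann-windowed Welch average
    psd_db = 10 * np.log10(psd + 1e-30)            # tiny offset avoids log(0)
    threshold = np.median(psd_db) - drop_db
    above = np.nonzero(psd_db >= threshold)[0]
    return float(freqs[above[-1]]) if above.size else 0.0


if __name__ == "__main__":
    # Synthetic check: white noise with everything above 16 kHz zeroed out,
    # roughly mimicking a codec low-pass at 48 kHz sampling.
    fs = 48_000
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(2 * fs)
    spectrum = np.fft.rfft(noise)
    spectrum[np.fft.rfftfreq(noise.size, d=1 / fs) > 16_000] = 0
    lowpassed = np.fft.irfft(spectrum, n=noise.size)
    print(f"estimated cutoff: {estimate_cutoff_hz(lowpassed, fs):.0f} Hz")
```

For a real segment, pass one channel of the decoded samples (e.g. `data[:, 0]` from a stereo WAV). Hydrophone spectra are not flat, so the threshold will likely need tuning, and the spectrogram itself remains the better judge.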

Molkree commented 3 years ago

Related discussion from Slack

@wetdog:

I see that you're calculating the spectrogram directly on the data returned by the wavfile.read method. I tested it, and matplotlib's specgram function produces the same spectrogram images, but the scale of the data is different: wavfile.read outputs integer samples (int16, or 24-bit data), and it's common to scale waveforms to the -1 to 1 range.
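The rescaling described above is a division by the integer type's full-scale value. A minimal sketch, assuming the samples come from `scipy.io.wavfile.read`, whose documentation lists uint8 for 8-bit PCM, int16 for 16-bit PCM, int32 (samples in the upper bits) for 24-bit PCM, and float32 for 32-bit float. The helper name `pcm_to_float` is hypothetical.

```python
import numpy as np


def pcm_to_float(samples: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map PCM samples to float64 in roughly [-1, 1)."""
    if np.issubdtype(samples.dtype, np.floating):
        return samples.astype(np.float64)  # already float, e.g. pcm_f32le
    if samples.dtype == np.uint8:
        # 8-bit WAV is unsigned, with its midpoint at 128
        return (samples.astype(np.float64) - 128.0) / 128.0
    # Signed integers: divide by 2**(bits - 1), i.e. 32768 for int16
    full_scale = float(2 ** (np.iinfo(samples.dtype).bits - 1))
    return samples.astype(np.float64) / full_scale
```

Usage would be along the lines of `rate, data = wavfile.read(path)` followed by `audio = pcm_to_float(data)`. Because this only multiplies by a constant, the spectrogram keeps its shape; for int16 input its dB values just shift down by 20*log10(32768), about 90.3 dB, which is consistent with specgram producing the same images.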

@Molkree:

yeah, it's int16 because ffmpeg converts to pcm_s16le

Input #0, mpegts, from 'bush_point/live999.ts':
  Duration: 00:00:09.92, start: 9991.417211, bitrate: 149 kb/s
  Program 1
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
    Stream #0:0[0x100]: Audio: aac (LC) ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp, 124 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))

I haven't put much thought into the ffmpeg settings used for the conversion. So maybe specifying f32le would increase quality, and the samples would be floats in the -1 to 1 range like you said. I just assumed that ffmpeg would choose the same (or best) format as the incoming .ts, but it used pcm_s16le ("The default for muxing into WAV files is pcm_s16le"). I also don't quite understand what the bit depth of the incoming audio is: it says fltp (planar floating point, which suggests floating-point samples) and aac (LC)/aac (native).
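Rather than relying on the WAV muxer's pcm_s16le default, the output codec can be named explicitly with ffmpeg's `-c:a` option. A minimal sketch that builds and runs such a command from Python via `subprocess`; the function names are hypothetical, and which PCM format is worth keeping (pcm_s16le, pcm_s24le or pcm_f32le) is exactly the open question here.

```python
import subprocess


def wav_convert_cmd(src: str, dst: str, codec: str = "pcm_f32le") -> list:
    """Hypothetical helper: ffmpeg args that decode `src` and write `dst`
    as WAV using an explicit PCM codec instead of the pcm_s16le default."""
    return ["ffmpeg", "-hide_banner", "-y", "-i", src, "-c:a", codec, dst]


def convert_segment(src: str, dst: str, codec: str = "pcm_f32le") -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(wav_convert_cmd(src, dst, codec), check=True)
```

For example, `convert_segment("bush_point/live999.ts", "live999.wav")`. Since the AAC decoder already outputs floating point (fltp), pcm_f32le skips the rounding to 16-bit integers; it cannot restore anything the AAC encoder discarded.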

So, back to the question of whether it's even worth using more bits, because I don't know the format of the incoming source. I've looked at the .ts file through MPC-HC Properties and it lists this:

Audio: PCM 48000Hz stereo 1536kbps [A: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s]
Audio: IEEE Float 48000Hz stereo 3072kbps [A: pcm_f32le, 48000 Hz, stereo, fp32, 3072 kb/s]
Audio: AAC 48000Hz stereo 130kbps [A: aac lc, 48000 Hz, stereo, 130 kb/s]

It lists both pcm_s16le and pcm_f32le, so I'm confused.

I've tried converting the .ts to a pcm_f32le .wav. Here's the size comparison:

.ts: 181 KB
.wav (pcm_s16le): 1877 KB
.wav (pcm_f32le): 3753 KB
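These sizes line up with simple PCM arithmetic: payload bytes = sample rate × channels × bytes per sample × duration. A quick check, assuming the converted segment is about 10 s long (the live999.ts probed above is 9.92 s; the length of live000 isn't stated) and ignoring the small WAV header. The helper name is hypothetical.

```python
def pcm_data_bytes(sample_rate: int, channels: int, bits_per_sample: int,
                   seconds: float) -> int:
    """Size of the raw PCM payload in bytes (WAV header not included)."""
    return round(sample_rate * channels * (bits_per_sample // 8) * seconds)


s16 = pcm_data_bytes(48_000, 2, 16, 10.0)  # 1,920,000 bytes
f32 = pcm_data_bytes(48_000, 2, 32, 10.0)  # 3,840,000 bytes
print(s16 // 1024, f32 // 1024, f32 / s16)  # prints: 1875 3750 2.0
```

So float32 is exactly twice int16 per sample, and the reported 1877 KB and 3753 KB fit a segment of roughly 10 s. The extra size reflects the sample format, not extra information: both files are decoded from the same 181 KB .ts.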

int16 spectrogram: [image: live000_int16_spectrogram]

float32 spectrogram: [image: live000_float32_spectrogram]

@scottveirs:

FWIW, each of the current nodes uses the Pisound ADC to sample the hydrophones at 48 kHz and 24 bits, usually in stereo, i.e. two channels from two nearby hydrophones. The ffmpeg command running on each node is something close to:

ffmpeg -f jack -i ffjack -f segment -segment_list "/tmp/$NODE_NAME/hls/$timestamp/live.m3u8" -segment_list_flags +live -segment_time $SEGMENT_DURATION -segment_format mpegts -ar $STREAM_RATE -ac $CHANNELS -threads 3 -acodec aac "/tmp/$NODE_NAME/hls/$timestamp/live%03d.ts"

Note the -acodec aac part, which I think means we have been using (without much forethought) whatever ffmpeg considers the defaults for AAC encoding: https://trac.ffmpeg.org/wiki/Encode/AAC
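To try a "tweaked aac" before moving to FLAC, the node command could be parameterized. The sketch below rebuilds the command above as a Python argument list with two optional, hypothetical additions: `-b:a` (output bitrate) and `-cutoff` (ffmpeg's generic codec option for the encoder's cutoff bandwidth, which the native AAC encoder is expected to honor; worth confirming against the ffmpeg docs). The function name and any bitrate or cutoff values are assumptions, not what the nodes currently run.

```python
from typing import List, Optional


def node_segment_cmd(node_name: str, timestamp: str, segment_duration: int,
                     stream_rate: int = 48_000, channels: int = 2,
                     bitrate: Optional[str] = None,
                     cutoff_hz: Optional[int] = None) -> List[str]:
    """Hypothetical helper mirroring the orcanode ffmpeg command above."""
    out_dir = f"/tmp/{node_name}/hls/{timestamp}"
    cmd = ["ffmpeg", "-f", "jack", "-i", "ffjack",
           "-f", "segment",
           "-segment_list", f"{out_dir}/live.m3u8",
           "-segment_list_flags", "+live",
           "-segment_time", str(segment_duration),
           "-segment_format", "mpegts",
           "-ar", str(stream_rate), "-ac", str(channels),
           "-threads", "3", "-acodec", "aac"]
    if bitrate is not None:
        cmd += ["-b:a", bitrate]            # e.g. "256k" (assumed value)
    if cutoff_hz is not None:
        cmd += ["-cutoff", str(cutoff_hz)]  # encoder low-pass, in Hz
    cmd.append(f"{out_dir}/live%03d.ts")    # output pattern goes last
    return cmd
```

A call such as `node_segment_cmd("bush_point", "1590000000", 10, bitrate="256k", cutoff_hz=20000)` (timestamp made up) would give a variant whose spectrograms could be compared against the current defaults.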

I just confirmed via remote login to each RPi that Orcasound Lab, Bush Point, and Port Townsend all have STREAM_RATE=48000, i.e. they all sample the hydrophone signals at 48,000 samples/second. Orcasound Lab, though, is running a slightly more recent branch of orcanode using Jack that we know could be pushed up to a sample rate of 192,000 samples/second (e.g. for experiments in higher-resolution sampling this summer).
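The sample rate sets the ceiling on recoverable frequencies: the Nyquist frequency is half the sample rate. A small worked check (helper names hypothetical) of what that means for the current 48 kHz nodes, the 192 kHz option, and the 16-17 kHz AAC cutoff mentioned at the top of the issue:

```python
def nyquist_hz(sample_rate: float) -> float:
    """Highest frequency representable at this sample rate."""
    return sample_rate / 2


def fraction_of_band_lost(sample_rate: float, cutoff_hz: float) -> float:
    """Share of the 0..Nyquist band that lies above a codec's low-pass cutoff."""
    return 1 - cutoff_hz / nyquist_hz(sample_rate)


print(nyquist_hz(48_000), nyquist_hz(192_000))          # 24000.0 96000.0
print(round(fraction_of_band_lost(48_000, 16_000), 3))  # 0.333
```

So at 48 kHz, a 16 kHz cutoff discards roughly the top third of the band the ADC captures (16-24 kHz).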

@Molkree:

I can't see any difference at all between int16 and float32 in SoX (it was noticeable with the previous method). But since Scott said the hydrophones are sampled at 24 bits, maybe s24le would be enough anyway.
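One way to see why int16 and float32 can look identical: the noise added by rounding to N bits sits roughly 6.02·N + 1.76 dB below a full-scale sine, about 98 dB for 16 bits and 146 dB for 24 bits, which is likely below what a typical spectrogram color scale shows. A minimal simulation of that textbook rule (the test tone, its frequency, and the helper name are arbitrary assumptions, not Orcasound data):

```python
import numpy as np


def quantization_snr_db(bits: int, fs: int = 48_000, freq_hz: float = 1000.3,
                        amplitude: float = 0.99) -> float:
    """SNR of a near-full-scale sine after rounding to `bits`-bit integers."""
    t = np.arange(fs) / fs                      # one second of samples
    x = amplitude * np.sin(2 * np.pi * freq_hz * t)
    full_scale = 2 ** (bits - 1) - 1            # 32767 for 16 bits
    xq = np.round(x * full_scale) / full_scale  # quantize, then rescale
    noise = xq - x
    return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))


for bits in (16, 24):
    print(bits, round(quantization_snr_db(bits), 1))
```

By this measure s24le already matches the Pisound's 24-bit ADC, and none of these WAV formats can recover content the AAC encoder already removed upstream.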