Open · Hobart10 opened this issue 4 months ago
Hi Tim,

Thank you for developing and maintaining this wonderful and intuitive package! The segmentation works very well on standard audio files. However, I'm encountering an issue with audio data containing very faint vocalizations. When the `int16tofloat32` function normalizes the data, these faint vocalizations become invisible on the spectrogram and can't be detected even with a very low `min_level_db` setting. I'm wondering what the best approach would be for segmenting audio files that contain both normal-intensity and faint vocalizations. Do you have any suggestions for resolving this issue?

Attachment: demoAudio_faintVocalizations.zip
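For context, my working assumption (I haven't checked the actual implementation) is that the helper just rescales the raw int16 samples into [-1, 1] by a fixed constant, so a faint call ends up tens of dB below full scale before the spectrogram is even computed:

```python
import numpy as np

def int16tofloat32(data):
    # assumed behaviour: rescale int16 samples into [-1, 1] by the int16 full-scale value
    return (data / 32768.0).astype("float32")

t = np.linspace(0, 0.1, 4410)  # 100 ms at 44.1 kHz
faint = (20 * np.sin(2 * np.pi * 2000 * t)).astype(np.int16)      # peaks at ~20 counts
normal = (20000 * np.sin(2 * np.pi * 2000 * t)).astype(np.int16)  # peaks at ~20000 counts

for name, wav in [("faint", faint), ("normal", normal)]:
    x = int16tofloat32(wav)
    peak_db = 20 * np.log10(np.max(np.abs(x)))
    print(f"{name}: peak {peak_db:.1f} dBFS")

# the faint call peaks around -65 dBFS vs. roughly -4 dBFS for the normal one,
# so it sits only ~15 dB above an -80 dB min_level_db floor before ref_level_db is applied
```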
If the spectrogram looks different between the int16 and float32 versions, it sounds like the spectrograms are being computed differently. There are a lot of parameters to play around with; which have you tried changing? (See the example call after the parameter list below.)

- min_level_db {int} -- default dB minimum of spectrogram (threshold anything below) (default: {-80})
- min_level_db_floor {int} -- highest number min_level_db is allowed to reach dynamically (default: {-40})
- db_delta {int} -- delta in setting min_level_db (default: {5})
- n_fft {int} -- FFT window size (default: {1024})
- hop_length_ms {int} -- number of audio frames in ms between STFT columns (default: {1})
- win_length_ms {int} -- size of FFT window (ms) (default: {5})
- ref_level_db {int} -- reference level dB of audio (default: {20})
- pre {float} -- coefficient for preemphasis filter (default: {0.97})
- spectral_range {[type]} -- spectral range to care about for spectrogram (default: {None})
- verbose {bool} -- display output (default: {False})
- mask_thresh_std {int} -- standard deviations above median to threshold out noise (higher = threshold more noise) (default: {1})
- neighborhood_time_ms {int} -- size in time of neighborhood-continuity filter (default: {5})
- neighborhood_freq_hz {int} -- size in Hz of neighborhood-continuity filter (default: {500})
- neighborhood_thresh {float} -- threshold number of neighborhood time-frequency bins above 0 to consider a bin not noise (default: {0.5})
- min_syllable_length_s {float} -- shortest expected length of syllable (default: {0.1})
- min_silence_for_spec {float} -- shortest expected length of silence in a song (used to set dynamic threshold) (default: {0.1})
- silence_threshold {float} -- threshold for spectrogram to consider noise as silence (default: {0.05})
- max_vocal_for_spec {float} -- longest expected vocalization in seconds (default: {1.0})
- temporal_neighbor_merge_distance_ms {float} -- longest distance at which two elements should be considered one (default: {0.0})
- overlapping_element_merge_thresh {float} -- proportion of temporal overlap to consider two elements one (default: {np.inf})
- min_element_size_ms_hz {list} -- smallest expected element size (in ms and Hz); everything smaller is removed (default: {[0, 0]})
- figsize {tuple} -- size of figure for displaying output (default: {(20, 5)})
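For reference, here is a rough sketch of how I'd pass those arguments, loosening the floor-related parameters for faint calls. I'm assuming the `dynamic_threshold_segmentation` entry point in `vocalseg.dynamic_thresholding`, that it takes the waveform and sample rate as its first two arguments, and a placeholder filename; adjust to match your setup.

```python
import librosa
from vocalseg.dynamic_thresholding import dynamic_threshold_segmentation

# placeholder path -- substitute one of your demo files
data, rate = librosa.load("demoAudio_faint.wav", sr=None)

results = dynamic_threshold_segmentation(
    data,                      # waveform (assumed first positional argument)
    rate,                      # sample rate (assumed second positional argument)
    n_fft=1024,
    hop_length_ms=1,
    win_length_ms=5,
    ref_level_db=20,
    min_level_db=-80,          # spectrogram floor; lower to keep quieter energy
    min_level_db_floor=-60,    # let the dynamic threshold descend further than the -40 default
    db_delta=5,
    silence_threshold=0.01,    # treat quieter bins as signal rather than silence
    min_syllable_length_s=0.05,
    verbose=True,
)

# in my experience the call returns the thresholded spectrogram and element onsets/offsets,
# or a falsy value when no valid segmentation is found
print(results)
```

Lowering `min_level_db_floor` and `silence_threshold` is where I'd start for faint elements; `mask_thresh_std` and the `neighborhood_*` parameters also control how aggressively background noise is masked out.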