move normalization into the hubert model; brings back part of 77ee0f4 (the approach discussed in #7 turned out not to be a good idea)
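A minimal sketch of the idea, assuming a fairseq-style hubert module; the wrapper name `HubertWithNorm` is hypothetical. Normalization happens inside `forward`, so the data pipeline no longer needs its own normalize step:

```python
import torch
from torch import nn

class HubertWithNorm(nn.Module):
    # hypothetical wrapper; illustrates normalization living in the
    # model rather than in the data pipeline
    def __init__(self, hubert: nn.Module):
        super().__init__()
        self.hubert = hubert

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # zero-mean / unit-variance per example, the input format
        # fairseq-style hubert checkpoints typically expect
        wav = (wav - wav.mean(dim=-1, keepdim=True)) / (wav.std(dim=-1, keepdim=True) + 1e-7)
        return self.hubert(wav)
```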
simplify data processing and remove redundant operations
change the way semantic tokens are computed in preprocessing: MERT is trained on a context window of 5 seconds (!), which might explain the deterioration in sample quality over longer sequences noticed by @Saltb0xApps in the Discord. The m-a-p team confirmed that MERT generalizes to longer context lengths and still holds SOTA performance on various tasks. The option to specify a shorter hubert context length remains, but it is not used unless set in the config.
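A sketch of the configurable context length, with hypothetical function and parameter names (`extract_semantic_features`, `context_seconds`); leaving the option unset reproduces the default full-context behavior:

```python
import torch

def extract_semantic_features(model, wav, sample_rate, context_seconds=None):
    # context_seconds=None (the default) runs the full sequence in one pass,
    # relying on MERT generalizing beyond its 5-second training window
    if context_seconds is None:
        return model(wav)
    # otherwise split into windows of the configured length and
    # concatenate the per-window features along the time axis
    window = int(context_seconds * sample_rate)
    chunks = wav.split(window, dim=-1)
    feats = [model(chunk) for chunk in chunks]
    return torch.cat(feats, dim=1)  # (batch, frames, dim)
```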
bin_size option to average adjacent hubert features to reduce the number of semantic tokens
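A sketch of the binning, assuming features shaped (batch, frames, dim); the function name is illustrative, not the repo's actual API:

```python
import torch

def bin_hubert_features(feats: torch.Tensor, bin_size: int) -> torch.Tensor:
    # feats: (batch, frames, dim); averages every bin_size adjacent frames,
    # shrinking the semantic token count by that factor
    b, n, d = feats.shape
    n_trim = (n // bin_size) * bin_size  # drop frames that don't fill a bin
    feats = feats[:, :n_trim].reshape(b, n_trim // bin_size, bin_size, d)
    return feats.mean(dim=2)
```

e.g. bin_size=2 halves the number of semantic tokens per clip.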