wandb / examples

Example deep learning projects that use wandb's features.
http://wandb.ai

add: audiocraft spectrogram visualization enhancements #459

Closed mratanusarkar closed 1 year ago

mratanusarkar commented 1 year ago

Description:

This PR enhances the spectrogram so that it better captures the spectral signature of the generated audio, making anomalies easier to identify and the model's auditory output easier to understand.

A spectrogram tuned to human auditory perception makes it easier to interpret the sound signature the way a listener would hear it and, in turn, to understand the model outputs.

Changes:

  1. Applied logarithmic scaling for better sound perception alignment.
  2. Dynamic range set to 5th-95th percentiles for improved audio focus.
  3. Adjusted frequency range: Lower limit set at 20Hz; upper limit set dynamically based on audio energy concentration.
  4. Switched to 'magma' colormap for perceptual uniformity.
  5. Improved code structure for maintainability.
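
The first three changes can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `spectrogram_display_params`, the `energy_fraction` threshold, and the reading of "logarithmic scaling" as decibel scaling of the power values are all assumptions made for the example.

```python
import numpy as np


def spectrogram_display_params(power, freqs, low_hz=20.0, energy_fraction=0.99):
    """Hypothetical helper: derive plotting limits from a power spectrogram.

    power: array of shape (n_freqs, n_frames) with non-negative power values
    freqs: array of shape (n_freqs,) with the ascending frequency of each row in Hz
    """
    # 1. Logarithmic (decibel) scaling; the small epsilon avoids log10(0).
    db = 10.0 * np.log10(power + 1e-10)

    # 2. Colour range limited to the 5th-95th percentiles of the dB values.
    vmin, vmax = np.percentile(db, [5, 95])

    # 3. Upper frequency limit: the lowest frequency below which
    #    `energy_fraction` of the total energy lies (assumed criterion).
    energy_per_bin = power.sum(axis=1)
    total = energy_per_bin.sum()
    if total == 0:
        high_hz = float(freqs[-1])  # silent input: fall back to the full range
    else:
        cumulative = np.cumsum(energy_per_bin) / total
        idx = min(int(np.searchsorted(cumulative, energy_fraction)), len(freqs) - 1)
        high_hz = max(float(freqs[idx]), low_hz)

    return db, float(vmin), float(vmax), low_hz, high_hz
```

With matplotlib, the returned values would typically be passed on as `vmin`/`vmax` together with `cmap="magma"` when drawing the image, and as the y-axis limits via `ax.set_ylim(low_hz, high_hz)`.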

Impact:

Provides a clearer, more intuitive view of audio content, benefiting audio enthusiasts and general viewers.

To demonstrate, consider the following examples:

Future Scope:

If @wandb implements issue #6224, it would be cool to replace these spectrogram images with "WandB Interactive Spectrograms".

github-actions[bot] commented 1 year ago

Thanks for contributing to wandb/examples! We appreciate your efforts in opening a PR for the examples repository. Our goal is to ensure a smooth and enjoyable experience for you 😎.

Guidelines

The examples repo is regularly tested against the ever-evolving ML stack. To facilitate our work, please adhere to the following guidelines:

Before merging, wait for a maintainer to clean and format the notebooks you're adding. You can tag @tcapelle.

Before marking the PR as ready for review, please run your notebook one more time: restart the Colab and run all cells. We will provide you with links to open the Colabs below.

The following colabs were changed:
- colabs/audiocraft/AudioCraft_MusicGen.ipynb

mratanusarkar commented 1 year ago

@mratanusarkar Thanks for the PR. Can you please change the single quotes (') to double quotes (")?

@soumik12345 Let me know if any more changes are required.

Also, FYI, the PNG image size and quality can be improved on the matplotlib side with:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))  # adjust the figure size (in inches) to your preference
...
plt.savefig(output_file, format='png', dpi=300, bbox_inches='tight', pad_inches=0)  # higher dpi gives a sharper image

I didn't include these settings, in order to keep the image files small, and the current images look good enough in the wandb tables. Users can change them if needed, for example when downloading the spectrogram images for offline use.
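
To give a feel for the file-size trade-off, the nominal pixel dimensions of a saved figure are simply the figure size in inches multiplied by the dpi. The helper below is a hypothetical illustration, and 100 dpi is used only as a comparison value; `bbox_inches='tight'` crops the figure afterwards, so the real output will differ somewhat.

```python
def png_pixel_size(figsize_inches, dpi):
    """Nominal pixel size of a saved figure, before any 'tight' cropping."""
    width_in, height_in = figsize_inches
    return round(width_in * dpi), round(height_in * dpi)


print(png_pixel_size((10, 6), 300))  # (3000, 1800)
print(png_pixel_size((10, 6), 100))  # (1000, 600)
```

Going from 100 to 300 dpi therefore means roughly nine times as many pixels per image.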

soumik12345 commented 1 year ago

The png image size and quality can be improved from matplotlib side with:

@mratanusarkar Can you please add a config for that?

mratanusarkar commented 1 year ago

The png image size and quality can be improved from matplotlib side with:

@mratanusarkar Can you please add a config for that?

Almost all of the variables and parameters (15+) in get_spectrogram() could be exposed as config. But I feel that's unnecessary, given that the spectrogram eventually ends up as just a column in the table.

The variable names are clear and follow standard DSP and audio engineering terminology, so anyone interested can experiment with the function directly.

Adding configs would make it more complex for most common use cases. Still, let me know your views.
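
One possible middle ground, sketched here purely as an illustration, is to expose only the image-quality settings and leave the DSP parameters inside the function. The class name `SpectrogramImageConfig` and its fields are hypothetical; nothing like it exists in the PR or in wandb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpectrogramImageConfig:
    """Hypothetical config covering only the image-quality settings."""

    figsize: tuple = (10, 6)  # figure size in inches
    dpi: int = 300            # resolution of the saved PNG
    cmap: str = "magma"       # colormap used for the spectrogram

    def savefig_kwargs(self):
        # Keyword arguments mirroring the plt.savefig call quoted in the thread.
        return {
            "format": "png",
            "dpi": self.dpi,
            "bbox_inches": "tight",
            "pad_inches": 0,
        }


default_cfg = SpectrogramImageConfig()
small_cfg = SpectrogramImageConfig(dpi=100)  # smaller files for table previews
```

The plotting code could then call `plt.savefig(output_file, **small_cfg.savefig_kwargs())`, so that changing the image quality does not require editing the function body.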

soumik12345 commented 1 year ago

Thanks again for your contribution!