900miles opened this issue 3 months ago
So the way the tutorials are meant to work, I think, might be different from the way you're using them. I would recommend testing this again once we have that tutorial merged into mainline, as I think that would let us see whether the "Open in Colab" button works as expected. Another fix for what you're showing is to not reference these files by relative paths, and instead use wget or something similar to download them into the Colab environment from GitHub.
@wilke0818 nice catch! Are you suggesting including a piece of code for downloading some audio as part of the tutorials? Something like this (https://github.com/sensein/fab/blob/main/tutorials/voice_anonymization/voice_anonymization.ipynb):
# This variable holds the web address from which we'll download the EmoDB dataset.
# It's like a treasure map guiding us to the wonderful voice recordings!
dataset_url = "http://emodb.bilderbar.info/download/download.zip"
# The data_folder variable points to the location where we'll store all the data and audio recordings.
# Think of it as our backstage area, well-organized and ready to showcase the talents of our voices!
data_folder = "./data/"
# The dataset_name variable will be the name we give to the EmoDB dataset once we download it.
# Just a friendly label to recognize it easily when we work with it later on.
dataset_name = "emodb_dataset"
%%bash -s "$data_folder" "$dataset_name" "$dataset_url"
# The Python variables defined above are passed in as positional arguments,
# because a %%bash cell runs in its own shell and cannot see them directly.
data_folder="$1"
dataset_name="$2"
dataset_url="$3"
dataset_path="$data_folder$dataset_name"

# This bash script checks if the EmoDB dataset has already been downloaded.
# If the dataset folder exists, it means the dataset is already downloaded.
# Otherwise, it proceeds with the download process.
if [ -d "$dataset_path" ]; then
    # The dataset folder exists, so the dataset is already downloaded.
    echo "$dataset_name already downloaded in $dataset_path."
else
    # The dataset folder does not exist, indicating the dataset needs to be downloaded.
    echo "Downloading..."
    # Create the dataset folder and its parent directories, if they don't exist.
    mkdir -p "$dataset_path"
    # Use the 'wget' command to fetch the EmoDB dataset from the provided URL ($dataset_url).
    # Save the downloaded file as "$dataset_name.zip" in the "$dataset_path" folder.
    wget -O "$dataset_path/$dataset_name.zip" "$dataset_url"
    # Unzip the downloaded dataset file ($dataset_name.zip) into the "$dataset_path" folder.
    # The '-d' option specifies the destination directory for the extracted files.
    unzip "$dataset_path/$dataset_name.zip" -d "$dataset_path"
    # Remove the downloaded zip file, as we don't need it anymore.
    rm "$dataset_path/$dataset_name.zip"
fi
Yeah, I mean technically it could be anything. I was thinking something like this in Colab (not certain this would work):
!wget https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav
test_audio_path = './audio_48khz_mono_16bits.wav'
Yours works if you want an entire dataset, though at that point it might be better to use HuggingFace and convert it to a SenselabDataset, which is the approach I was using in the ser tutorial. Roughly along the lines of the sketch below.
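A minimal sketch of that direction (the dataset id here is just a placeholder, and I'm deliberately not pinning down the exact senselab helper that would do the SenselabDataset conversion):

# The HuggingFace datasets library is usually preinstalled on Colab; otherwise: !pip install datasets
from datasets import load_dataset

# Placeholder dataset id: swap in whichever audio dataset the tutorial actually needs.
hf_dataset = load_dataset("some-org/some-audio-dataset", split="train")

# From here, the tutorial would convert hf_dataset into a SenselabDataset using
# whatever conversion utility senselab provides (not shown, since the exact
# helper and its signature aren't something this sketch should guess at).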
And also, my note to Miles above was that it's possible the code you have will work once it's pulled into mainline. I have found that when working with the notebooks there is a weird sort of GitHub/Colab interaction, where Colab tries to use the notebooks from the main branch on GitHub.
oh wow!
I agree that for most cases having one or two files is more than enough.
> I would recommend testing this again once we have that tutorial merged into mainline [...]

What branch/PR are you referring to?
> it is possible that the code you have will work, once it is pulled into mainline [...]
I'm not exactly sure what you mean by this, as speech_to_text.ipynb is in the mainline branch, no? Note that this doesn't just affect that tutorial, but any tutorial with relative imports, including getting_started.ipynb, which is also in mainline.
I just re-tested it and I see that I was mistaken (I thought speech_to_text.ipynb was still in a PR). And yeah, you either need to have something like !pip install senselab or a variation that specifies the GitHub repo and a branch. We then also need to, as mentioned earlier, download the files from source, suggest the user upload their own file, or use HuggingFace. Also, @900miles, the !pip install senselab is just commented out in this file, though it is missing in others.
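For example, something like this at the top of each notebook (a sketch; "some-branch" is just a placeholder for wherever an unmerged tutorial lives):

# Install the released package from PyPI...
!pip install senselab

# ...or, while a tutorial is still on a feature branch, install straight from GitHub:
# !pip install git+https://github.com/sensein/senselab.git@some-branch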
How about differentiating the Colab flow from the local flow?
def is_colab():
    try:
        import google.colab  # only importable inside the Colab runtime
        return True
    except ImportError:
        return False

# Example usage
if is_colab():
    # Download the file of interest from the GitHub raw link
    # and point test_audio_path at the local copy.
    !wget -q https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav
    test_audio_path = "./audio_48khz_mono_16bits.wav"
else:
    # Running locally from a repo checkout: use the relative path.
    test_audio_path = "../src/tests/data_for_testing/audio_48khz_mono_16bits.wav"
Gotcha. I wasn't sure whether the missing !pip install senselab in the tutorials was intentional or not (I can see an argument that if you're going through the module-level tutorials, you've already installed senselab). I think the wget is a good idea, or HuggingFace. We could also do what scipy does and have some sort of test_audios module, along the lines of the sketch below.
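Hypothetically, something like this (none of these names exist in senselab today; it's only a sketch of what such a helper could look like, reusing the test-file URL from above):

# Hypothetical senselab "test_audios" helper: download and cache a bundled test
# file once, then hand back a local path that works the same on Colab and locally.
from pathlib import Path
from urllib.request import urlretrieve

_TEST_AUDIO_URL = (
    "https://github.com/sensein/senselab/raw/main/"
    "src/tests/data_for_testing/audio_48khz_mono_16bits.wav"
)

def get_test_audio(cache_dir: str = "~/.cache/senselab") -> str:
    """Return a local path to the test audio file, downloading it if needed."""
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / "audio_48khz_mono_16bits.wav"
    if not target.exists():
        urlretrieve(_TEST_AUDIO_URL, str(target))
    return str(target)

Then every tutorial could just call test_audio_path = get_test_audio() regardless of where it's running.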
@fabiocat93 I'm not sure how much sense it makes to differentiate the two cases. The "local flow" I guess only really affects those who are running this after setting everything up for development, which probably won't be most people, and I feel like the tutorial shouldn't assume anything in that regard. Also, in both cases it seems like we need to have pip install senselab or something equivalent, as even if you clone the repo, the importing isn't set up for running through a notebook.
@900miles, can you handle this issue in all the tutorials? You may create a utility for downloading an existing dataset to be processed. I have tentatively assigned this to you.
Description
Any tutorial that imports test audio files (e.g.
Audio.from_filepath("../src/tests/data_for_testing/audio_48khz_mono_16bits.wav")
) does not work on Google Colab, as there is no audio file to load in that environment. This affects most, if not all, of the tutorials we currently have.

Steps to Reproduce
Open a notebook tutorial, for example speech_to_text.ipynb. Add !pip install senselab to the top of the file, and then run.

Expected Results
The tutorial runs as expected.
Actual Results
When running the code block that loads the test audio from its relative path, I get an error because the file does not exist in the Colab environment.
Additional Notes
No response