[Open] NickleDave opened this issue 1 year ago
Discussed this with @marisbasha and @yardencsGitHub today. Updating here with some thoughts I've had
We want to add:

- A `VAEModel` class that defines the model family. I would suggest that class look something like the `VAExperiment` class here, https://github.com/AntixK/PyTorch-VAE/blob/master/experiment.py#L15 -- as far as the training/validation step logic goes -- and that it declare the methods `encode`, `decode`, and `sample` for all VAE models, like the base class here: https://github.com/AntixK/PyTorch-VAE/blob/master/models/base.py
- An `AVA` model that uses `VAEModel` as the model family
- I don't think we need this for the initial implementation, but noting for future work: a `WindowDataset` for the Shotgun VAE models (which train on randomly drawn windows)

Tentative / rough to-do list for @marisbasha after our meeting today:

- Run `nox -s test-data-generate` (this may crash because I introduced a bug in the parametric UMAP branch -- I might have fixed it by the time you start -- but it will run enough for you to get the config files you need)
- Run `vak prep tests/data_for_tests/configs/ConvEncoderUMAP_train_audio_cbin_annot_notmat.toml`
- Add `vak.nn.loss.vae`. AVA computes the loss inside the `forward` method instead of in a factored-out loss function, here -- I would prefer to implement an `AVALoss` that verbatim repeats their logic, just to get closer to numerical reproducibility
- Add `vak.models.vae_model.VAEModel` that defines the model family, using the `vak.models.model_family` decorator and sub-classing `vak.base.Model`, with a `training_step` and `validation_step` containing logic specific to datasets prepared by vak, plus `encode` and `decode` methods. For an example see https://github.com/vocalpy/vak/blob/a96ff976283ccdc34852fcf2ba5bb51808b6b25e/src/vak/models/frame_classification_model.py#L22
- Add `vak.nets.ava` with the architecture here (slightly refactored, e.g. with for loops?). I would build the layers in `__init__` instead of using a separate method, and I would favor using for loops plus separate `encoder` and `decoder` attributes, so that methods like `encode` can just do `return self.encoder(x)`. For an example see https://github.com/vocalpy/vak/blob/main/src/vak/nets/tweetynet.py
- Add `vak.models.AVA` that defines the `AVA` model, using `@model(family='VAEModel')` -- for an example see https://github.com/vocalpy/vak/blob/main/src/vak/models/tweetynet.py
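To make that interface concrete, here is a minimal runnable sketch -- plain Python with a toy one-dimensional model, every name hypothetical, not vak's actual API -- of a model-family base class that declares `encode`, `decode`, and `sample`, plus a `training_step` that combines a reconstruction term with the closed-form Gaussian KL term (the two pieces an `AVALoss` would also need):

```python
import math
import random
from abc import ABC, abstractmethod


class BaseVAE(ABC):
    """Sketch of a VAE 'model family' interface, loosely modeled on
    https://github.com/AntixK/PyTorch-VAE/blob/master/models/base.py.
    All names here are illustrative, not vak's actual API."""

    @abstractmethod
    def encode(self, x):
        """Return latent-posterior parameters (mu, log_var) for input x."""

    @abstractmethod
    def decode(self, z):
        """Map a latent value z back to input space."""

    def sample(self, num_samples):
        """Draw latents from the standard-normal prior and decode them."""
        return [self.decode(random.gauss(0.0, 1.0)) for _ in range(num_samples)]

    def training_step(self, x):
        """ELBO-style loss: reconstruction error + KL(q(z|x) || N(0, 1))."""
        mu, log_var = self.encode(x)
        # reparameterization trick: z = mu + sigma * epsilon
        z = mu + math.exp(0.5 * log_var) * random.gauss(0.0, 1.0)
        x_hat = self.decode(z)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        # closed-form KL for a 1-D Gaussian posterior vs. the standard normal
        kl = -0.5 * (1.0 + log_var - mu ** 2 - math.exp(log_var))
        return recon + kl


class ToyVAE(BaseVAE):
    """Degenerate 1-D-latent 'VAE', just to exercise the interface."""

    def encode(self, x):
        return sum(x) / len(x), 0.0  # (mu, log_var); a fake posterior

    def decode(self, z):
        return [z, z]  # fake 2-D "reconstruction"
```

Here `training_step` mirrors the usual ELBO decomposition; in a real implementation the loss computation would verbatim repeat AVA's `forward` logic, per the to-do list above.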
@NickleDave I am having trouble with `nox -s test-data-generate`. I receive the following error:
```
NotADirectoryError: Path specified for ``data_dir`` not found: tests/data_for_tests/source/audio_cbin_annot_notmat/gy6or6/032312
```
After inspecting, I see that `tests/data_for_tests/source/` is an empty directory. I checked in the code for `gy6or6` and saw a script to download it. I put the data inside the `audio_cbin_annot_notmat` folder, but then I get an error saying there's no `.not.mat` file in the directory, and I cannot find a link to download the data elsewhere.
Just to clarify: should I use my own "toy data", or does running `vak prep tests/data_for_tests/configs/ConvEncoderUMAP_train_audio_cbin_annot_notmat.toml` generate the "toy data"?
If that's the case, where should I download the data from?
Hey @marisbasha! Sorry you're running into this issue. It's probably something we haven't explained clearly enough.
> If that's the case, where should I download the data from?
Just checking, did you already download the "source" test data as described here? https://vak.readthedocs.io/en/latest/development/contributors.html#download-test-data
To do that you would run `nox -s test-data-download-source`.
> Just to clarify: should I use my own "toy data", or does running `vak prep tests/data_for_tests/configs/ConvEncoderUMAP_train_audio_cbin_annot_notmat.toml` generate the "toy data"?
You are right that these are basically "toy" datasets, as small as possible. I tried to define the two different types in that section of the development set-up page, but just in case it's not clear: the "source" data is the inputs to vak, like audio and annotation files. You create the other type, the "generated" test data, when you run `nox -s test-data-generate`. This "generated" test data consists of (small) prepared datasets and results, some of which are used by the unit tests.
You don't actually need to generate this test data to be able to develop. I just suggested it as a fairly painless way to check that you were able to set up the environment correctly. The script that generates the test data should be able to run to completion without any errors.
I am almost finished with the feature branch that will fix the unit tests so you can run them to test what you are developing. That branch will also speed up the script that generates the test data considerably, and reduce the size of the generated test data: https://github.com/vocalpy/vak/pull/693
Does that help?
Everything fine now. Thanks!
🙌 awesome, glad to hear it!
Will ping you here as soon as I get that branch merged; it does fix a couple of minor bugs, so you'll probably want to `git pull` to get them along with the fixed tests.
@NickleDave I have pushed to my fork again, with the parts divided by file. I am having trouble configuring the trainer. Could we have a brief discussion?
Ah whoops, sorry I missed this @marisbasha.
What you have so far looks great. I am reading through your code now to make sure I understand where you're at.
We can definitely discuss what to do with the trainer when we meet tomorrow.
References:

- AVA documentation: https://autoencoded-vocal-analysis.readthedocs.io/en/latest/index.html
- Paper: https://elifesciences.org/articles/67855
- Code: https://github.com/pearsonlab/autoencoded-vocal-analysis/tree/master