BUG: Error with dataset_path parameter in vak eval

mpescaru commented 4 months ago

Description

I am currently trying to go through the vak tutorial and have already run vak prep and vak train. I changed the parameters in the .toml files exactly as instructed in the tutorial. However, I get the following error when I try to run vak eval gy6or6_eval.toml:

(vak-env) maria@MacBook-Pro tweetynet % vak eval gy6or6_eval.toml
Traceback (most recent call last):
  File "/opt/anaconda3/envs/vak-env/bin/vak", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/__main__.py", line 49, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/cli/cli.py", line 54, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/cli/cli.py", line 4, in eval
    eval(toml_path=toml_path)
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/cli/eval.py", line 30, in eval
    cfg = config.Config.from_toml_path(toml_path)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/config/config.py", line 183, in from_toml_path
    return cls.from_config_dict(config_dict, tables_to_parse, toml_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/config/config.py", line 141, in from_config_dict
    are_keys_valid(config_dict, table_name, toml_path)
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/config/validators.py", line 143, in are_keys_valid
    raise ValueError(err_msg)
ValueError: The following keys from 'eval' table in the config file 'gy6or6_eval.toml' are not valid:
{'dataset_path'}

This is my eval table in the configuration file:

# EVAL: options for evaluating a trained model. This is done using the "test" split.
[vak.eval]
# checkpoint_path: path to saved model checkpoint
checkpoint_path = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6/results_240722_120711/TweetyNet/checkpoints/max-val-acc-checkpoint.pt"
# labelmap_path: path to file that maps from outputs of model (integers) to text labels in annotations;
# this is used when generating predictions
labelmap_path = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6/results_240722_120711/labelmap.json"
# frames_standardizer_path: path to file containing SpectScaler that was fit to training set
# We want to transform the data we predict on in the exact same way
frames_standardizer_path = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6/results_240722_120711/FramesStandardizer"
# batch_size
# for predictions with a frame classification model, this should always be 1
# and will be ignored if it's not
batch_size = 11
# num_workers: number of workers to use when loading data with multiprocessing
num_workers = 16
# device: name of device to run model on, one of "cuda", "cpu"

# output_dir: directory where output should be saved, as a sub-directory within `output_dir`
output_dir = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6/eval"
# dataset_path : path to dataset created by prep
# ADD THE dataset_path OPTION FROM THE TRAIN FILE HERE (we already created a test split when we ran `vak prep` with that config)
dataset_path = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6/032212-vak-frame-classification-dataset-generated-240710_105255"

I copied "dataset_path" from the train toml file:

[vak.train]
# root_results_dir: directory where results should be saved, as a sub-directory within `root_results_dir`
root_results_dir = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6"
# batch_size: number of samples from dataset per batch fed into network
batch_size = 8
# num_epochs: number of training epochs, where an epoch is one iteration through all samples in training split
num_epochs = 2
# standardize_frames: if true, standardize (normalize) frames (input to neural network) per frequency bin, so mean of each is 0.0 and std is 1.0
# across the entire training split
standardize_frames = true
# val_step: step number on which to compute metrics with validation set, every time step % val_step == 0
# (a step is one batch fed through the network)
# saves a checkpoint if the monitored evaluation metric improves (which is model specific)
val_step = 400
# ckpt_step: step number on which to save a checkpoint (as a backup, regardless of validation metrics)
ckpt_step = 200
# patience: number of validation steps to wait before stopping training early
# if the monitored evaluation metrics does not improve after `patience` validation steps,
# then we stop training
patience = 4
# num_workers: number of workers to use when loading data with multiprocessing
num_workers = 4
# device: name of device to run model on, one of "cuda", "cpu"

# dataset_path : path to dataset created by prep. This will be added when you run `vak prep`, you don't have to add it

# dataset.params = parameters used for datasets
# for a frame classification model, we use dataset classes with a specific `window_size`
[vak.train.dataset]
path = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6/032212-vak-frame-classification-dataset-generated-240710_105255"

Things I have tried: Changing the name of the parameter to path, csv_path, or dataset; I always got the same error. When I ran vak eval without the dataset_path parameter, I instead got this error:

(vak-env) maria@MacBook-Pro tweetynet % vak eval gy6or6_eval.toml
Traceback (most recent call last):
  File "/opt/anaconda3/envs/vak-env/bin/vak", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/__main__.py", line 49, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/cli/cli.py", line 54, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/cli/cli.py", line 4, in eval
    eval(toml_path=toml_path)
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/cli/eval.py", line 30, in eval
    cfg = config.Config.from_toml_path(toml_path)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/config/config.py", line 183, in from_toml_path
    return cls.from_config_dict(config_dict, tables_to_parse, toml_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/config/config.py", line 145, in from_config_dict
    ].from_config_dict(table_config_dict)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/config/eval.py", line 186, in from_config_dict
    config_dict["dataset"] = DatasetConfig.from_config_dict(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/vak-env/lib/python3.11/site-packages/vak/config/dataset.py", line 53, in from_config_dict
    return cls(**config_dict)
           ^^^^^^^^^^^^^^^^^^
TypeError: DatasetConfig.__init__() missing 1 required positional argument: 'path'
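
For readers following along, the two tracebacks above can be reproduced in miniature. This is only a sketch under stated assumptions, not vak's actual code: `VALID_EVAL_KEYS`, `check_keys`, and `first_error` are hypothetical names, the key list is guessed from the options in the config shown here, and a plain dataclass stands in for the real `DatasetConfig`.

```python
from dataclasses import dataclass

# Hypothetical set of keys allowed directly in the [vak.eval] table.
# The real list lives inside vak and is not reproduced here.
VALID_EVAL_KEYS = {
    "checkpoint_path", "labelmap_path", "frames_standardizer_path",
    "batch_size", "num_workers", "device", "output_dir", "dataset",
}


def check_keys(table: dict, table_name: str) -> None:
    """Raise ValueError if the table contains keys that are not allowed."""
    invalid = set(table) - VALID_EVAL_KEYS
    if invalid:
        raise ValueError(
            f"The following keys from '{table_name}' table are not valid:\n{invalid}"
        )


@dataclass
class DatasetConfig:
    """Stand-in for a dataset config with one required field, `path`."""
    path: str

    @classmethod
    def from_config_dict(cls, config_dict: dict) -> "DatasetConfig":
        # Fails with TypeError when `path` is missing from the dict.
        return cls(**config_dict)


def first_error(eval_table: dict) -> str:
    """Return the name of the first exception raised, or 'ok'."""
    try:
        check_keys(eval_table, "eval")
        DatasetConfig.from_config_dict(eval_table.get("dataset", {}))
    except (ValueError, TypeError) as err:
        return type(err).__name__
    return "ok"


# Flat dataset_path key: rejected by the key check.
print(first_error({"batch_size": 11, "dataset_path": "/some/path"}))
# No dataset information at all: the required `path` is missing.
print(first_error({"batch_size": 11}))
# Nested dataset table with a path: both checks pass.
print(first_error({"batch_size": 11, "dataset": {"path": "/some/path"}}))
```

In this sketch the flat key fails the key check first, and dropping it entirely only moves the failure to the missing `path` argument, which matches the order in which the two errors appeared.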

Desktop (please complete the following information):

Operating System: Mac, but have tried on Windows 11 and got the exact same errors
Version: vak 1.0.0.post2

NickleDave commented 4 months ago

Hi @meriablue thank you for raising a detailed issue clearly explaining the problem you're having.
I'm sorry the docs aren't more clear.

I think what's going on here is that I failed to update some of the language in that section of the tutorial after I changed the config file format.

Instead of writing dataset_path in the [vak.eval] table, you want to make a table [vak.eval.dataset] (just like [vak.train.dataset]), and copy the key-value pair path = "/some/path".

So instead of having

[vak.eval]
# ... other key-value pairs
dataset_path = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6/032212-vak-frame-classification-dataset-generated-240710_105255"

you will want to make a table like this

[vak.eval.dataset]
path = "/Users/maria/Desktop/lab_work/tweetynet/gy6or6/032212-vak-frame-classification-dataset-generated-240710_105255"

In case it's not clear, you need to define the [vak.eval.dataset] sub-table after the [vak.eval] table, or else the TOML config parser will throw an error.

Can you please try that and let me know if it works?

If so, I will need to fix the example file https://github.com/vocalpy/vak/blob/main/doc/toml/gy6or6_eval.toml (and change the language in that paragraph of the tutorial), and specifically remove the comment that confused you.

There's a similar comment in the train config, https://github.com/vocalpy/vak/blob/f007f5c0f9e4a696479fc10962db82bec6d7cbe5/doc/toml/gy6or6_train.toml#L57, and in the predict config, https://github.com/vocalpy/vak/blob/f007f5c0f9e4a696479fc10962db82bec6d7cbe5/doc/toml/gy6or6_predict.toml#L60.

I should probably just put temporary [vak.command.dataset] tables in those files, with temporary path = "/put/path/here" key-value pairs, so that people don't have to figure out where they go.

mpescaru commented 4 months ago

Thank you for your quick response! I tried this and the command works now.

NickleDave commented 4 months ago

Great, glad to hear it! Thank you for letting me know.

Let's please leave this issue open for now and I will make some fixes to close it ASAP.

NickleDave commented 3 months ago

Just updated this page of the docs + the config files that go with it to hopefully make the new config format clearer

Thank you again @meriablue for catching this

NickleDave commented 3 months ago

@all-contributors please add @meriablue for doc

allcontributors[bot] commented 3 months ago

@NickleDave

I've put up a pull request to add @meriablue! :tada:

vocalpy / vak

BUG: Error with dataset_path parameter in vak eval #768