sample error: KeyError: 'test'

Ivvvvvvvvvvy commented 1 year ago

Hello, I'm running the sampling script, but I'm hitting an error I can't figure out. It seems to come from the SPLITS="\"[test, ]\"" part.

The script crashes with the following error:

Traceback (most recent call last):
  File "evaluation/generate_samples.py", line 300, in <module>
    main()
  File "evaluation/generate_samples.py", line 296, in main
    sample(local_rank, cfg, samples_split_dirs, is_ddp)
  File "evaluation/generate_samples.py", line 273, in sample
    save_specs(cfg, specs, samples_split_dirs, model, batch, split, sample_id, vocoder)
  File "evaluation/generate_samples.py", line 244, in save_specs
    save_path = samples_split_dirs[split] / class_foldername
KeyError: 'test'
Killing subprocess 257
Traceback (most recent call last):
  File "/root/miniconda3/envs/specvqgan/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/specvqgan/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/envs/specvqgan/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/miniconda3/envs/specvqgan/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/root/miniconda3/envs/specvqgan/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/specvqgan/bin/python', '-u', 'evaluation/generate_samples.py', 'sampler.config_sampler=evaluation/configs/sampler.yaml', 'sampler.model_logdir=./logs/2021-05-19T22-16-54_vggsound_codebook', 'sampler.splits="[test, ]"', 'sampler.samples_per_video=', 'sampler.batch_size=2', 'sampler.top_k=12', 'data.params.spec_dir_path=./data/vggsound/melspec_10s_22050hz/', 'sampler.now=$2023-04-11T21-08-11']' returned non-zero exit status 1.

This is the command I use:

python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --nnodes=1 \
    --node_rank=0 \
    --master_addr=localhost \
    --master_port=62374 \
    --use_env \
        evaluation/generate_samples.py \
        sampler.config_sampler=evaluation/configs/sampler.yaml \
        sampler.model_logdir=$"./logs/2021-05-19T22-16-54_vggsound_codebook" \
        sampler.splits=$"\"[test, ]\"" \
        sampler.samples_per_video=$1 \
        sampler.batch_size=$32 \
        sampler.top_k=$512 \
        data.params.spec_dir_path=$"./data/vggsound/melspec_10s_22050hz/" \
        sampler.now=$`date +"%Y-%m-%dT%H-%M-%S"`
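A note on the launch command above, separate from the splits issue: bash reads $1 as the first positional argument, and $32 and $512 as $3 and $5 followed by the literal digits "2" and "12". When the command runs without arguments, those parameters are empty. That matches the values in the traceback: sampler.samples_per_video= (empty), sampler.batch_size=2 and sampler.top_k=12 instead of 1, 32 and 512. The $"..." form is bash's locale-translation quoting, and when no translation is available it acts like ordinary double quotes. A $ directly before the backtick is kept literally, which is why the traceback shows sampler.now=$2023-04-11T21-08-11. The Python sketch below reproduces the expansion. It assumes a bash binary is on PATH, and expand is a hypothetical helper written only for this illustration.

```python
import shutil
import subprocess

# Assumption: bash is installed and on PATH (true on most Linux setups).
BASH = shutil.which("bash")


def expand(words: str) -> str:
    """Return what bash passes on for `words` when no positional arguments are set."""
    if BASH is None:
        raise RuntimeError("bash not found on PATH")
    result = subprocess.run(
        [BASH, "-c", f"echo {words}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# As written in the launch command: $1 is empty, $32 -> $3 + "2", $512 -> $5 + "12".
print(expand("sampler.samples_per_video=$1 sampler.batch_size=$32 sampler.top_k=$512"))
# -> sampler.samples_per_video= sampler.batch_size=2 sampler.top_k=12

# Without the stray $, the intended values reach the script unchanged.
print(expand("sampler.samples_per_video=1 sampler.batch_size=32 sampler.top_k=512"))
# -> sampler.samples_per_video=1 sampler.batch_size=32 sampler.top_k=512
```

Dropping the $ in front of 1, 32, 512 and in front of the date substitution should therefore give the script the values that were intended.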

I know someone has asked this question before, but I don't understand how to fix it. How should I fix it?

v-iashin commented 1 year ago

This seems to be a duplicate of #21.

The problem should be pretty easy to fix. The goal of "\"[test, ]\"" is to pass a list of splits, e.g. ['test'], as an argument to OmegaConf.

Try something like this (I haven't tested it):

Add cfg.sampler.splits=cfg.sampler.splits.split(',') at this line: https://github.com/v-iashin/SpecVQGAN/blob/8ab6981535ab70fad3531688e0f630f1ce3b834f/evaluation/generate_samples.py#L54
Replace sampler.splits=$"\"[test, ]\"" \ with sampler.splits="test" \
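One way to see why the error is a KeyError on 'test': the traceback shows the value arriving as sampler.splits="[test, ]" with the quotes included, so it is most likely parsed as the string [test, ] rather than as a list. The sketch below assumes, hypothetically, that samples_split_dirs is a dict with one entry per item in cfg.sampler.splits (the real generate_samples.py may build it differently). Iterating over a string yields single characters, so no 'test' key exists. parse_splits and make_split_dirs are illustrative helpers, not functions from the SpecVQGAN repo.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def parse_splits(raw: str) -> List[str]:
    """Turn a comma-separated CLI value such as "test" or "train,test" into a list.

    Stripping whitespace and dropping empty items goes slightly beyond a plain
    raw.split(','), so values like "train, test" or "test," also work.
    """
    return [s.strip() for s in raw.split(",") if s.strip()]


def make_split_dirs(splits: Iterable[str], root: str) -> Dict[str, Path]:
    # Hypothetical stand-in for samples_split_dirs: one output folder per split.
    return {split: Path(root) / split for split in splits}


# If the value reaches the script as the *string* "[test, ]", iterating over it
# gives '[', 't', 'e', 's', ',', ' ', ']', so looking up 'test' raises KeyError.
broken = make_split_dirs("[test, ]", "samples")
print("test" in broken)  # False

# With sampler.splits="test" and a comma split, the lookup succeeds.
fixed = make_split_dirs(parse_splits("test"), "samples")
print(fixed["test"].as_posix())  # samples/test
```

The comma-separated form also keeps the bash side simple, since a value like "train,test" contains no spaces or brackets that need extra quoting.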

Let me know if it works and if you had to fix something else.

Ivvvvvvvvvvy commented 1 year ago

Thank you so much!!! The problem is solved, and the evaluation phase runs without any problems too.

Also, I have a question about the "vggsound.csv" file. The dataset is divided into three parts: train, valid, and test. Why does the last column of "vggsound.csv" contain only two values, train and test? Were the valid samples relabeled as train or test?

v-iashin commented 1 year ago

I'm glad this resolved the issue.

The original VGGSound dataset is split into two parts: train and test. For our experiments, we split the original train set into train and validation subsets and keep the test split as it is. For more details, see the paper.
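The idea of deriving a validation subset from the original train rows, while leaving test untouched, can be sketched as below. This is not the procedure used for SpecVQGAN (see the paper for that). The CSV rows are hypothetical and assume a header-less layout of video id, start second, label, split. The helper three_way_split, the valid_fraction value and the seed are illustrative assumptions.

```python
import csv
import io
import random
from collections import Counter
from typing import Dict, List

# Hypothetical rows in an assumed vggsound.csv layout: video id, start second, label, split.
SAMPLE_CSV = """\
vid_a,0,dog barking,train
vid_b,10,dog barking,train
vid_c,20,dog barking,train
vid_d,30,dog barking,train
vid_e,0,playing piano,train
vid_f,10,playing piano,train
vid_g,20,playing piano,train
vid_h,30,playing piano,train
vid_i,0,dog barking,test
vid_j,10,playing piano,test
"""


def three_way_split(rows: List[List[str]], valid_fraction: float = 0.1,
                    seed: int = 0) -> Dict[str, List[List[str]]]:
    """Hold out part of the original train rows as validation; keep test as is."""
    train = [r for r in rows if r[-1] == "train"]
    test = [r for r in rows if r[-1] == "test"]
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_valid = int(len(shuffled) * valid_fraction)
    return {"train": shuffled[n_valid:], "valid": shuffled[:n_valid], "test": test}


rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
print(Counter(r[-1] for r in rows))  # Counter({'train': 8, 'test': 2})

splits = three_way_split(rows, valid_fraction=0.25, seed=0)
print({name: len(part) for name, part in splits.items()})  # {'train': 6, 'valid': 2, 'test': 2}
```

Because validation rows come out of the train rows, the CSV itself only needs the two original labels.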

Ivvvvvvvvvvy commented 1 year ago

Thank you!!

v-iashin / SpecVQGAN

sample error: KeyError: 'test' #28