state-spaces / s4

Structured state space sequence models
Apache License 2.0
2.47k stars 296 forks

How to download wt103 dataset properly? #110

Closed: Chord-Chen-30 closed this issue 1 year ago

Chord-Chen-30 commented 1 year ago

I ran the following command from state-spaces/:

HYDRA_FULL_ERROR=1 DATA_PATH=/.../state-spaces/data python train.py wandb=null +dataset.data_dir=/.../state-spaces/data experiment=lm/transformer-wt103

The program fails with:

Error executing job with overrides: ['wandb=null', '+dataset.data_dir=/.../state-spaces/data', 'experiment=lm/transformer-wt103']
Traceback (most recent call last):
  File "/.../state-spaces/train.py", line 717, in <module>
    main()
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/.../anaconda3/envs/effseq/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/.../state-spaces/train.py", line 713, in main
    train(config)
  File "/.../state-spaces/train.py", line 685, in train
    model = SequenceLightningModule(config)
  File "/.../state-spaces/train.py", line 136, in __init__
    self.dataset = SequenceDataset.registry[self.hparams.dataset._name_](
  File "/.../state-spaces/train.py", line 150, in setup
    self.dataset.setup()
  File "/.../state-spaces/src/dataloaders/lm.py", line 346, in setup
    self._vocab_count()
  File "/.../state-spaces/src/dataloaders/lm.py", line 432, in _vocab_count
    self.vocab.count_file(self.data_dir / "train.txt")
  File "/.../state-spaces/src/dataloaders/utils/vocabulary.py", line 59, in count_file
    assert os.path.exists(path)
AssertionError
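The assertion at the bottom of the traceback is a bare `assert os.path.exists(path)` in `count_file`, so it does not say which path was missing. The trace does show the argument, `self.data_dir / "train.txt"`, which means the loader looked for `train.txt` directly inside the configured `data_dir`. A minimal sketch, assuming you want the same existence check to name the missing file (the function `count_file_checked` is hypothetical and not part of the repo, and whitespace tokenization is an assumption):

```python
from collections import Counter
from pathlib import Path


def count_file_checked(path, counter=None):
    """Hypothetical variant of a vocabulary count_file: the same existence
    check, but the error names the missing path instead of a bare assert."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"expected corpus file not found: {path}")
    counter = Counter() if counter is None else counter
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Assumption: plain whitespace tokenization, for illustration only.
            counter.update(line.split())
    return counter
```

With this variant, the run above would have stopped with a message ending in `data/train.txt`, which points straight at the directory that was searched.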

Am I setting the config incorrectly, or should I download train.txt, valid.txt, and test.txt in advance and put them in state-spaces/data/?
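Before launching `train.py`, it can help to confirm that the corpus files are where the loader will look. A minimal sketch, assuming the split files are named `train.txt`, `valid.txt`, and `test.txt` (only `train.txt` is confirmed by the traceback; the other two names come from the question) and that they must sit directly in the directory passed as `dataset.data_dir`. The helper `missing_splits` is hypothetical:

```python
from pathlib import Path

# train.txt is confirmed by the traceback; valid.txt and test.txt are assumed.
EXPECTED_SPLITS = ("train.txt", "valid.txt", "test.txt")


def missing_splits(data_dir):
    """Return the expected split files that are not present under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SPLITS if not (root / name).is_file()]
```

A non-empty result for the directory given to `+dataset.data_dir` reproduces the condition behind the `AssertionError` without starting a full Hydra run.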

DavidHerel commented 1 year ago

Hi,

how did you fix the problem?

Chord-Chen-30 commented 1 year ago

I switched my codebase to safari.