r-three / t-few

Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"
MIT License

Missing config.split_option_flag? #2

Open hunterlang opened 2 years ago

hunterlang commented 2 years ago

Hi, thanks for the code!

When I run:

CUDA_VISIBLE_DEVICES=0 python -m src.pl_train -c t03b.json+rte.json -k save_model=False exp_name=first_exp3

I get:

Reusing dataset super_glue (/localdata/hjl/hf/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Train size 32
Eval size 277
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Missing logger folder: /home/hjl/t-few/exp_out/first_exp3/log

  | Name  | Type                       | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 2.8 B
-----------------------------------------------------
2.8 B     Trainable params
0         Non-trainable params
2.8 B     Total params
11,399.029 Total estimated model params size (MB)
Validation sanity check:   0%|          | 0/18 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hjl/t-few/src/pl_train.py", line 86, in <module>
    main(config)
  File "/home/hjl/t-few/src/pl_train.py", line 57, in main
    trainer.fit(model, datamodule)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 236, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/home/hjl/t-few/src/models/EncoderDecoder.py", line 229, in validation_step
    batch_output = self.predict(batch)
  File "/home/hjl/t-few/src/models/EncoderDecoder.py", line 139, in predict
    if not self.config.split_option_flag:
AttributeError: 'Config' object has no attribute 'split_option_flag'

I can't find a reference to split_option_flag in any of the config files. Should I manually set it?

Thanks!

dptam commented 2 years ago

Hi

Yeah, we forgot to include split_option_flag in the Config. Commit e115a18fba95436be44de164fa30720dfa441803 should fix that. Let me know if there are any more issues.
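For anyone who hits this before pulling that commit, a minimal sketch of the workaround is to give the Config a default for the attribute. This is illustrative, not the repo's actual Config class; the default of False is an assumption based only on the `if not self.config.split_option_flag:` check in the traceback:

```python
class Config:
    """Toy stand-in for the project's Config: holds defaults, then
    applies any key=value overrides on top (hypothetical sketch)."""

    def __init__(self, overrides=None):
        # Default is an assumption; EncoderDecoder.predict() only checks
        # `if not self.config.split_option_flag:`, so False keeps the
        # non-split code path as the fallback.
        self.split_option_flag = False
        for key, value in (overrides or {}).items():
            setattr(self, key, value)


config = Config({"exp_name": "first_exp3"})
print(config.split_option_flag)  # → False
```

The same effect could be had by passing the flag as a `-k` override on the command line once the attribute exists in the Config.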

hunterlang commented 2 years ago

That works, thanks.

On the subject of small fixes: I had to remove the trailing slash from origin_model in https://github.com/r-three/t-few/blob/master/configs/t03b.json to keep .from_pretrained() from complaining.
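A defensive way to handle this is to normalize the name before handing it to .from_pretrained(). The helper below is a hypothetical sketch, not code from the repo, and the model name "bigscience/T0_3B" is used as an assumed example value for origin_model:

```python
def normalize_model_name(origin_model: str) -> str:
    """Strip trailing slashes so a Hub name like 'bigscience/T0_3B/' is not
    mistaken for a local directory path by from_pretrained()."""
    return origin_model.rstrip("/")


print(normalize_model_name("bigscience/T0_3B/"))  # → bigscience/T0_3B
```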

HaokunLiu commented 2 years ago

Well, you caught us. When we ran it on our cluster, we saved the T0(3B) model to a local path. I guess that's a leftover trace from when we changed it back to the HF-downloadable model names.

Thanks for figuring it out and sharing it with us.