macabdul9 / CASA-Dialogue-Act-Classifier

PyTorch implementation of the paper "Dialogue Act Classification with Context-Aware Self-Attention" for dialogue act classification with a generic dataset class and PyTorch-Lightning trainer
MIT License
44 stars, 13 forks

Does dataset class still have some problems? #6

Open nanzhao opened 3 years ago

nanzhao commented 3 years ago

    python main.py
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    GPU available: False, used: False
    TPU available: False, using: 0 TPU cores
    wandb: (1) Create a W&B account
    wandb: (2) Use an existing W&B account
    wandb: (3) Don't visualize my results
    wandb: Enter your choice: 3
    wandb: You chose 'Don't visualize my results'
    wandb: Offline run mode, not syncing to the cloud.
    wandb: W&B is disabled in this directory. Run wandb on to enable cloud syncing.

      | Name  | Type            | Params
    0 | model | ContextAwareDAC | 130 M
    Validation sanity check: 0it [00:00, ?it/s]
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    Traceback (most recent call last):
      File "main.py", line 52, in <module>
        trainer.fit(model)
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 444, in fit
        results = self.accelerator_backend.train()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py", line 57, in train
        results = self.train_or_test()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
        results = self.trainer.train()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 466, in train
        self.run_sanity_check(self.get_model())
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 658, in run_sanity_check
        _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 566, in run_evaluation
        for batch_idx, batch in enumerate(dataloader):
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
        return self._process_data(data)
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
        data.reraise()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
        raise self.exc_type(msg)
    KeyError: Caught KeyError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
        data = fetcher.fetch(index)
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/dataset/dataset.py", line 31, in __getitem__
        target = DADataset.__label_dict[label]
    KeyError: 'fo_o_fw_"_by_bc'

    wandb: Waiting for W&B process to finish, PID 3182
    wandb: Program failed with code 1.
    wandb: Find user logs for this run at: ./wandb/offline-run-20210127_191806-1pex8c61/logs/debug.log
    wandb: Find internal logs for this run at: ./wandb/offline-run-20210127_191806-1pex8c61/logs/debug-internal.log
    wandb: You can sync this run to the cloud by running:
    wandb: wandb sync ./wandb/offline-run-20210127_191806-1pex8c61

macabdul9 commented 3 years ago

Hi @nanzhao, I have just fixed an unrelated issue, and it's running on my system. It seems you don't have a GPU on your machine, but the Trainer is configured for GPU training. Please comment out line 43 in main.py to train on CPU.

nanzhao commented 3 years ago

Should I wget https://www.dropbox.com/s/bpkt44sijmhfbxq/switchboard.zip or directly use the switchboard.zip in this repository? If I use the switchboard.zip in this repository, I get the same error. Is it caused by errors in the data content? It looks like there may be an unnoticed typo in the data file.

If I wget switchboard.zip and unzip the file, this is what happens:

    /Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/bin/python /Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/main.py
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    GPU available: False, used: False
    TPU available: False, using: 0 TPU cores

      | Name  | Type            | Params
    0 | model | ContextAwareDAC | 130 M
    Traceback (most recent call last):
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
        return self._engine.get_loc(casted_key)
      File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'Text'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/main.py", line 54, in <module>
        trainer.fit(model)
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 444, in fit
        results = self.accelerator_backend.train()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py", line 57, in train
        results = self.train_or_test()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
        results = self.trainer.train()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 466, in train
        self.run_sanity_check(self.get_model())
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 648, in run_sanity_check
        self.reset_val_dataloader(ref_model)
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 318, in reset_val_dataloader
        self.num_val_batches, self.val_dataloaders = self._reset_eval_dataloader(model, 'val')
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 238, in _reset_eval_dataloader
        dataloaders = self.request_dataloader(getattr(model, loader_name))
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 341, in request_dataloader
        dataloader = dataloader_fx()
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/Trainer.py", line 71, in val_dataloader
        valid_dataset = DADataset(tokenizer=self.tokenizer, data=valid_data, max_len=self.config['max_len'], text_field=self.config['text_field'], label_field=self.config['label_field'])
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/dataset/dataset.py", line 10, in __init__
        self.text = list(data[text_field]) #data['train'][text_field]
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
        indexer = self.columns.get_loc(key)
      File "/Users/zhaonan8/github_project/CASA-Dialogue-Act-Classifier/my_proj_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
        raise KeyError(key) from err
    KeyError: 'Text'

It seems the data format does not match what the code expects?
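A mismatch like this can be checked before training by comparing the CSV header with the field names passed as text_field and label_field. The following is only a sketch using the standard library: `missing_columns` is a hypothetical helper, not part of the repository, and the sample header is illustrative, since the thread does not show the columns of the downloaded file.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return [name for name in required if name not in header]


# Illustrative header only; the real file's columns may differ.
sample = io.StringIO("act_label_1,clean_text\nsd,okay\n")
print(missing_columns(sample, ["Text", "DamslActTag"]))  # ['Text', 'DamslActTag']
```

An empty list means the configured field names exist in the file; anything else points to a config/data mismatch rather than a bug in the dataset class.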

macabdul9 commented 3 years ago

Use this Kaggle Kernel to train without any setup, but make sure you have a wandb API key. You also have to enable the GPU on Kaggle.

nanzhao commented 3 years ago

I see this in train.csv:

    DamslActTag,Text
    "fo_o_fw_""_by_bc",Okay .
    x, . *slash error
    qh,Where to start.
    sd,"I haven't had that much, of course"

Is "fo_o_fw_""_by_bc" a valid act? It seems to cause the Python dict KeyError.
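For reference, a doubled quote inside a quoted CSV field is standard CSV escaping for a single literal quote, so the field parses to one tag that contains a `"` character. A small sketch with the standard `csv` module, using two of the rows quoted from train.csv (the repository itself reads the file with pandas, which is not shown here):

```python
import csv
import io

# Two rows taken from the train.csv excerpt quoted in this thread.
raw = 'DamslActTag,Text\n"fo_o_fw_""_by_bc",Okay .\nqh,Where to start.\n'

rows = list(csv.DictReader(io.StringIO(raw)))
tag = rows[0]["DamslActTag"]
print(tag)  # fo_o_fw_"_by_bc
```

So the tag itself is well-formed CSV; the KeyError means the key was missing from the dictionary at lookup time, not that the row is malformed.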

macabdul9 commented 3 years ago

Have you tried the Kaggle Kernel?

glicerico commented 3 years ago

@nanzhao @Christopher-Thornton, "fo_o_fw_""_by_bc" is the tag produced by the preprocessing in the cgpotts repo, and it works fine for me.

macabdul9 commented 3 years ago

@Christopher-Thornton @nanzhao there are 780 utterances with this dialogue act in the training data alone. There are 43 unique dialogue acts, including "fo_o_fw_""_by_bc", and the paper also uses 43 classes. If you're still getting the error, it's perfectly OK to rename this class.

glicerico commented 3 years ago

This tag actually corresponds to the one for "Other" in the Switchboard DAMSL manual. See section 1c. I am not sure why cgpotts labeled it as 2 separate strings, but as @macabdul9 says, you can rename it if you want (it works fine for me as it is, though).

noshad-vida commented 3 years ago

Thanks for sharing the code. I'm having a similar error:

    KeyError: Caught KeyError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/Users/jake/anaconda3/envs/nlp1/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
        data = fetcher.fetch(index)
      File "/Users/jake/anaconda3/envs/nlp1/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/Users/jake/anaconda3/envs/nlp1/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/Users/jake/CASA-Dialogue-Act-Classifier/dataset/dataset.py", line 33, in __getitem__
        label = DADataset.__label_dict[act]
    KeyError: "fo_o_fw_""_by_bc"

I also tried renaming this label to another string, but I'm still getting the KeyError.

macabdul9 commented 3 years ago

Hi @noshad-vida, can you try running this on Kaggle using this Notebook?

noshad-vida commented 3 years ago

Thanks @macabdul9, this error comes up when trying to run this notebook on Kaggle (error at the first cell when installing the packages):

    ERROR: Could not find a version that satisfies the requirement datasets
    ERROR: No matching distribution found for datasets

macabdul9 commented 3 years ago

@noshad-vida this should not happen. It may be because you're using a different Python environment on Kaggle. Try without installing datasets; the first cell should now be !pip install wandb pytorch-lightning

noshad-vida commented 3 years ago

@macabdul9 I was able to run the Notebook on Kaggle, but I'm still getting the same error on my local machine.

macabdul9 commented 3 years ago

Hi @noshad-vida,

The Hugging Face datasets package has compatibility issues with PyTorch and tokenizers/transformers, so I use pandas to read the CSV data files. Try without installing the datasets package, because it's no longer required.

PS: I apologize for the late response.

macksjeremy commented 3 years ago

> I see this in train.csv:
>
>     DamslActTag,Text
>     "fo_o_fw_""_by_bc",Okay .
>     x, . *slash error
>     qh,Where to start.
>     sd,"I haven't had that much, of course"
>
> Is "fo_o_fw_""_by_bc" a valid act? It seems to cause the Python dict KeyError.

Still having this issue.

adamelen commented 3 years ago

> Hi @noshad-vida,
>
> The Hugging Face datasets package has compatibility issues with PyTorch and tokenizers/transformers, so I use pandas to read the CSV data files. Try without installing the datasets package, because it's no longer required.
>
> PS: I apologize for the late response.

It would be useful if you updated the README, then, so that everyone who tries to use the code doesn't run into the same problem.

morleyd commented 3 years ago

I kept getting the KeyError as well, and I fixed it by modifying the __init__ of the DADataset class to store __label_dict as an instance attribute accessible through self:

    def __init__(self, tokenizer, data, text_field="clean_text", label_field="act_label_1", max_len=512):
        self.text = list(data[text_field])  # data['train'][text_field]
        self.acts = list(data[label_field])  # ['train'][label_field]
        self.tokenizer = tokenizer
        self.max_len = max_len

        # build/update the label dictionary 
        classes = sorted(set(self.acts))
        self.__label_dict = {cls: i for i, cls in enumerate(classes)}
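
For this change to take effect, the lookup in __getitem__ presumably has to read from self as well, instead of from DADataset. A stripped-down sketch of the whole pattern follows; `LabelledUtterances` is a hypothetical stand-in for DADataset with tokenization left out, not the repository's code:

```python
class LabelledUtterances:
    """Hypothetical stand-in for DADataset; tokenization is left out."""

    def __init__(self, texts, acts):
        self.text = list(texts)
        self.acts = list(acts)
        # Instance attribute: part of the object's own state, so it travels
        # with the dataset when the object is pickled for a worker process.
        self.label_dict = {cls: i for i, cls in enumerate(sorted(set(self.acts)))}

    def __len__(self):
        return len(self.text)

    def __getitem__(self, index):
        # Look the label up on the instance, not on the class.
        return {"text": self.text[index], "label": self.label_dict[self.acts[index]]}


ds = LabelledUtterances(["Okay .", "Where to start."], ['fo_o_fw_"_by_bc', "qh"])
print(ds[0]["label"], ds[1]["label"])  # 0 1
```

One caveat: a dictionary built per dataset can assign different indices to the same act if a split is missing a class, so it may be safer to build the mapping once from the training labels and pass it to every split.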

adamelen commented 3 years ago

I'm getting the same error and I've realized that only "val_dataloader" is called during "trainer.fit(model)" (and not "train_dataloader", "test_dataloader"). Do you have any idea why this happens?

adamelen commented 3 years ago

> I'm getting the same error and I've realized that only "val_dataloader" is called during "trainer.fit(model)" (and not "train_dataloader", "test_dataloader"). Do you have any idea why this happens?

It turns out that's not the problem: it happened because of the trainer's num_sanity_val_steps parameter, and because of the error the run never reached the point where it loads the train_dataloader. But I still can't figure out why the label_dict is empty when __getitem__ of DADataset is called, while it's filled with the acts and their indices when the DADataset is initialized...
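One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the failing tracebacks come from macOS and Windows, where DataLoader worker processes are started with the spawn method (the default on Windows, and on macOS since Python 3.8). A spawned worker re-imports the module and receives a pickled copy of the dataset object, and pickling an instance carries only its instance state, not class attributes that were filled in at runtime. The sketch below uses a hypothetical class, not the repository's DADataset, to show that a class-level dictionary is not part of the instance state:

```python
class ClassLevelLabels:
    label_dict = {}  # shared class attribute, empty when the module is imported

    def __init__(self, acts):
        self.acts = list(acts)
        # Filled in at runtime, but stored on the class, not on the instance.
        for act in sorted(set(self.acts)):
            ClassLevelLabels.label_dict.setdefault(act, len(ClassLevelLabels.label_dict))


ds = ClassLevelLabels(["sd", "qh", "sd"])
print(ClassLevelLabels.label_dict)  # {'qh': 0, 'sd': 1}
print("label_dict" in vars(ds))     # False
```

If this is what happens, a freshly imported class in the worker would start with an empty dictionary again, which would match an empty label_dict at __getitem__ time and a KeyError on the first label looked up. It would also be consistent with the code running on Kaggle, where workers are typically forked and inherit the parent's memory.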

maonanbe1 commented 1 year ago

    D:\anaconda3\envs\pytorch\python.exe D:\PycharmProjects\CASA-Dialogue-Act-Classifier-main\main.py
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    GPU available: True, used: True
    TPU available: False, using: 0 TPU cores
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

      | Name  | Type            | Params
    0 | model | ContextAwareDAC | 130 M
    Epoch 0: 0%| | 0/3021 [00:00<?, ?it/s]
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    wandb: WARNING W&B installed but not logged in. Run wandb login or set the WANDB_API_KEY env variable.
    Traceback (most recent call last):
      File "D:\PycharmProjects\CASA-Dialogue-Act-Classifier-main\main.py", line 66, in <module>
        trainer.fit(model)
      File "D:\anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 444, in fit
        results = self.accelerator_backend.train()
      File "D:\anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\accelerators\gpu_accelerator.py", line 63, in train
        results = self.train_or_test()
      File "D:\anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 74, in train_or_test
        results = self.trainer.train()
      File "D:\anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 493, in train
        self.train_loop.run_training_epoch()
      File "D:\anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 554, in run_training_epoch
        for batch_idx, (batch, is_last_batch) in train_dataloader:
      File "D:\anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\profiler\profilers.py", line 80, in profile_iterable
        value = next(iterator)
      File "D:\anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py", line 46, in _with_is_last
        last = next(it)
      File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
        data = self._next_data()
      File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1224, in _next_data
        return self._process_data(data)
      File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 1250, in _process_data
        data.reraise()
      File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\_utils.py", line 457, in reraise
        raise exception
    KeyError: Caught KeyError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\_utils\worker.py", line 287, in _worker_loop
        data = fetcher.fetch(index)
      File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "D:\PycharmProjects\CASA-Dialogue-Act-Classifier-main\dataset\dataset.py", line 34, in __getitem__
        label = DADataset.__label_dict[act]
    KeyError: 'fo_o_fw_"_by_bc'

maonanbe1 commented 1 year ago

    KeyError: Caught KeyError in DataLoader worker process 0.
    KeyError: 'fo_o_fw_"_by_bc'

Is this caused by errors in the data content?