yangland / hatefulchallenge

Enhance Multimodal Model Performance with Data Augmentation: Facebook Hateful Meme Challenge Solution

Facing issues with MMF config for running the VilBERT pretrained model on Hateful Meme dataset. #1

Open AnjumJ123 opened 2 years ago

AnjumJ123 commented 2 years ago

I am trying to reproduce the code for running VilBERT on the Hateful Memes dataset, but the existing code needs to be modified to point to the new data source for the Hateful Memes challenge data.

Facebook released a webpage (https://hatefulmemeschallenge.com/) where the dataset can be downloaded. Would you be able to change the notebook slightly so that the results can be reproduced? I am having trouble with mmf_convert_hm: converting the hateful meme data into the MMF format and moving the images and .jsonl files into the folders the repo expects. I am also unsure how to satisfy the MMF prerequisites after downloading the dataset, and what changes, if any, are needed in the .yaml and config files to make the code work.

AnjumJ123 commented 2 years ago

Not sure why it is looking for dev.jsonl when that is not one of the .jsonl files in the hateful_memes dataset. Any suggestions to address this error?

!mmf_run config="projects/visual_bert/configs/hateful_memes/from_coco.yaml" \
    model=visual_bert \
    dataset=hateful_memes \
    run_type=train_val \
    training.log_interval=200 \
    training.max_updates=22000 \
    training.batch_size=64 \
    training.evaluation_interval=200 \
    training.tensorboard=True \
    training.checkpoint_interval=200 \
    checkpoint.resume_pretrained=True \
    checkpoint.resume_zoo=visual_bert.pretrained.coco \
    dataset_config.hateful_memes.annotations.train[0]="hateful_memes/defaults/annotations/train.jsonl" \
    dataset_config.hateful_memes.annotations.val[0]="hateful_memes/defaults/annotations/dev_unseen.jsonl" \
    dataset_config.hateful_memes.annotations.test[0]="hateful_memes/defaults/annotations/test_unseen.jsonl"

Error:

You can disable this warning by setting the environment variable OC_DISABLE_DOT_ACCESS_WARNING=1
  warnings.warn(message=msg, category=UserWarning)
Overriding option config to projects/visual_bert/configs/hateful_memes/from_coco.yaml
Overriding option model to visual_bert
Overriding option datasets to hateful_memes
Overriding option run_type to train_val
Overriding option training.log_interval to 200
Overriding option training.max_updates to 22000
Overriding option training.batch_size to 64
Overriding option training.evaluation_interval to 200
Overriding option training.tensorboard to True
Overriding option training.checkpoint_interval to 200
Overriding option checkpoint.resume_pretrained to True
Overriding option checkpoint.resume_zoo to visual_bert.pretrained.coco
Using seed 30549397
Logging to: ./save/logs/train_2022-04-20T16:33:30.log
Downloading features.tar.gz: 100% 8.44G/8.44G [06:24<00:00, 22.0MB/s]
Traceback (most recent call last):
  File "/usr/local/bin/mmf_run", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.7/dist-packages/mmf_cli/run.py", line 111, in run
    main(configuration, predict=predict)
  File "/usr/local/lib/python3.7/dist-packages/mmf_cli/run.py", line 40, in main
    trainer.load()
  File "/usr/local/lib/python3.7/dist-packages/mmf/trainers/base_trainer.py", line 59, in load
    self.load_datasets()
  File "/usr/local/lib/python3.7/dist-packages/mmf/trainers/base_trainer.py", line 83, in load_datasets
    self.dataset_loader.load_datasets()
  File "/usr/local/lib/python3.7/dist-packages/mmf/common/dataset_loader.py", line 18, in load_datasets
    self.val_dataset.load(self.config)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/multi_dataset_loader.py", line 114, in load
    self.build_datasets(config)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/multi_dataset_loader.py", line 131, in build_datasets
    dataset_instance = build_dataset(dataset, dataset_config, self.dataset_type)
  File "/usr/local/lib/python3.7/dist-packages/mmf/utils/build.py", line 106, in build_dataset
    dataset = builder_instance.load_dataset(config, dataset_type)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/base_dataset_builder.py", line 96, in load_dataset
    dataset = self.load(config, dataset_type, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/builders/hateful_memes/builder.py", line 39, in load
    self.dataset = super().load(config, dataset_type, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/mmf_dataset_builder.py", line 141, in load
    dataset = dataset_class(config, dataset_type, imdb_idx)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/builders/hateful_memes/dataset.py", line 19, in __init__
    super().__init__(dataset_name, config, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/mmf_dataset.py", line 25, in __init__
    self.annotation_db = self._build_annotation_db()
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/mmf_dataset.py", line 39, in _build_annotation_db
    return AnnotationDatabase(self.config, annotation_path)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/databases/annotation_database.py", line 24, in __init__
    self._load_annotation_db(path)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/databases/annotation_database.py", line 32, in _load_annotation_db
    self._load_jsonl(path)
  File "/usr/local/lib/python3.7/dist-packages/mmf/datasets/databases/annotation_database.py", line 39, in _load_jsonl
    with PathManager.open(path, "r") as f:
  File "/usr/local/lib/python3.7/dist-packages/mmf/utils/file_io.py", line 45, in open
    newline=newline,
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/annotations/dev.jsonl'
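
To narrow this down, it may help to check which of the annotation files named in this thread are actually present in the directory from the FileNotFoundError. The snippet below is only a diagnostic sketch in plain Python, not part of MMF. The helper name report_annotations is made up for illustration, and the directory and file names are copied from the command and traceback above.

```python
from pathlib import Path

# Directory copied from the FileNotFoundError in the traceback.
ANNOTATIONS_DIR = Path(
    "/root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/annotations"
)

# File names that appear in this thread: the three passed as overrides on the
# command line, plus dev.jsonl, which the traceback shows MMF trying to open.
REFERENCED = ["train.jsonl", "dev_unseen.jsonl", "test_unseen.jsonl", "dev.jsonl"]


def report_annotations(annotations_dir, referenced):
    """Hypothetical helper: split referenced file names into (present, missing)."""
    annotations_dir = Path(annotations_dir)
    present = [name for name in referenced if (annotations_dir / name).is_file()]
    missing = [name for name in referenced if name not in present]
    return present, missing


if __name__ == "__main__":
    present, missing = report_annotations(ANNOTATIONS_DIR, REFERENCED)
    print("present:", present)
    print("missing:", missing)
    # Also list every .jsonl file found, in case the names differ from the above.
    if ANNOTATIONS_DIR.is_dir():
        print("on disk:", sorted(p.name for p in ANNOTATIONS_DIR.glob("*.jsonl")))
```

If dev_unseen.jsonl shows up as present while dev.jsonl is missing, that would be consistent with the val override not being applied to the run, although nothing in this thread confirms the cause.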