Open rohitpaul23 opened 1 year ago
Hi @rohitpaul23, did you generate the jsonl files following the setup instructions?
Are you referring to this?
tokenizer = XLMRobertaTokenizer("beit3_model_path/beit3.spm")
CaptioningDataset.make_coco_captioning_dataset_index(
    data_path="/home/ec2-user/training/caption/testCaption",
    tokenizer=tokenizer,
)
I tried it with data_path pointing to my custom dataset, but it asks for a dataset_coco.json file. Do I have to create my own JSON file based on my custom data?
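On the dataset_coco.json question: that file name matches the Karpathy-split COCO annotation file, so one option is to write a file with the same layout for the custom images. Below is a minimal sketch, assuming the index builder reads the usual Karpathy fields (images, filepath, filename, split, cocoid, sentences). The helper name build_karpathy_style_index, the images subfolder, and the sequential IDs are hypothetical, and the fields the BEiT-3 code actually reads should be checked in datasets.py.

```python
import json
from pathlib import Path

# Assumption: only these extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_karpathy_style_index(data_path, image_subdir="images", split="test"):
    """Write a Karpathy-style dataset_coco.json for custom images without captions.

    Hypothetical helper, not part of the BEiT-3 repository.
    """
    root = Path(data_path)
    image_files = sorted(
        p for p in (root / image_subdir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    entries = []
    for image_id, path in enumerate(image_files):
        entries.append({
            "filepath": image_subdir,  # folder name, as in the Karpathy file
            "filename": path.name,
            "split": split,            # every image goes into one split
            "cocoid": image_id,        # hypothetical sequential ID
            "imgid": image_id,
            "sentences": [],           # no reference captions available
        })
    out_file = root / "dataset_coco.json"
    with open(out_file, "w", encoding="utf-8") as writer:
        json.dump({"dataset": "custom", "images": entries}, writer, indent=2)
    return out_file
```

With a file like this in place, rerunning make_coco_captioning_dataset_index on the same data_path should at least get past the missing dataset_coco.json; whether entries without captions are accepted for every split is not confirmed here.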
I want to use BEiT-3 with the beit3_large_patch16_480_coco_captioning weights for image captioning on my custom images. I have downloaded the weights and the .spm file, and I am running the following command:
!python -m torch.distributed.launch --nproc_per_node=1 run_beit3_finetuning.py \
    --model beit3_large_patch16_480 \
    --input_size 480 \
    --task coco_captioning \
    --batch_size 4 \
    --sentencepiece_model /home/ec2-user/training/caption/unilm/beit3/beit3_model_path/beit3.spm \
    --finetune /home/ec2-user/training/caption/unilm/beit3/beit3_model_path/beit3_large_patch16_480_coco_captioning.pth \
    --data_path /home/ec2-user/training/caption/testCaption \
    --output_dir /home/ec2-user/training/caption/beit3Result \
    --eval \
    --dist_eval
My images are stored in the testCaption folder.
Running this gives the following error:
Traceback (most recent call last):
  File "/home/ec2-user/training/caption/unilm/beit3/run_beit3_finetuning.py", line 448, in <module>
    main(opts, ds_init)
  File "/home/ec2-user/training/caption/unilm/beit3/run_beit3_finetuning.py", line 244, in main
    data_loader_train, data_loader_val = create_downstream_dataset(args)
  File "/home/ec2-user/training/caption/unilm/beit3/datasets.py", line 846, in create_downstream_dataset
    create_dataset_by_split(args, split="train", is_train=True), \
  File "/home/ec2-user/training/caption/unilm/beit3/datasets.py", line 822, in create_dataset_by_split
    dataset = dataset_class(
  File "/home/ec2-user/training/caption/unilm/beit3/datasets.py", line 626, in __init__
    super().__init__(
  File "/home/ec2-user/training/caption/unilm/beit3/datasets.py", line 40, in __init__
    with open(index_file, mode="r", encoding="utf-8") as reader:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ec2-user/training/caption/testCaption/coco_captioning.train.jsonl'
Which file is missing, and am I going about this the right way?
Please help. Thank you.
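The traceback shows the loader opening coco_captioning.train.jsonl inside data_path, and it does so for split="train" even though --eval was passed. A quick way to see which index files are absent before launching is sketched below. Only the train file name is confirmed by the error message; the val and test names are assumed by analogy, and missing_index_files is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def missing_index_files(data_path, task="coco_captioning",
                        splits=("train", "val", "test")):
    """Return the expected <task>.<split>.jsonl paths that do not exist.

    Assumption: val/test index files follow the same naming pattern as the
    train file named in the FileNotFoundError.
    """
    root = Path(data_path)
    expected = [root / f"{task}.{split}.jsonl" for split in splits]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_index_files("/home/ec2-user/training/caption/testCaption"):
        print("missing index file:", path)
```

If the list is not empty, the index-generation step from the setup instructions probably still needs to be run on that data_path.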