Closed cactusycy closed 7 months ago
Using a single coco dataset can help solve this issue.
Thank you! I came across the same problem, and using a single coco dataset did solve it. However, if I use a single vg dataset, the same problem comes back. Any idea how to solve this problem thoroughly?
I figured out how this bug happens. When constructing the VG dataset, the `__getitem__` method of the `ImageTextPairDataset` class in `datasets/image_text_pair_datasets.py` does not handle the VG image paths properly. It simply reads the path from the annotation file `vg_caption.json` without any string processing, so the path it reads and the actual image path on disk never match. A thorough solution is to add a `.split('/')[-1]` operation to build the full path, `image_path = os.path.join(self.vis_root, ann["image"].split('/')[-1])`, the way the coco dataset does, and to add an `assert` in the following try/except block instead of simply returning `None`, which is what causes the `TypeError` above.
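Concretely, the fix can be sketched as below. This is a minimal sketch, not the actual LAVIS code: `resolve_image_path` and `getitem_sketch` are hypothetical names for illustration, and the real `__getitem__` also runs the vision/text processors on the sample.

```python
import os

def resolve_image_path(vis_root, ann_image):
    # Keep only the basename, the way the COCO dataset builder does:
    # a VG annotation entry may carry a directory prefix that does not
    # exist under vis_root, so joining the two verbatim never matches.
    return os.path.join(vis_root, ann_image.split("/")[-1])

def getitem_sketch(vis_root, ann):
    """Sketch of the patched __getitem__ body (processors omitted)."""
    image_path = resolve_image_path(vis_root, ann["image"])

    # Fail loudly instead of returning None: a None sample is exactly
    # what makes the collater's all_keys.update(s) raise the TypeError.
    assert os.path.exists(image_path), f"image not found: {image_path}"
    return {"image_path": image_path, "text_input": ann["caption"]}
```

Returning `None` only defers the failure to the collate step in a worker process, which is why the traceback below points at `base_dataset.py` rather than at the dataset itself.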
Using a single coco dataset can help solve this issue.
Hi, how do I use a single coco dataset?
use a single coco dataset
Can I kindly ask how to use a single coco dataset? Thank you
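For those asking: "using a single coco dataset" just means removing the `vg_caption` entry from the `datasets` section of the stage-1 pretraining config, so only `coco_caption` is loaded. A hedged sketch of the edit (the exact file is the YAML referenced by `pretrain_stage1.sh`, typically `lavis/projects/blip2/train/pretrain_stage1.yaml`; verify the path in your checkout, and note the processor names below are taken from the config dump in the log):

```yaml
datasets:
  coco_caption:            # keep only COCO
    vis_processor:
      train:
        name: "blip2_image_train"
        image_size: 224
    text_processor:
      train:
        name: "blip_caption"
  # vg_caption: ...        # delete this whole block to skip VG
```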
Hey, guys. I just ran into a very annoying problem. After downloading the datasets and setting the paths, I ran `bash run_scripts/blip2/train/pretrain_stage1.sh` and hoped it would work. However, I got the error below. It looks like a dataset issue, but I can't figure out why it happens when the datasets are downloaded and the paths are set. Looking forward to your reply.
| distributed init (rank 0, world 1): env:// INFO - 2024-02-17 16:10:47,640 - config - ===== Running Parameters ===== INFO - 2024-02-17 16:10:47,641 - config - { "amp": true, "batch_size_eval": 64, "batch_size_train": 100, "device": "cuda", "dist_backend": "nccl", "dist_url": "env://", "distributed": true, "evaluate": false, "gpu": 0, "init_lr": 0.0001, "lr_sched": "linear_warmup_cosine_lr", "max_epoch": 10, "min_lr": 1e-05, "num_workers": 4, "output_dir": "output/BLIP2/Pretrain_stage1", "rank": 0, "resume_ckpt_path": null, "seed": 42, "task": "image_text_pretrain", "train_splits": [ "train" ], "warmup_lr": 1e-06, "warmup_steps": 5000, "weight_decay": 0.05, "world_size": 1 } INFO - 2024-02-17 16:10:47,641 - config - ====== Dataset Attributes ====== INFO - 2024-02-17 16:10:47,641 - config - ======== coco_caption ======= INFO - 2024-02-17 16:10:47,642 - config - { "build_info": { "annotations": { "test": { "md5": "3ff34b0ef2db02d01c37399f6a2a6cd1", "storage": "coco/annotations/coco_karpathy_test.json", "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_test.json" }, "train": { "md5": "aa31ac474cf6250ebb81d18348a07ed8", "storage": "coco/annotations/coco_karpathy_train.json", "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json" }, "val": { "md5": "b273847456ef5580e33713b1f7de52a0", "storage": "coco/annotations/coco_karpathy_val.json", "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json" } }, "images": { "storage": "coco/images/" } }, "data_type": "images", "dataset_card": "dataset_card/coco_caption.md", "text_processor": { "train": { "name": "blip_caption" } }, "vis_processor": { "train": { "image_size": 224, "name": "blip2_image_train" } } } INFO - 2024-02-17 16:10:47,642 - config - ======== vg_caption ======= INFO - 2024-02-17 16:10:47,642 - config - { "build_info": { "annotations": { "train": { "storage": 
"vg/annotations/vg_caption.json", "url": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/visual_genome/vg_caption.json" } }, "images": { "storage": "vg/images/" } }, "data_type": "images", "text_processor": { "train": { "name": "blip_caption" } }, "vis_processor": { "train": { "image_size": 224, "name": "blip_image_train" } } } INFO - 2024-02-17 16:10:47,642 - config - ====== Model Attributes ====== INFO - 2024-02-17 16:10:47,642 - config - { "arch": "blip2", "drop_path_rate": 0, "finetuned": "", "freeze_vit": true, "image_size": 224, "load_finetuned": false, "load_pretrained": false, "model_type": "pretrain", "num_query_token": 32, "pretrained": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained.pth", "use_grad_checkpoint": false, "vit_precision": "fp16" } Using downloaded and verified file: /root/autodl-tmp/lavis/coco/annotations/coco_karpathy_train.json Using downloaded and verified file: /root/autodl-tmp/lavis/coco/annotations/coco_karpathy_val.json Using downloaded and verified file: /root/autodl-tmp/lavis/coco/annotations/coco_karpathy_test.json INFO - 2024-02-17 16:10:47,643 - base_dataset_builder - Building datasets... Using downloaded and verified file: /root/autodl-tmp/lavis/vg/annotations/vg_caption.json INFO - 2024-02-17 16:10:48,682 - base_dataset_builder - Building datasets... INFO - 2024-02-17 16:11:18,378 - blip2_qformer - freeze vision encoder INFO - 2024-02-17 16:11:41,841 - runner_base - Start training INFO - 2024-02-17 16:11:42,834 - runner_base - dataset_ratios not specified, datasets will be concatenated (map-style datasets) or chained (webdataset.DataPipeline). INFO - 2024-02-17 16:11:42,834 - runner_base - Loaded 1388521 records for train split from the dataset. INFO - 2024-02-17 16:11:42,834 - runner_base - Loaded 5000 records for val split from the dataset. INFO - 2024-02-17 16:11:42,834 - runner_base - Loaded 5000 records for test split from the dataset. 
INFO - 2024-02-17 16:11:42,839 - runner_base - number of trainable parameters: 186705470 INFO - 2024-02-17 16:11:42,841 - base_task - Start training epoch 0, 13885 iters per inner epoch. Traceback (most recent call last): File "train.py", line 103, in <module>
main()
File "train.py", line 99, in main
runner.train()
File "/root/LAVIS/lavis/runners/runner_base.py", line 384, in train
train_stats = self.train_epoch(cur_epoch)
File "/root/LAVIS/lavis/runners/runner_base.py", line 451, in train_epoch
return self.task.train_epoch(
File "/root/LAVIS/lavis/tasks/base_task.py", line 116, in train_epoch
return self._train_inner_loop(
File "/root/LAVIS/lavis/tasks/base_task.py", line 207, in _train_inner_loop
samples = next(data_loader)
File "/root/LAVIS/lavis/datasets/datasets/dataloader_utils.py", line 149, in __next__
data = next(self.iter_loader)
File "/root/LAVIS/lavis/datasets/datasets/dataloader_utils.py", line 59, in __iter__
self.preload(loader_it)
File "/root/LAVIS/lavis/datasets/datasets/dataloader_utils.py", line 77, in preload
self.batch = next(it)
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/_utils.py", line 722, in reraise
raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/root/LAVIS/lavis/datasets/datasets/base_dataset.py", line 85, in collater
all_keys.update(s)
TypeError: 'NoneType' object is not iterable
[2024-02-17 16:11:50,735] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1198) of binary: /root/miniconda3/envs/lavis/bin/python Traceback (most recent call last): File "/root/miniconda3/envs/lavis/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/root/miniconda3/envs/lavis/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/run.py", line 816, in <module>
main()
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/run.py", line 812, in main
run(args)
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train.py FAILED
Failures: