salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

BSD 3-Clause "New" or "Revised" License

generate() got an unexpected keyword argument 'length_penalty' #625

Open liuhaoyutz opened 8 months ago

liuhaoyutz commented 8 months ago

Environment: Ubuntu 22.04, single NVIDIA 3090 GPU.

I followed https://opensource.salesforce.com/LAVIS/latest/tutorial.evaluation.html and changed --nproc_per_node from 8 to 1, as below:

    diff --git a/run_scripts/blip/eval/eval_coco_cap.sh b/run_scripts/blip/eval/eval_coco_cap.sh
    index d58ff58..8337083 100644
    --- a/run_scripts/blip/eval/eval_coco_cap.sh
    +++ b/run_scripts/blip/eval/eval_coco_cap.sh
    @@ -1 +1 @@
    -python -m torch.distributed.run --nproc_per_node=8 evaluate.py --cfg-path lavis/projects/blip/eval/caption_coco_eval.yaml
    +python -m torch.distributed.run --nproc_per_node=1 evaluate.py --cfg-path lavis/projects/blip/eval/caption_coco_eval.yaml

Running run_scripts/blip/eval/eval_coco_cap.sh then fails with "generate() got an unexpected keyword argument 'length_penalty'". Details below:

    (lavis) haoyu@haoyu:~/work/code/LAVIS$ bash run_scripts/blip/eval/eval_coco_cap.sh
    | distributed init (rank 0, world 1): env://
    INFO - 2023-12-30 07:37:08,112 - config - ===== Running Parameters =====
    INFO - 2023-12-30 07:37:08,113 - config - { "batch_size_eval": 64, "batch_size_train": 32, "device": "cuda", "dist_backend": "nccl", "dist_url": "env://", "distributed": true, "evaluate": true, "gpu": 0, "max_len": 20, "min_len": 5, "num_beams": 3, "num_workers": 4, "output_dir": "output/BLIP/Caption_coco", "rank": 0, "seed": 42, "task": "captioning", "test_splits": [ "test" ], "world_size": 1 }
    INFO - 2023-12-30 07:37:08,113 - config - ====== Dataset Attributes ======
    INFO - 2023-12-30 07:37:08,113 - config - ======== coco_caption =======
    INFO - 2023-12-30 07:37:08,113 - config - { "build_info": { "annotations": { "test": { "md5": "3ff34b0ef2db02d01c37399f6a2a6cd1", "storage": "coco/annotations/coco_karpathy_test.json", "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_test.json" }, "train": { "md5": "aa31ac474cf6250ebb81d18348a07ed8", "storage": "coco/annotations/coco_karpathy_train.json", "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json" }, "val": { "md5": "b273847456ef5580e33713b1f7de52a0", "storage": "coco/annotations/coco_karpathy_val.json", "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json" } }, "images": { "storage": "coco/images/" } }, "data_type": "images", "dataset_card": "dataset_card/coco_caption.md", "text_processor": { "eval": { "name": "blip_caption" } }, "vis_processor": { "eval": { "name": "blip_image_eval" } } }
    INFO - 2023-12-30 07:37:08,114 - config - ====== Model Attributes ======
    INFO - 2023-12-30 07:37:08,114 - config - { "arch": "blip_caption", "finetuned": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP/blip_coco_caption_base.pth", "image_size": 384, "load_finetuned": true, "med_config_path": "configs/models/med_config.json", "model_type": "base_coco", "pretrained": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth", "prompt": "a picture of ", "vit_ckpt_layer": 0, "vit_grad_ckpt": false, "vit_type": "base" }
    Using downloaded and verified file: /home/haoyu/.cache/lavis/coco/annotations/coco_karpathy_train.json
    Using downloaded and verified file: /home/haoyu/.cache/lavis/coco/annotations/coco_karpathy_val.json
    Using downloaded and verified file: /home/haoyu/.cache/lavis/coco/annotations/coco_karpathy_test.json
    INFO - 2023-12-30 07:37:08,115 - base_dataset_builder - Building datasets...
    INFO - 2023-12-30 07:37:15,385 - base_model - Missing keys []
    INFO - 2023-12-30 07:37:15,385 - base_model - load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP/blip_coco_caption_base.pth
    INFO - 2023-12-30 07:37:15,401 - runner_base - dataset_ratios not specified, datasets will be concatenated (map-style datasets) or chained (webdataset.DataPipeline).
    INFO - 2023-12-30 07:37:15,401 - runner_base - Loaded 566747 records for train split from the dataset.
    INFO - 2023-12-30 07:37:15,401 - runner_base - Loaded 5000 records for val split from the dataset.
    INFO - 2023-12-30 07:37:15,401 - runner_base - Loaded 5000 records for test split from the dataset.
    INFO - 2023-12-30 07:37:15,402 - runner_base - Empty train splits.
    INFO - 2023-12-30 07:37:15,402 - runner_base - Empty train splits.
    INFO - 2023-12-30 07:37:15,402 - runner_base - Empty train splits.
    Traceback (most recent call last):
      File "evaluate.py", line 92, in <module>
        main()
      File "evaluate.py", line 88, in main
        runner.evaluate(skip_reload=True)
      File "/home/haoyu/work/code/LAVIS/lavis/runners/runner_base.py", line 441, in evaluate
        test_logs[split_name] = self.eval_epoch(
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/home/haoyu/work/code/LAVIS/lavis/runners/runner_base.py", line 489, in eval_epoch
        results = self.task.evaluation(model, data_loader)
      File "/home/haoyu/work/code/LAVIS/lavis/tasks/base_task.py", line 96, in evaluation
        eval_output = self.valid_step(model=model, samples=samples)
      File "/home/haoyu/work/code/LAVIS/lavis/tasks/captioning.py", line 104, in valid_step
        captions = model.generate(
    TypeError: generate() got an unexpected keyword argument 'length_penalty'
    [2023-12-30 07:37:20,643] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 8565) of binary: /home/haoyu/anaconda3/envs/lavis/bin/python
    Traceback (most recent call last):
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/run.py", line 810, in <module>
        main()
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
        return f(*args, **kwargs)
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
        run(args)
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
        elastic_launch(
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/home/haoyu/anaconda3/envs/lavis/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
        raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

    ============================================================
    evaluate.py FAILED
    ------------------------------------------------------------
    Failures:
    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2023-12-30_07:37:20
      host      : haoyu
      rank      : 0 (local_rank: 0)
      exitcode  : 1 (pid: 8565)
      error_file:
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================

Davidwhw commented 4 months ago

I had the same problem. How did you solve it? Thank you for any help you can provide.

Luxanna-Real commented 2 months ago

I am experiencing the same issue. It seems that the generate() function is wrong. Has anyone found a solution to this problem? Any advice or suggestions would be greatly appreciated. Thank you!

Davidwhw commented 2 months ago

This looks like a legacy bug: BLIP-1's generate() does not accept this parameter, but BLIP-2's does. If you only need BLIP-1, you can comment out the unsupported arguments in valid_step in lavis/tasks/captioning.py. For example:

    def valid_step(self, model, samples):
        results = []
        # run_cfg = self.cfg.run_cfg
        # length_penalty and temperature are commented out below because
        # BLIP-1's generate() does not accept them.
        captions = model.generate(
            samples,
            use_nucleus_sampling=False,
            num_beams=self.num_beams,
            max_length=self.max_len,
            min_length=self.min_len,
            repetition_penalty=self.repetition_penalty,
            # length_penalty=self.length_penalty,
            top_p=self.top_p,
            # temperature=self.temperature,
        )
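
If you want the same task code to work for both BLIP-1 and BLIP-2 without commenting lines in and out, one option is to pass only the keyword arguments that the model's generate() actually declares. The following is a minimal sketch of that idea using the standard library's inspect module; it is not part of LAVIS. The helper name filter_supported_kwargs and the stand-in function blip1_style_generate are hypothetical, and the stand-in's parameter list is an assumption based on the arguments kept in the snippet above.

```python
import inspect


def filter_supported_kwargs(func, **kwargs):
    """Return only the keyword arguments that `func` declares (hypothetical helper)."""
    params = inspect.signature(func).parameters
    # A function that declares **kwargs accepts any keyword, so keep everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    # Otherwise keep only names that can legally be passed by keyword.
    by_keyword = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in by_keyword
    }


# Hypothetical stand-in for a BLIP-1 style generate(): no length_penalty,
# no temperature. The real LAVIS signature may differ.
def blip1_style_generate(samples, use_nucleus_sampling=False, num_beams=3,
                         max_length=30, min_length=10, top_p=0.9,
                         repetition_penalty=1.0):
    return [f"caption for {s}" for s in samples]


requested = dict(num_beams=3, max_length=20, min_length=5,
                 length_penalty=1.0, temperature=1.0)
supported = filter_supported_kwargs(blip1_style_generate, **requested)
dropped = sorted(set(requested) - set(supported))

# Calling with the filtered arguments no longer raises TypeError.
captions = blip1_style_generate(["img0"], **supported)
print("dropped:", dropped)
print(captions)
```

In valid_step this would look roughly like model.generate(samples, **filter_supported_kwargs(model.generate, length_penalty=self.length_penalty, ...)). The trade-off is that unsupported settings are dropped silently, so it is probably worth logging the dropped names so a config value is not ignored without notice.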