mlfoundations / open_flamingo

An open-source framework for training large multimodal models.
MIT License

Can't reproduce evaluation results of imagenet1k #200

Open lziiid opened 1 year ago

lziiid commented 1 year ago

Hi there, I can roughly reproduce the results on COCO, Flickr30k, OKVQA, and VQAv2, but not on ImageNet-1k: acc@1 is only 0.02 and acc@5 is only 0.04. Is this a problem on my end, or is anyone else unable to reproduce it? Thanks a lot.

Haochen-Luo commented 1 year ago

@lziiid Hi lziiid, I am also trying to reproduce the results. Would you mind sharing your results on COCO, Flickr30, OKVQA, and VQAv2?

anas-awadalla commented 1 year ago

Hmm, this doesn't seem right. Can you share the command you are running?

lziiid commented 1 year ago

> Hmm, this doesn't seem right. Can you share the command you are running?

Here is the command I'm running, with all datasets the same ( except --eval{datasets} ):

```
python open_flamingo/eval/evaluate.py \
    --lm_path $LM_PATH \
    --lm_tokenizer_path $LM_TOKENIZER_PATH \
    --vision_encoder_path $VISION_ENCODER_NAME \
    --vision_encoder_pretrained $VISION_ENCODER_PRETRAINED \
    --checkpoint_path $CKPT_PATH \
    --cross_attn_every_n_layers 4 \
    --device $DEVICE \
    --coco_train_image_dir_path $COCO_TRAIN_IMG_PATH \
    --coco_val_image_dir_path $COCO_TEST_IMG_PATH \
    --coco_karpathy_json_path $COCO_KARPATHY_JSON_PATH \
    --coco_train_embedding_path $COCO_TRAIN_EMBEDDING_PATH \
    --vqav2_train_image_dir_path $VQAV2_TRAIN_IMG_PATH \
    --vqav2_train_annotations_json_path $VQAV2_TRAIN_ANNO_PATH \
    --vqav2_train_questions_json_path $VQAV2_TRAIN_QUESTION_PATH \
    --vqav2_test_image_dir_path $VQAV2_TEST_IMG_PATH \
    --vqav2_test_annotations_json_path $VQAV2_TEST_ANNO_PATH \
    --vqav2_test_questions_json_path $VQAV2_TEST_QUESTION_PATH \
    --ok_vqa_train_image_dir_path $OKVQA_TRAIN_IMG_PATH \
    --ok_vqa_train_annotations_json_path $OKVQA_TRAIN_ANNO_PATH \
    --ok_vqa_train_questions_json_path $OKVQA_TRAIN_QUESTION_PATH \
    --ok_vqa_test_image_dir_path $OKVQA_TEST_IMG_PATH \
    --ok_vqa_test_annotations_json_path $OKVQA_TEST_ANNO_PATH \
    --ok_vqa_test_questions_json_path $OKVQA_TEST_QUESTION_PATH \
    --flickr_image_dir_path $FLICKR30K_IMG_PATH \
    --flickr_annotations_json_path $FLICKR30K_ANNO_PATH \
    --imagenet_root $IMAGENET1K_PATH \
    --results_file $RESULTS_FILE \
    --eval_imagenet1k \
    --num_samples 5000 \
    --shots 4 \
    --num_trials 1 \
    --batch_size 8
```

lziiid commented 1 year ago

> @lziiid Hi lziiid, I am also trying to reproduce the results. Would you mind sharing your results on COCO, Flickr30, OKVQA, and VQAv2?

Sure, here are my results (the columns are the number of shots):

VQAv2:

| 1 | 2 | 4 | 8 | 16 |
| -- | -- | -- | -- | -- |
| 29.82 | 35.98 | 42.12 | 46.52 | 47.99 |

OKVQA:

| 1 | 2 | 4 | 8 | 16 |
| -- | -- | -- | -- | -- |
| 23.72 | 29.6 | 33.88 | 36.12 | 38.76 |

COCO:

| 1 | 2 | 4 | 8 | 16 |
| -- | -- | -- | -- | -- |
| 52.65 | 67.04 | 72.5 | 75.9 | 79.7 |

Flickr30:

| 1 | 2 | 4 | 8 | 16 |
| -- | -- | -- | -- | -- |
| 37.29 | 46.59 | 48.55 | 51.89 | 53.27 |

dribnet commented 1 year ago

I'm also not getting my ImageNet eval to work.

Flamingo model initialized with 1046992944 trainable parameters
Evaluating on ImageNet...
Running inference imagenet: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/evaluate.py", line 1302, in <module>
    main()
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/evaluate.py", line 654, in main
    imagenet_score = evaluate_classification(
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/evaluate.py", line 1234, in evaluate_classification
    + prompt_fn({"ocr": batch["ocr"][i], "class_name": None})
KeyError: 'ocr'

Mine appears to be a different issue. I'm not 100% sure the files are laid out on disk as expected (the same layout works for other PyTorch cases). I'll see if I can get to the bottom of this.
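
In case it is a layout problem, this is the quick sanity check I'd run. It's only a sketch: the path is a placeholder, and I'm assuming the eval loader expects a standard torchvision ImageFolder tree, which I haven't verified in the repo.

```python
from torchvision.datasets import ImageFolder

# Sketch only: check that the ImageNet val split loads as a plain
# ImageFolder tree (root/val/<wnid>/*.JPEG). The path below is a placeholder.
ds = ImageFolder("/path/to/imagenet/val")
print(len(ds), "images across", len(ds.classes), "classes")
```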

anas-awadalla commented 1 year ago

Ah, I think this is a typo. That prompt_fn call is currently written for Hateful Memes.
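
Roughly, that spot should look something like the fragment below. This is only a sketch of the intent, not the exact repository code, and the surrounding names (e.g. dataset_name) are whatever the enclosing evaluate_classification code actually uses:

```python
# Sketch of a fix, not the exact repository code: the "ocr" field only exists
# for Hateful Memes batches, so only that dataset should use the OCR prompt;
# the ImageNet branch should just append the context text.
if dataset_name == "hateful_memes":
    context_text += prompt_fn({"ocr": batch["ocr"][i], "class_name": None})
batch_text.append(context_text)
```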

dribnet commented 1 year ago

Thanks for that - I've confirmed that this appears to be a typo introduced in bcf220c1. If I revert that line to what it was before, simply

batch_text.append(context_text)

then it proceeds fine... on CPU. However, I'm still struggling to get the code to honor --device correctly and evaluate on the GPU.

You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
Flamingo model initialized with 1046992944 trainable parameters
Traceback (most recent call last):
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/evaluate.py", line 1302, in <module>
    main()
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/evaluate.py", line 400, in main
    eval_model = module.EvalModel(model_args)
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/models/open_flamingo.py", line 53, in __init__
    checkpoint = torch.load(model_args["checkpoint_path"], map_location=self.device)
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/serialization.py", line 1083, in restore_location
    return default_restore_location(storage, map_location)
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/serialization.py", line 220, in default_restore_location
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.storage.UntypedStorage (tagged with 0)

I'll continue to poke around on this. My goal is to confirm ImageNet evaluation and then move on to getting/testing other datasets before attempting to fine-tune.

anas-awadalla commented 1 year ago

@dribnet Are you using this script as is?

dribnet commented 1 year ago

Well, not as-is, since that one evaluates everything except ImageNet. 😅 It is based on that one, but I've also removed the Slurm dependency, since I hoped that would be easier than figuring out what Slurm is and how to configure it on a single-node machine.

dribnet commented 1 year ago

Managed to get a bit further on this today by using device="cuda:0" instead of device=0 (and changing the internal param check to allow string-based devices; see the sketch at the end of this comment). So now the model loads on the GPU, but the first forward pass fails.

Flamingo model initialized with 1046992944 trainable parameters
Evaluating on ImageNet...
Running inference imagenet: 0it [00:03, ?it/s]
Traceback (most recent call last):
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/evaluate.py", line 1303, in <module>
    main()
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/evaluate.py", line 654, in main
    imagenet_score = evaluate_classification(
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/evaluate.py", line 1240, in evaluate_classification
    eval_model.get_rank_classifications(
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/models/open_flamingo.py", line 179, in get_rank_classifications
    precomputed = self.__call__(
  File "/mnt/md1/nets/open_flamingo/open_flamingo/eval/models/open_flamingo.py", line 279, in __call__
    outputs = self.model(
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/md1/nets/open_flamingo/open_flamingo/src/flamingo.py", line 111, in forward
    output = self.lang_encoder(
  File "/usr/local/anaconda3/envs/openflamingo2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/md1/nets/open_flamingo/open_flamingo/src/flamingo_lm.py", line 157, in forward
    return super().forward(**kwargs)  # Call the other parent's forward method
TypeError: forward() got an unexpected keyword argument 'labels'

This is promising - it seems unhappy with labels being passed in there. Perhaps another typo introduced in bcf220c1; I can check on that next time. I could also try other ways of evaluating the model if there is another dataset that is reasonable to get (I've also been downloading mmc4 for the past week via the scripts for this).

Update: the above error was for OpenFlamingo-3B-vitl-mpt1b-langinstruct, but I just checked and OpenFlamingo-3B-vitl-mpt1b seems to work (4 minutes per iteration). Will report back on how it goes / what metrics I get.
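
For reference, a rough sketch of the device handling I mean (not the exact code I changed; the checkpoint path is a placeholder):

```python
import torch

# Sketch: accept either a bare GPU index or a device string for --device, and
# give torch.load a real device string so map_location never sees just "0",
# which is what triggers "don't know how to restore data location ... (tagged with 0)".
def resolve_device(device):
    if isinstance(device, str) and device.isdigit():
        device = int(device)
    if isinstance(device, int):
        return f"cuda:{device}" if device >= 0 else "cpu"
    return str(device)  # already "cuda:0", "cpu", ...

checkpoint = torch.load("/path/to/checkpoint.pt", map_location=resolve_device(0))
```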

anas-awadalla commented 1 year ago

Are you using this LM which has been patched to include a labels argument?
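
One quick way to check is sketched below. The model id is a placeholder (use whatever you pass as --lm_path, i.e. the patched LM linked above); the snippet just inspects whether the remote-code forward() accepts a labels argument, which is what the TypeError above is complaining about.

```python
import inspect
from transformers import AutoModelForCausalLM

# Placeholder: put the LM you pass as --lm_path here.
MODEL_ID = "your-lm-path-or-hub-id"

lm = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
# False here means forward() has not been patched to accept `labels`.
print("labels" in inspect.signature(lm.forward).parameters)
```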

JoyHuYY1412 commented 11 months ago

> Managed to get a bit further on this today by using device="cuda:0" instead of device=0 (and changing the internal param check to allow string-based devices). So now the model loads on the GPU, but the first forward pass fails. [...] TypeError: forward() got an unexpected keyword argument 'labels' [...]
>
> Update: the above error was for OpenFlamingo-3B-vitl-mpt1b-langinstruct, but I just checked and OpenFlamingo-3B-vitl-mpt1b seems to work (4 minutes per iteration). Will report back on how it goes / what metrics I get.

Hi, did you implement this to get zero-shot classification accuracy on ImageNet? I am not very familiar with this and would appreciate your reply.

tomato996 commented 10 months ago

Well, I'm sure that I used the LM in this, but I also failed in the evaluation of the Hateful Memes dataset with the same error. The only difference is that I used OpenFlamingo-3B-vitl-mpt1b, and it also did not work, due to the same error. 😥