xk-huang / segment-caption-anything

[CVPR 24] This repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloading the trained model checkpoints, and example notebooks / a Gradio demo that show how to use the model.
https://xk-huang.github.io/segment-caption-anything/
Apache License 2.0

How to obtain the file image_b64.tsv for evaluation #14

Open liweiyangv opened 1 month ago

liweiyangv commented 1 month ago

Hi, thank you for sharing the code. How can I use the model to run evaluation on the Visual Genome dataset? The file image_b64.tsv required by scripts/tools/eval_suite.sh is missing. How can I obtain the file for IMAGE_B64_TSV_PATH? Thanks!

xk-huang commented 1 month ago

There is no need for the image b64 file if you do not want to compute the CLIP score.

For information about image_b64.tsv, see:

https://github.com/xk-huang/segment-caption-anything/blob/0d3f0b4a9caa8d5f8d23f5a301b9048161e930bc/amlt_configs/infer-sca-eval_suite-vg-last_model.yaml#L55 https://github.com/xk-huang/segment-caption-anything/blob/0d3f0b4a9caa8d5f8d23f5a301b9048161e930bc/scripts/tools/extract_region_img_annot_caption_to_tsv.py

liweiyangv commented 1 month ago

Hi Xiaoke,

I want to confirm whether the vdtk used to evaluate the results is the one from https://github.com/xk-huang/vdtk/tree/9cb1fc9bf82ea6fe4fe2146e33791be05585214c. I tried to install this package, but the evaluation script still returns the error "vdtk is not installed" when I run the bash script for evaluation.

Best wishes.
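For pinning the exact commit above, pip's Git-URL syntax (`git+https://...@<commit>`) is the usual route. A small hypothetical helper (the function name is illustrative, not part of the repo) that builds the install command for the current interpreter:

```python
import sys

# The commit hash referenced in the thread above.
VDTK_COMMIT = "9cb1fc9bf82ea6fe4fe2146e33791be05585214c"

def vdtk_install_cmd(commit: str = VDTK_COMMIT) -> list:
    """Return the pip invocation that installs vdtk from a specific commit."""
    url = f"git+https://github.com/xk-huang/vdtk@{commit}"
    # run pip via the current interpreter so the package lands in the same env
    return [sys.executable, "-m", "pip", "install", url]

# e.g. subprocess.check_call(vdtk_install_cmd()) to actually perform the install
```

Installing into the same interpreter that runs the evaluation script is what usually resolves a "package is not installed" check despite a seemingly successful install (a frequent cause is pip installing into a different virtual environment).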


liweiyangv commented 1 month ago

Hi Xiaoke,

I can now get past the check for the vdtk package, but I hit a new error: omegaconf.errors.ValidationError: Incompatible value 'None' for field of type 'int'. Also, the vdtk I use needs transformers>=4.38.0, which conflicts with the version pinned in requirements.txt. How can I deal with this problem? Thank you.

Best wishes


liweiyangv commented 1 month ago

Hi Xiaoke,

I have installed vdtk successfully and can obtain correct results for nouns and verbs, but I still fail to evaluate the results with the other metrics: I get NaN for CIDEr-D and 0 for BLEU (I suspect this is because the inference JSON contains only one candidate caption per region). I also have a question about how to generate multiple captions: I set the parameter model.num_caption_tokens=3, but it does not work. How can I deal with these problems? Sorry to bother you.

Best wishes,

Liwei
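On the BLEU score of 0: a single short candidate can easily have zero matching 4-grams against the reference, and BLEU-4's geometric mean then collapses to 0 even when unigrams match well. An illustrative sketch of the clipped n-gram precision at BLEU's core (not vdtk's implementation):

```python
from collections import Counter

def ngram_precision(cand, ref, n):
    """Clipped n-gram precision of a candidate against a single reference."""
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # clip each candidate n-gram count by its count in the reference
    clipped = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    return clipped / total

cand = "a dog on grass".split()
ref = "a brown dog running on green grass".split()
print(ngram_precision(cand, ref, 1))  # 1.0 — every unigram appears in the reference
print(ngram_precision(cand, ref, 4))  # 0.0 — no 4-gram matches, so BLEU-4 is 0
```

NaN for CIDEr-D similarly tends to appear when the candidate/reference sets are degenerate (e.g. a single candidate with no n-gram overlap), rather than indicating a broken metric implementation.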


liweiyangv commented 1 month ago

I use this command for inference:

python -m src.train \
  train_data='[vg-densecap-local]' eval_data='[vg-densecap-local]' \
  +model=base_sca \
  training.do_train=False \
  training.do_eval=False \
  training.do_inference=True \
  training.output_dir=amlt/train-sca-vg_densecap-081023/gpt2-large/ \
  wandb.log=False \
  model.model_name_or_path=/home/yangliwei/.cache/huggingface/hub/models--xk-huang--segment-caption-anything-gpt2_large-pt_vg/snapshots/91f940672ecebfb98eabf6d710106b8dd94d75ae/ \
  model.num_caption_tokens=3
