**Open** — mehdidc opened this issue 1 year ago
Good points. The default captions used by `validate_and_save_model` are from https://github.com/j-min/DallEval, with the intent of eventually adding automatic validation to this repo. There are some options for merging it into this repo:

1. Compute the metrics directly inside the `validate_and_save` function. I am partially against this because CLIP score and FID score both involve loading other models/datasets, which would complicate the config, the GPU memory consumption, and the main train script.
2. A separate watcher script that monitors the checkpoints `validate_and_save` makes, then computes FID/CLIP score when there's an update. This would operate on a separate node/set of GPUs from the original train script and would be invoked by the user separately from the train script.

Which do you think is the better option? Also, thanks for the pointer to ImageReward, I will look into it!
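To make option 2 concrete, a minimal sketch of what the watcher loop could look like (the `evaluate` callback and the `.pt` checkpoint naming are assumptions for illustration, not this repo's API):

```python
import os
import time


def watch_checkpoints(ckpt_dir, evaluate, poll_seconds=60, max_polls=None):
    """Poll a checkpoint directory and evaluate each new checkpoint once.

    `evaluate` is a hypothetical callback that would compute FID/CLIP
    score for a given checkpoint path. `max_polls=None` runs forever,
    which is the intended mode for a standalone watcher process.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in sorted(os.listdir(ckpt_dir)):
            # Only react to checkpoints we have not evaluated yet.
            if name.endswith(".pt") and name not in seen:
                seen.add(name)
                evaluate(os.path.join(ckpt_dir, name))
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
    return seen
```

Because the watcher only reads the checkpoint directory, it can run on a different node/set of GPUs with no coupling to the train script beyond the shared filesystem.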
I would also go for option 2, for now at least, because of what you said, plus the need to distribute the computation of all the metrics over the GPUs as well; otherwise only rank zero would be used while the other GPUs wait.
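The idle-rank concern comes down to sharding the generated samples across ranks before computing metrics. A minimal sketch of the sharding step (the distributed reduction of per-rank results, e.g. via `torch.distributed.all_reduce`, is omitted):

```python
def shard_for_rank(items, rank, world_size):
    """Round-robin shard so each GPU scores a disjoint subset of samples.

    Every rank calls this with its own `rank`; the union of all shards
    covers `items` exactly once, so per-rank metric sums can simply be
    reduced across ranks afterwards.
    """
    return items[rank::world_size]
```

With this, rank 0 is no longer a bottleneck: each rank computes metrics on `len(items) / world_size` samples and only the final reduction synchronizes.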
Sounds good to me; it will take me some time to implement this. Let me know if you'd like to take some part of the PR. I see 3 direct parts:

1. CLIP score
2. FID score
3. Logging the results to the same `wandb` run as the training run.

I have partial implementations of all of these (except ImageReward) which I will push to a working branch soon; we could use that as a starting point.
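For reference, the CLIP score part reduces to a small amount of arithmetic once the embeddings exist. A self-contained sketch using the definition from Hessel et al. (2021), `w * max(cos, 0)` with `w = 2.5`; plain float lists stand in for the actual CLIP image/text embeddings:

```python
import math


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore (Hessel et al., 2021): w * max(cosine similarity, 0).

    `image_emb` and `text_emb` stand in for CLIP embeddings of the
    generated image and its caption; negatives are clipped to zero so
    the score is non-negative.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_i = math.sqrt(sum(a * a for a in image_emb))
    norm_t = math.sqrt(sum(b * b for b in text_emb))
    cos = dot / (norm_i * norm_t)
    return w * max(cos, 0.0)
```

In the actual PR the embeddings would come from a loaded CLIP model, which is exactly the extra model-loading overhead that motivated keeping this out of the train script.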
I can take care of ImageReward and help with the others, so please go ahead and push the working branch so that I can extend it. Maybe you do FID and I do CLIPScore, or the other way around.
Another work to consider: https://arxiv.org/abs/2305.01569, similar to ImageReward (they also compare themselves against ImageReward). Code: https://github.com/yuvalkirstain/PickScore
Hi Mehdi, I have added some starting code in the `evaluation` branch. It's rough, but it has an implementation of computing CLIP score directly from a tar file without extracting it, as well as an example of how it would be used in `evaluation/quality_metrics_watcher.py`. It also has a starting point for FID score. Thanks for the pointer to that paper :-) Heads up: I will be a bit slow to reply to things due to the NeurIPS deadline.
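Reading samples straight out of a tar shard is doable with the stdlib `tarfile` module alone. A sketch, assuming a webdataset-style layout where each caption is a `<key>.txt` member stored alongside `<key>.jpg` (that layout is an assumption, not necessarily what the branch uses):

```python
import tarfile


def iter_tar_members(tar_path, suffix=".txt"):
    """Stream matching members of a tar shard without extracting to disk.

    Yields (member name, raw bytes) pairs for every regular file whose
    name ends with `suffix`, reading each member directly from the
    archive stream.
    """
    with tarfile.open(tar_path) as tf:
        for member in tf:  # TarFile is iterable over its members
            if member.isfile() and member.name.endswith(suffix):
                f = tf.extractfile(member)
                yield member.name, f.read()
```

Pairing `<key>.jpg` bytes with `<key>.txt` captions this way feeds CLIP score computation without ever materializing the dataset on disk, which matters when the watcher shares a node with other jobs.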
Would be great to have (optional) model evaluation. Possibilities: