microsoft / VISOR


Reproduce SD (& other models') results #1

Closed: zwcolin closed this issue 10 months ago

zwcolin commented 1 year ago

Hello,

I was wondering if there is any way to reproduce the results in your main table? I didn't find any information about seed/generator usage in your codebase or report, so I'd really appreciate it if you could provide some insights on reproducing those numbers. Thanks!

Also, a side question: for all the SD results mentioned in the codebase and the report, are you using Stable Diffusion 1.4 or 1.5?

tejas-gokhale commented 1 year ago

Hi,

We directly used the open-source versions of the models we benchmarked. For SD, we used version 1.4 with the default parameters from https://github.com/CompVis/stable-diffusion/blob/main/scripts/txt2img.py
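A minimal sketch of an equivalent setup is below, using the Hugging Face diffusers pipeline rather than the CompVis script itself; the prompt, seed, and fp16/GPU settings are placeholders, while the step count and guidance scale are meant to mirror the txt2img.py defaults:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the SD 1.4 weights (fp16 on GPU is an assumption, not part of the original setup).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Fixing a generator seed makes individual samples repeatable; the seed value
# here is arbitrary and not something specified in the paper or codebase.
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    "a cat to the left of a dog",  # hypothetical spatial-relationship prompt
    num_inference_steps=50,        # believed to match the txt2img.py default
    guidance_scale=7.5,            # believed to match the txt2img.py default
    generator=generator,
).images[0]
image.save("sample.png")
```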

Note that we have provided the outputs of our object detector in ./objdet_results. All of our generated images are available here: https://huggingface.co/datasets/tgokhale/sr2d_visor/tree/main
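If it helps, here is a small sketch of pulling that image dataset locally via huggingface_hub (the print statement and variable name are just illustrative):

```python
from huggingface_hub import snapshot_download

# Download the released sr2d_visor dataset of generated images.
local_dir = snapshot_download(
    repo_id="tgokhale/sr2d_visor",
    repo_type="dataset",  # hosted as a dataset repo on the Hub
)
print("Images downloaded to:", local_dir)
```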

Happy to provide more information if needed.

zwcolin commented 1 year ago

Thanks for the response and the additional information!

Another question we had: how accurate is the detector on ground-truth images? That is, with the threshold set to 0.1, do you provide a VISOR reference for ground-truth image-prompt (or image-caption) pairs, e.g., all COCO captions that describe spatial relationships?
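To make the question concrete, this is roughly the kind of check I have in mind; OWL-ViT is used here only as a stand-in open-vocabulary detector (an assumption, not necessarily the detector or checkpoint from the paper), applied at the 0.1 confidence threshold to a hypothetical COCO ground-truth image:

```python
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("coco_example.jpg")                 # hypothetical ground-truth COCO image
queries = [["a photo of a cat", "a photo of a dog"]]   # objects named in the caption

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep detections above the 0.1 confidence threshold discussed above.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes
)[0]
print(results["boxes"], results["scores"], results["labels"])
```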