Open Hayeon-kimm opened 4 months ago
Yes, I think you can obtain the CLIP embeddings that way. Note that InteractDiffusion uses only text_embedding.
If it's okay, could you share your process_grounding.py code for HICO-DET? To get embeddings for my custom data, I wrote a loader for it, but your hico-det-clip and gligen.tsv are a little different. So I would like to see your preprocessing step for HICO-DET.
You may refer to extract_embedding.py.zip.
Thank you for sharing your research and this code. I am learning a lot from it. I want to run an experiment on a custom dataset. Looking at the HICO_DET_CLIP data you shared, I think I also need the 'action' image_embedding and text_embedding. For a custom dataset, can I take the action bbox, crop that region from the image, and pass the crop through CLIP to get an embedding that corresponds to the action image embedding you provided?
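To make the question concrete, here is a rough sketch of the crop-and-embed idea using the Hugging Face `transformers` CLIP classes. This is not the repository's preprocessing. The function names `clamp_box` and `embed_action`, the checkpoint `openai/clip-vit-large-patch14`, and the choice to return the projected `image_embeds` / `text_embeds` are all my assumptions, so the result may not match the format of the released HICO_DET_CLIP files.

```python
def clamp_box(box, width, height):
    """Clip an (x0, y0, x1, y1) pixel box to the image bounds (hypothetical helper)."""
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    x0, x1 = max(0, min(x0, width)), max(0, min(x1, width))
    y0, y1 = max(0, min(y0, height)), max(0, min(y1, height))
    if x1 <= x0 or y1 <= y0:
        raise ValueError(f"empty box after clamping: {(x0, y0, x1, y1)}")
    return (x0, y0, x1, y1)


def embed_action(image_path, action_box, action_text,
                 model_name="openai/clip-vit-large-patch14"):
    """Crop the action box and embed the crop and the action text with CLIP.

    Assumption: the checkpoint and the use of projected embeddings are guesses,
    not confirmed to match the embeddings shipped with the dataset.
    """
    # Heavy dependencies are imported here so the helper above stays lightweight.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_name).eval()
    processor = CLIPProcessor.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    width, height = image.size  # PIL reports (width, height)
    crop = image.crop(clamp_box(action_box, width, height))

    inputs = processor(text=[action_text], images=crop,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # One image and one text were passed, so take the first row of each.
    return outputs.image_embeds[0], outputs.text_embeds[0]
```

Usage would look like `image_emb, text_emb = embed_action("my_image.jpg", (40, 60, 220, 300), "riding")`, where the path, box, and action label are placeholders. Since only text_embedding is used by InteractDiffusion, the image embedding may be unnecessary in practice.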