salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License
9.73k stars, 955 forks

DisCRn dataset for X-InstructBLIP #627

Open. ADu2021 opened this issue 9 months ago

ADu2021 commented 9 months ago

Thank you very much for your work! Do you have any plans to release the original DisCRn dataset used to evaluate X-InstructBLIP, rather than just the code?

giuliannocappellari commented 8 months ago

Hey, man. How did you install X-InstructBLIP? Did you simply follow the README? I ran into issues with Ninja and CUDA. Did you have the same problem? Can you help?

artemisp commented 7 months ago

Thank you for your interest in the model. You can download the DisCRn dataset as follows:

    from lavis.datasets.builders import load_dataset

    image_pc_ds = load_dataset('image_pc_discrn')        # image-point cloud data
    audio_video_ds = load_dataset('audio_video_discrn')  # audio-video data
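For a quick sanity check on what was downloaded, here is a small sketch that counts examples per split. It assumes that `load_dataset` returns a dict mapping split names to datasets that support `len()`, which is what the LAVIS README shows for `load_dataset('coco_caption')`. The split names for the DisCRn builders are not confirmed here, and `split_sizes` is a hypothetical helper, not part of LAVIS.

```python
from collections.abc import Mapping, Sized


def split_sizes(dataset_splits: Mapping[str, Sized]) -> dict[str, int]:
    """Hypothetical helper: number of examples in each split.

    Assumes the object returned by load_dataset maps split names to
    datasets that implement len(); not confirmed for the DisCRn builders.
    """
    return {split: len(data) for split, data in dataset_splits.items()}


# Example call (requires LAVIS to be installed):
# from lavis.datasets.builders import load_dataset
# print(split_sizes(load_dataset('image_pc_discrn')))
```

Printing the keys first is the safest way to see which splits the DisCRn builders actually expose.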

Note that for the 3D pairs, we had to remove 628 of the 28173 datapoints (about 2.2%) from the release because they are associated with BY-SA-licensed point clouds. This should not skew the results significantly.

To evaluate X-InstructBLIP on DisCRn, run:

    python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/xinstruct_blip/eval/discrn/audio_video_describe.yaml
    python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/xinstruct_blip/eval/discrn/image_3d_describe.yaml
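To put "should not skew the results significantly" in numbers, here is a worst-case sketch, assuming the metric is a simple accuracy averaged over datapoints. With f = 628 / 28173, full-set accuracy is (1 - f) * kept_accuracy + f * removed_accuracy, and since the removed items' accuracy is unknown (anywhere in [0, 1]), the full-set number can only lie in an interval of width f around the released-subset number. The function names are illustrative, not part of LAVIS.

```python
def removed_fraction(removed: int, total: int) -> float:
    """Share of the original datapoints that were dropped from the release."""
    if total <= 0 or not 0 <= removed <= total:
        raise ValueError("need total > 0 and 0 <= removed <= total")
    return removed / total


def full_set_accuracy_bounds(kept_accuracy: float, removed: int, total: int) -> tuple[float, float]:
    """Range the full-set accuracy could fall in, given accuracy on the released subset.

    Assumes accuracy is a per-datapoint average:
    full = (1 - f) * kept_accuracy + f * removed_accuracy, with f = removed / total.
    """
    f = removed_fraction(removed, total)
    low = (1 - f) * kept_accuracy       # every removed item would have been wrong
    high = (1 - f) * kept_accuracy + f  # every removed item would have been right
    return low, high


if __name__ == "__main__":
    print(f"removed: {removed_fraction(628, 28173):.2%}")  # removed: 2.23%
```

So even in the worst case, a score measured on the released 3D subset differs from the full-set score by at most about 2.2 percentage points.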

Make sure to update the paths to the AudioCaps audio and the corresponding YouTube videos (which you can fetch with a tool like youtube-dl) so they point to your local copies. Do the same for the Objaverse point clouds (here) and the Cap3D rendered images (here).
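If you run both evaluations often, a small wrapper can refuse to launch when a local data directory is missing. This is a hypothetical sketch, not part of LAVIS: it only checks the paths you pass to it (it does not read the YAML configs, so you still need to set the same locations there), it must be run from the LAVIS repo root because `train.py` and the config paths are relative, and it uses `sys.executable` in place of `python` so the current environment's interpreter is used.

```python
import subprocess
import sys
from pathlib import Path

# Config files quoted in the thread; relative to the LAVIS repo root.
DISCRN_CONFIGS = [
    "lavis/projects/xinstruct_blip/eval/discrn/audio_video_describe.yaml",
    "lavis/projects/xinstruct_blip/eval/discrn/image_3d_describe.yaml",
]


def build_eval_command(cfg_path: str, nproc_per_node: int = 8) -> list[str]:
    """Argument list equivalent to the torch.distributed.run command above."""
    return [
        sys.executable, "-m", "torch.distributed.run",
        f"--nproc_per_node={nproc_per_node}",
        "train.py", "--cfg-path", cfg_path,
    ]


def missing_paths(local_data: dict[str, str]) -> list[str]:
    """Names of the given local data locations that do not exist on disk."""
    return [name for name, path in local_data.items() if not Path(path).exists()]


def run_discrn_eval(local_data: dict[str, str], nproc_per_node: int = 8) -> None:
    """Hypothetical wrapper: stop if any data path is missing, else run both evals."""
    missing = missing_paths(local_data)
    if missing:
        raise FileNotFoundError(f"missing local data: {', '.join(missing)}")
    for cfg in DISCRN_CONFIGS:
        # check=True raises on the first failed run instead of continuing silently
        subprocess.run(build_eval_command(cfg, nproc_per_node), check=True)
```

Usage, with placeholder directories you would replace with your own: `run_discrn_eval({"audiocaps_audio": "/data/audiocaps/audio", "audiocaps_videos": "/data/audiocaps/videos", "objaverse_point_clouds": "/data/objaverse/pc", "cap3d_images": "/data/cap3d/images"})`.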

Regarding the installation, I responded in a different thread, so we can try to debug the issue there.