issues
search
junyuan-fang
/
Vision-Language-on-3D-Scene-Understanding
MIT License
1
stars
0
forks
source link
Then combine the image encoder with the text encoder, check the model out, if it gives you something you wanted
#13
Open
junyuan-fang
opened
12 months ago