Then combine the image encoder with the text encoder, check the model out, if it gives you something you wanted - Githubissues

junyuan-fang / Vision-Language-on-3D-Scene-Understanding

MIT License

1 stars 0 forks source link

Then combine the image encoder with the text encoder, check the model out, if it gives you something you wanted #13

Open junyuan-fang opened 12 months ago