jrzaurin / pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Apache License 2.0
1.3k stars 190 forks

feature extraction #211

Closed xylovezxy closed 5 months ago

xylovezxy commented 6 months ago

Excuse me, may I ask if I can use CLIP as a feature extractor for images and text, and apply it to a multimodal MovieLens dataset?

jrzaurin commented 6 months ago

Hey @xylovezxy

if what you mean is:

  1. use CLIP on images, get dense representations (vectors)
  2. Use those vectors as continuous cols, then combine them with the rest of the MovieLens features using this library

the answer is by all means yes :)

If what you are asking is whether CLIP is included in this library: no, it is not.
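That said, step 1 could be sketched roughly as below. This is a hedged sketch, not part of pytorch-widedeep: it assumes the Hugging Face `transformers` package is installed, and the checkpoint name `"openai/clip-vit-base-patch32"` is just an example choice.

```python
# Sketch of step 1: get dense CLIP representations for texts and images.
# NOTE: `transformers` and the checkpoint below are assumptions on my part;
# pytorch-widedeep itself does not ship CLIP.

def extract_clip_features(texts, images):
    """Return (text_emb, image_emb) as 2-D numpy arrays."""
    # imported lazily so this module still loads without transformers installed
    from transformers import CLIPModel, CLIPProcessor

    name = "openai/clip-vit-base-patch32"  # example checkpoint, swap as needed
    model = CLIPModel.from_pretrained(name)
    processor = CLIPProcessor.from_pretrained(name)

    txt = processor(text=texts, return_tensors="pt", padding=True, truncation=True)
    img = processor(images=images, return_tensors="pt")

    text_emb = model.get_text_features(**txt)    # one row per text
    image_emb = model.get_image_features(**img)  # one row per image
    return text_emb.detach().numpy(), image_emb.detach().numpy()
```

Each movie would then get a text vector and an image vector that you can concatenate and treat as continuous columns in step 2.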

Let me know if this helps!

xylovezxy commented 6 months ago

Thank you for your reply. I want to use CLIP to extract image and text features from the movie images and descriptions in the MovieLens dataset, and then use your project for further work. I want to try this method, haha. I know this library does not include the CLIP model. Thank you again!

jrzaurin commented 6 months ago

yes, sure, you can :)

xylovezxy commented 6 months ago

> Hey @xylovezxy
>
> if what you mean is:
>
>   1. use CLIP on images, get dense representations (vectors)
>   2. Use those vectors as continuous cols, then combine them with the rest of the features of the movielens using this library
Hello, how should step 2 be implemented? I see that the scripts in the examples all use fixed model integrations.

jrzaurin commented 6 months ago

look at any of the example notebooks, like this one.

Simply put: if you have, say, 700-dimensional vectors, that implies 700 continuous columns.
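To make that concrete, here is a minimal sketch that turns an `(n_samples, 700)` embedding matrix into 700 named continuous columns. The `emb_` column prefix and the `rating` target are my own placeholder names, and the commented-out pytorch-widedeep calls are a sketch; check the library docs for the exact arguments in your version.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 700)).astype("float32")  # stand-in for CLIP vectors

# One DataFrame column per embedding dimension -> 700 continuous columns
emb_cols = [f"emb_{i}" for i in range(emb.shape[1])]
df = pd.DataFrame(emb, columns=emb_cols)
df["rating"] = rng.integers(1, 6, size=len(df))  # hypothetical target

# With pytorch-widedeep the columns are then used like any other
# continuous columns (sketch; verify args against the library docs):
# from pytorch_widedeep.preprocessing import TabPreprocessor
# from pytorch_widedeep.models import TabMlp, WideDeep
# from pytorch_widedeep import Trainer
#
# tab_preprocessor = TabPreprocessor(continuous_cols=emb_cols)
# X_tab = tab_preprocessor.fit_transform(df)
# tab_mlp = TabMlp(column_idx=tab_preprocessor.column_idx,
#                  continuous_cols=emb_cols)
# model = WideDeep(deeptabular=tab_mlp)
# Trainer(model, objective="regression").fit(X_tab=X_tab,
#                                            target=df["rating"].values)
```

Any other MovieLens features (genres, user ids, etc.) would simply be added to `df` alongside the `emb_*` columns and declared in the preprocessor as usual.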

Hope this helps

xylovezxy commented 5 months ago

Thank you