jrzaurin / pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Apache License 2.0

Is it possible to use multiple texts and/or images? #125

Closed hugolytics closed 5 months ago

hugolytics commented 1 year ago

Would it be possible to embed several texts at the same time (using the same, or possibly different text models)?

I'm working on a medical problem where different doctors evaluate different aspects of the patient's state.

The workaround I am using right now is to concatenate the different texts into one. So while my dataset might have (free-text) columns like medical_history, general_impression, walking_test, etc., I build a single string like `medical_history: ..... general_impression: ... walking_test: ...` and just hope the transformer will learn to separate the fields.
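The concatenation workaround above can be sketched as follows. This is a minimal, illustrative example (the column names and values are made up to match the ones mentioned in this thread); each fragment is prefixed with its column name so the model can still tell the fields apart inside the single string.

```python
import pandas as pd

# Hypothetical free-text columns like the ones described above.
df = pd.DataFrame(
    {
        "medical_history": ["type 2 diabetes", "none reported"],
        "general_impression": ["alert, cooperative", "fatigued"],
        "walking_test": ["slow gait", "normal gait"],
    }
)

text_cols = ["medical_history", "general_impression", "walking_test"]

# Prefix each fragment with its column name before joining,
# so the field boundaries survive the concatenation.
df["combined_text"] = df[text_cols].apply(
    lambda row: " ".join(f"{col}: {row[col]}" for col in text_cols), axis=1
)

print(df["combined_text"].iloc[0])
# medical_history: type 2 diabetes general_impression: alert, cooperative walking_test: slow gait
```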

The other (nicer) way to do it would be to concatenate the embeddings, i.e. embed the columns separately and fuse them.

I think it should be possible. For me it would be fine to use the same text model on the different texts, so it would really be a matter of having trainer.fit(X_text= accept a List[Tuple], for example.
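A rough sketch of what a shared text model over several columns could look like, assuming a toy GRU encoder (the class and method names here are illustrative, not part of the pytorch-widedeep API): one shared encoder is applied to each tokenized column inside a single forward pass, and the per-column embeddings are concatenated, so gradients still flow jointly.

```python
import torch
import torch.nn as nn


class SharedMultiTextEncoder(nn.Module):
    """Apply one shared encoder to several tokenized text columns and
    concatenate the resulting embeddings. Illustrative sketch only."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def encode_one(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) -> last hidden state: (batch, hidden_dim)
        _, h_n = self.rnn(self.embedding(tokens))
        return h_n.squeeze(0)

    def forward(self, token_cols: list[torch.Tensor]) -> torch.Tensor:
        # All columns in one forward pass -> (batch, n_cols * hidden_dim)
        return torch.cat([self.encode_one(t) for t in token_cols], dim=1)


encoder = SharedMultiTextEncoder(vocab_size=100, embed_dim=16, hidden_dim=32)
cols = [torch.randint(1, 100, (4, 10)) for _ in range(3)]  # 3 text columns
out = encoder(cols)
print(out.shape)  # torch.Size([4, 96])
```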

would this be difficult for me to customise? I'd be willing to contribute and document.

jrzaurin commented 1 year ago

Hey @hugolytics

thanks for opening the issue!

At the moment it is not, but as I am about to start fully integrating with Hugging Face, maybe it is a feature we can bring in (bear in mind it involves multiple preprocessors, tokenizers, etc., which is fine, just not that straightforward).

Now, there are a few things I am not fully sure I get. When you say using the same text model, let's assume it is a simple RNN: do you want to use that RNN for each column, sequentially? Because if that is the case, I think that would be "catastrophic" for the learning process. The main point of these multimodal models is the "joint learning", and if one sequentially passes data to a model, nothing is learned "jointly".

Anyway, we can start by allowing multiple text/image inputs and models. The fusion of the embeddings is easier, I think, as we could just fuse them via the existing FC heads (or dot products, etc.).
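The fusion via an FC head mentioned above could be as simple as a small MLP over the concatenated per-column embeddings. A minimal sketch (the layer sizes and class name are made up for illustration, not the library's actual head):

```python
import torch
import torch.nn as nn


class FusionHead(nn.Module):
    """Fuse pre-computed per-column embeddings with a small FC head."""

    def __init__(self, n_cols: int, embed_dim: int, out_dim: int = 1):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(n_cols * embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, embeddings: list[torch.Tensor]) -> torch.Tensor:
        # embeddings: list of (batch, embed_dim) tensors, one per column
        return self.head(torch.cat(embeddings, dim=1))


head = FusionHead(n_cols=3, embed_dim=32)
embs = [torch.randn(8, 32) for _ in range(3)]  # 3 text columns, batch of 8
print(head(embs).shape)  # torch.Size([8, 1])
```

Dot-product fusion would instead reduce pairs of embeddings to similarity scores; the FC head is the more general option since it keeps all the per-column information.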

But anyway, let's do something: let's keep this issue open, post relevant things here, and maybe discuss in more detail in the Slack channel: https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-1nao4o0hj-2FtP__8oASmyLsO6aMZQcA

hugolytics commented 1 year ago

By the same model I meant the same pre-trained transformer model: the same starting point, fine-tuned on the regression task simultaneously (but obviously the different models would end up with their own distinct fine-tuned weights). The reason I brought it up is that in the current API, the WideDeep class only takes a single deeptext module. So that would not really be an issue for my use case (although it would not generalize well if one column were French and the other Chinese).
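Starting every column from the same pre-trained weights while letting each copy diverge during fine-tuning can be sketched with `copy.deepcopy`. A toy `nn.Linear` stands in for the encoder here; in practice one would deep-copy (or re-load) a Hugging Face checkpoint instead.

```python
import copy

import torch
import torch.nn as nn

# Toy stand-in for a pre-trained encoder; in a real setup this would be
# a transformer loaded once from a checkpoint.
pretrained = nn.Linear(16, 8)

text_cols = ["medical_history", "general_impression", "walking_test"]

# One independent copy per text column: identical starting point,
# distinct weights after fine-tuning.
encoders = {col: copy.deepcopy(pretrained) for col in text_cols}

# Before training, the copies have equal values but separate storage,
# so their weights can drift apart during fine-tuning.
a = encoders["medical_history"].weight
b = encoders["walking_test"].weight
print(torch.equal(a, b), a.data_ptr() == b.data_ptr())  # True False
```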

Anyway, I'll join the Slack channel. I'm also using Hugging Face models at the moment, so I am interested in the integration too!

jrzaurin commented 1 year ago

while we integrate, you can have a look here: https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/notebooks/17_Usign_a_hugging_face_model.ipynb

jrzaurin commented 5 months ago

Better late than never I guess :)

Check this release