Hello,
I need to match entities described by mixed data (text, numerical values, and images), where each entity can have multiple images.
Is it possible to use a custom architecture to build the entity representation, so that I can plug in a multi-input multimodal model?
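
To make the question concrete, here is a rough, framework-free sketch of the kind of representation I have in mind. Everything in it is hypothetical and not tied to any library's API: the `Entity` record, the `encode` function, the choice of mean pooling over the images, and the assumption that each modality has already been turned into a vector by some encoder.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence

Vector = List[float]


@dataclass
class Entity:
    """Hypothetical entity record; inputs are assumed to be pre-embedded."""
    text_emb: Vector          # output of some text encoder (assumption)
    numeric: Vector           # numerical features, assumed already scaled
    image_embs: List[Vector]  # one embedding per image; the count may vary


def mean_pool(vectors: Sequence[Vector], dim: int) -> Vector:
    """Average a variable number of vectors; return zeros if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def encode(entity: Entity, image_dim: int) -> Vector:
    """Fuse all modalities into one fixed-size representation.

    Images are mean-pooled so that any number of them maps to a single
    vector, then everything is concatenated and L2-normalized.
    """
    pooled_images = mean_pool(entity.image_embs, image_dim)
    fused = entity.text_emb + entity.numeric + pooled_images
    return l2_normalize(fused)


def similarity(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))
```

In practice the pooling and concatenation would be replaced by learned layers (for example attention over the images), but the shape of the problem is the same: several inputs of different types, and a variable number of images, reduced to one vector per entity that can then be compared for matching.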
Thank you