invictus717 / MetaTransformer

Meta-Transformer for Unified Multimodal Learning
https://arxiv.org/abs/2307.10802
Apache License 2.0

data2seq #38

Closed LH019 closed 1 year ago

LH019 commented 1 year ago

Can you give a demo of how to use the data2seq code?

invictus717 commented 1 year ago

Please refer to Data2Seq.py

from Data2Seq import Data2Seq

auto_tokenizer = Data2Seq(modality='image', dim=768)

# or
auto_tokenizer = Data2Seq(modality='video', dim=768)

# or
auto_tokenizer = Data2Seq(modality='time-series', dim=768)
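To see the tokenizer in action, here is a minimal usage sketch. The forward call and the (batch, num_tokens, dim) output shape are assumptions based on the snippet above rather than the exact interface, so please double-check Data2Seq.py.

import torch
from Data2Seq import Data2Seq

# Hypothetical usage sketch; the exact forward signature may differ in Data2Seq.py.
image_tokenizer = Data2Seq(modality='image', dim=768)
images = torch.randn(2, 3, 224, 224)   # dummy batch of RGB images
tokens = image_tokenizer(images)       # assumed output shape: (2, num_tokens, 768)

# The resulting token sequence is what the shared Meta-Transformer encoder consumes.
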
LH019 commented 1 year ago

Thank you for your reply. I have another question: does the Data2Seq process not need the unified model itself, and is the unified model meant to process the result of Data2Seq?

invictus717 commented 1 year ago

First of all, I think this is an insightful question, and thank you for your interest in Meta-Transformer. Currently, the Data2Seq module does not share parameters across modalities, because each modality needs its own pattern to extract effective token sequences; the unified encoder then operates on the token sequences that Data2Seq produces. Personally, I think it would be very cool if Data2Seq could be unified into a single tokenizer model. In our paper, we found that designing such a unified tokenizer could start from a grouping -> convolution -> transformation process, yet we did not unify the multimodal tokenizers.
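
As an illustration of the grouping -> convolution -> transformation process mentioned above, here is a minimal toy sketch of a patch-style tokenizer for the image case. It is not the repository's actual Data2Seq implementation; the layer choices and shapes are assumptions made for the example.

import torch
import torch.nn as nn

class GenericTokenizer(nn.Module):
    """Illustrative sketch of grouping -> convolution -> transformation;
    not the actual Data2Seq code."""
    def __init__(self, in_channels=3, group_size=16, dim=768):
        super().__init__()
        # Grouping + convolution: a strided conv splits the input into
        # non-overlapping groups (patches) and projects each to `dim`.
        self.group_proj = nn.Conv2d(in_channels, dim,
                                    kernel_size=group_size, stride=group_size)
        # Transformation: a per-token mapping into the shared embedding
        # space consumed by the unified encoder.
        self.transform = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim))

    def forward(self, x):
        # x: (B, C, H, W) -> (B, dim, H/g, W/g) -> (B, N, dim)
        tokens = self.group_proj(x).flatten(2).transpose(1, 2)
        return self.transform(tokens)

tokens = GenericTokenizer()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 768])

In this view, a single encoder can stay shared across modalities, while the grouping and convolution stages are the part that currently has to be specialized (or, in the future, unified) per modality.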