Closed hesingh closed 5 months ago
Right now we just use a pretrained tokenizer (from GPT-NeoX) and nn.Embedding. I'm not sure what changes you're suggesting.
See Table 1 and Figure 3 in this paper: https://arxiv.org/pdf/2312.14125.pdf.
Would you like to contribute? We welcome PRs.
I'd be happy to once I understand the code better.
How do I change Mamba to keep the pretrained GPT-NeoX tokenizer for text input, but use a different tokenizer for images and video, and yet another for audio?
Section 3.1 of the paper describes a vocabulary made up of 256 special tokens, 262k image and video tokens, and 4,096 audio tokens. How do I set up a vocabulary like that?
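One common way to build such a vocabulary is to keep a single `nn.Embedding` and carve the ID space into contiguous ranges, shifting each modality's raw codebook indices by an offset. A minimal sketch, using the token counts from the question; the layout order, the helper names, and the model dimension are my assumptions, not something the paper or the Mamba repo prescribes:

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary layout (order is an assumption):
#   [0, 256)                        special tokens
#   [256, 256 + 262144)             image/video tokens
#   [256 + 262144, + 4096)          audio tokens
N_SPECIAL = 256
N_VISUAL = 262_144
N_AUDIO = 4_096
VISUAL_OFFSET = N_SPECIAL
AUDIO_OFFSET = N_SPECIAL + N_VISUAL
VOCAB_SIZE = N_SPECIAL + N_VISUAL + N_AUDIO


def visual_to_token(codes: torch.Tensor) -> torch.Tensor:
    """Shift raw visual-codebook indices into the shared ID space."""
    return codes + VISUAL_OFFSET


def audio_to_token(codes: torch.Tensor) -> torch.Tensor:
    """Shift raw audio-codebook indices into the shared ID space."""
    return codes + AUDIO_OFFSET


# One embedding table covers all modalities; d_model=1024 is a placeholder.
embedding = nn.Embedding(VOCAB_SIZE, 1024)
```

The model itself never needs to know which range an ID came from; only the tokenization/detokenization code does the offset arithmetic.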
If you look at Mistral's code (mistral.ai), they ship a standalone tokenizer.py; modular code like that makes it easier to port Transformer apps to Mamba. I want to add new BOS and EOS tokens for the different data types, e.g., audio, image, video, etc.
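Per-modality BOS/EOS tokens can be appended after the base text vocabulary, so existing text token IDs stay valid. A rough sketch; the base vocabulary size and the token string format are assumptions (check `len(tokenizer)` for the actual GPT-NeoX tokenizer you load), and the embedding would then need to be resized to cover the new IDs:

```python
# Assumed base size of the pretrained text tokenizer; verify with len(tokenizer).
BASE_VOCAB_SIZE = 50_277
MODALITIES = ["text", "image", "video", "audio"]

# Assign fresh IDs for a hypothetical <bos_*>/<eos_*> pair per modality.
special_tokens: dict[str, int] = {}
next_id = BASE_VOCAB_SIZE
for modality in MODALITIES:
    for kind in ("bos", "eos"):
        special_tokens[f"<{kind}_{modality}>"] = next_id
        next_id += 1

# The model's embedding (and output head) must then grow to next_id rows, e.g.:
#   model.backbone.embedding = nn.Embedding(next_id, d_model)
```

With a Hugging Face tokenizer you could instead register these via `tokenizer.add_special_tokens(...)` and resize the embedding to the new `len(tokenizer)`; the manual version above just makes the ID bookkeeping explicit.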