sshh12 / multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Apache License 2.0
158 stars, 8 forks

Is the training data available? #17

Closed (tanganke closed this issue 2 months ago)

tanganke commented 2 months ago

Thank you for the great work. Is the fine-tuning data available?

sshh12 commented 2 months ago

Hey! All of the dataset scripts are included and were run as-is.

e.g. https://github.com/sshh12/multi_token/blob/main/scripts/xclip_build_finetune_dataset.py

tanganke commented 2 months ago

Thank you very much!