[Open] tehila17-meet opened this issue 1 month ago
Hey that sounds super cool!
I don't have an example on hand, but this is very doable. You'd essentially have "preprocess rows" return the raw voxel data, have "forward" do nothing and return the voxel data unchanged, and then have the build-projector function create a custom torch module that converts your voxel data into the same shape as the tokens (your custom embedding + a dense layer to get it to the right token shape).
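A minimal sketch of what that custom projector module could look like, assuming PyTorch and using the dimensions mentioned later in this thread (249-dim voxel vector in, 8 tokens of size 4096 out); the class name and hidden size are hypothetical:

```python
import torch
import torch.nn as nn

class VoxelProjector(nn.Module):
    """Maps a flat voxel vector to a fixed number of pseudo-token embeddings."""
    def __init__(self, voxel_dim=249, num_tokens=8, token_dim=4096):
        super().__init__()
        self.num_tokens = num_tokens
        self.token_dim = token_dim
        self.proj = nn.Sequential(
            nn.Linear(voxel_dim, 1024),   # hidden width is an arbitrary choice
            nn.GELU(),
            nn.Linear(1024, num_tokens * token_dim),
        )

    def forward(self, voxels):
        # voxels: [batch, voxel_dim] -> [batch, num_tokens * token_dim]
        out = self.proj(voxels)
        # reshape into token form: [batch, num_tokens, token_dim]
        return out.view(-1, self.num_tokens, self.token_dim)
```

The output then has the same shape as a sequence of token embeddings, so it can be concatenated with the text embeddings before the language model.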
Hey, so it works, but with a relatively high loss. I'm thinking it's because the input is an embedding of size 249 and it's being projected into a dimension of [8, 4096] (8 tokens). Do you have any ideas how I can optimize this projector?
More data? In theory, 249 to 8 tokens will actually overfit easily (so low training loss but high test loss).
You can also try pre-training the projector on some proxy task (e.g. train 249 -> part of projector -> classifier, and then chop the classifier off). This could help debug the embedding quality as well.
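The proxy-task idea above could be sketched like this, assuming PyTorch; the layer sizes and number of proxy classes are hypothetical, and the actual training loop is elided:

```python
import torch.nn as nn

# The part of the projector we want to pretrain (sizes assumed).
projector_body = nn.Sequential(
    nn.Linear(249, 1024),
    nn.GELU(),
)

# Temporary classifier head for the proxy task (10 classes is an assumption —
# e.g. coarse categories of the paired sentences).
classifier_head = nn.Linear(1024, 10)

proxy_model = nn.Sequential(projector_body, classifier_head)

# ... train proxy_model with cross-entropy on (voxel, label) pairs ...

# Afterwards, chop the classifier off and keep the pretrained body
# as the initialization for the real projector.
pretrained_projector = projector_body
```

If the proxy classifier can't do better than chance, that's a sign the voxel embeddings themselves may not carry the needed signal, which is the debugging value mentioned above.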
Will also note that loss, especially in the context of LoRA fine-tuning like this, can be misleading / not an accurate representation of efficacy. It's worth just sampling/testing your weights, seeing what gets spit out, and checking whether it's at all coherent.
Thanks for replying :)
I have another question regarding the generate parameters — is there a reason you didn't configure top_p, top_k, or a specific temperature?
This library was mainly to proof-of-concept these different modalities, so I didn't mess with decoding params too much. No reason they're not included (they'd work the same as with any other Hugging Face model).
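For reference, these are the standard Hugging Face `generate` decoding parameters, which should apply to a multimodal wrapper the same way as to any other causal LM. This is a config fragment, not a runnable script — `model` and `inputs` are assumed to come from your own setup, and the specific values are illustrative:

```python
# Assumes `model` and `inputs` are already prepared elsewhere.
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,      # enable sampling instead of greedy decoding
    temperature=0.7,     # <1 sharpens the next-token distribution
    top_p=0.9,           # nucleus sampling: keep smallest set with 90% mass
    top_k=50,            # restrict sampling to the 50 most likely tokens
)
```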
Do you have an example of training a modality that has no pretrained encoder? I want to train only the projector, on precomputed embeddings.
My use case is a dataset of arrays of numbers (each number indicating a voxel intensity, from fMRI data) paired with corresponding English sentences. I want to treat each voxel array as an embedding vector that needs to be projected into a higher dimension, aligned with the textual embeddings of its corresponding sentence.
Any help would be appreciated.
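To make the use case concrete, the data described above could be wrapped in a dataset like this (a minimal sketch, assuming PyTorch; the class name and field names are hypothetical):

```python
import torch
from torch.utils.data import Dataset

class VoxelCaptionDataset(Dataset):
    """Pairs fMRI voxel-intensity arrays with their English sentences."""
    def __init__(self, voxel_arrays, sentences):
        assert len(voxel_arrays) == len(sentences)
        # Each voxel array becomes a float tensor (e.g. shape [249]).
        self.voxels = [torch.as_tensor(v, dtype=torch.float32) for v in voxel_arrays]
        self.sentences = sentences

    def __len__(self):
        return len(self.voxels)

    def __getitem__(self, idx):
        return {"voxels": self.voxels[idx], "text": self.sentences[idx]}
```

Each item then gives the projector its input vector and the trainer the target sentence.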