hassonlab / 247-pickling

Contains code to create pickles from raw/processed data

Move input generation and inference for each class of models into distinct modules #122

Open hvgazula opened 1 year ago

hvgazula commented 1 year ago

The input generation, inference, and embeddings/logits extraction functions (as appropriate) in tfsemb_main.py should be moved into separate scripts for causal, MLM, and seq2seq models.
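
One possible way to sketch the split, purely for discussion: keep a small registry that maps each model class to its own input-generation function, so each function can later live in its own module. Every name here (`HANDLERS`, `register`, `make_inputs`, and the two windowing strategies) is hypothetical and not taken from tfsemb_main.py; the windowing logic is a stand-in, not the project's actual behavior.

```python
from typing import Callable, Dict, List

# Hypothetical registry: model class name -> input-generation function.
# In a real refactor, each function could be moved to its own module.
InputFn = Callable[[List[str], int], List[List[str]]]
HANDLERS: Dict[str, InputFn] = {}


def register(model_class: str) -> Callable[[InputFn], InputFn]:
    """Decorator that records an input-generation function for a model class."""
    def decorator(fn: InputFn) -> InputFn:
        HANDLERS[model_class] = fn
        return fn
    return decorator


@register("causal")
def make_causal_inputs(tokens: List[str], context_len: int) -> List[List[str]]:
    """Sliding window: one example per token, ending at that token,
    with at most context_len tokens in total (illustrative only)."""
    return [tokens[max(0, i - context_len + 1): i + 1] for i in range(len(tokens))]


@register("mlm")
def make_mlm_inputs(tokens: List[str], context_len: int) -> List[List[str]]:
    """Non-overlapping chunks of up to context_len tokens (illustrative only)."""
    return [tokens[i: i + context_len] for i in range(0, len(tokens), context_len)]


def make_inputs(model_class: str, tokens: List[str], context_len: int) -> List[List[str]]:
    """Dispatch to the handler registered for model_class."""
    try:
        handler = HANDLERS[model_class]
    except KeyError:
        raise ValueError(f"no input generator registered for {model_class!r}")
    return handler(tokens, context_len)
```

With this layout, adding seq2seq support would mean registering one more function rather than adding another branch to a shared script; until then, `make_inputs("seq2seq", ...)` raises a `ValueError`.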

zkokaja commented 1 year ago

Yes, let's flesh this out and implement a prototype for causal so we can implement it for other types as well.

zkokaja commented 1 year ago

Consider adding special tokens to causal models at the beginning, to stay true to what the model was trained on. Needs investigating.
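
A minimal sketch of what that could look like, assuming token IDs are already available as a plain list and that the context window has a fixed maximum length. The helper name `prepend_bos` is hypothetical. The constant assumes GPT-2, whose `<|endoftext|>` token has ID 50256 and serves as its beginning-of-sequence token; other causal models may use a different token or none at all, which is part of what needs investigating.

```python
from typing import List

# Assumption: GPT-2's <|endoftext|> token (ID 50256) is used as the BOS token.
GPT2_BOS_ID = 50256


def prepend_bos(token_ids: List[int], bos_id: int, max_len: int) -> List[int]:
    """Prepend bos_id and keep only the most recent tokens so that the
    result never exceeds max_len tokens (BOS included)."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1 to fit the BOS token")
    keep = max_len - 1  # room left for real tokens after the BOS token
    tail = token_ids[-keep:] if keep > 0 else []
    return [bos_id] + tail
```

Note that prepending a token shifts every position by one, so any code that indexes hidden states or logits by word position would have to skip the first position.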

hvgazula commented 1 year ago

https://datascience.stackexchange.com/questions/86566/whats-the-right-input-for-gpt-2-in-nlp

zkokaja commented 1 year ago

Let's talk about this again

zkokaja commented 1 year ago

Waiting to resolve the issue with replicating Whisper embedding generation.