snap-stanford / UCE

UCE is a zero-shot foundation model for single-cell gene expression data
MIT License
136 stars 21 forks

is there training code available? #3

Closed (szalata closed 8 months ago)

szalata commented 10 months ago

I couldn't find a script used for training the model. Thanks!

Yanay1 commented 10 months ago

We're not planning to release a training script as of now. We don't necessarily want people to fine-tune the model on individual datasets, since we intend for embeddings to remain universally shareable (zero-shot). Our training implementation is also very specific to our own hardware setup.

Zethson commented 10 months ago

@Yanay1 I'd like to stress that science needs to be reproducible; otherwise no hypothesis can be confirmed. Training scripts are one small component of that.

yhr91 commented 10 months ago

Technically, all the code needed from the model side to reproduce the results in the paper is available in this repository. For instance, you can take a look at some analysis scripts shared here, which only use embeddings produced by a pre-trained model.

The challenge you would face at the moment is not the code, but that most of our results were produced using the Tabula Sapiens v2 dataset, which has not yet been published.

szalata commented 10 months ago

By reproducibility, I mean being able to reach your results from the code alone. If we start from downloaded weights, we are forced to rely on the training description, and we cannot rerun the training ourselves or evaluate the model after training on another dataset.