vandijklab / cell2sentence

Cell2Sentence: Teaching Large Language Models the Language of Biology

Questions about the condition generation model #1

Closed HelloWorldLTY closed 12 hours ago

HelloWorldLTY commented 1 week ago

Hi, thanks for your interesting work. I tried to reproduce tutorial 5 (the conditional generation task) with the model you provide:

https://huggingface.co/vandijklab/C2S-Pythia-410m-diverse-single-and-multi-cell-tasks/tree/main

However, this model does not seem to match the one used in the tutorial, which looks like a single checkpoint. Is this the correct model to use?

I also ran into an error with this code:

with open(os.path.join(base_path, 'data_split_indices_dict.pkl'), 'rb') as f:
    data_split_indices_dict = pickle.load(f)
data_split_indices_dict.keys()

The error is:


FileNotFoundError                         Traceback (most recent call last)
Cell In[30], line 1
----> 1 with open(os.path.join(base_path, 'data_split_indices_dict.pkl'), 'rb') as f:
      2     data_split_indices_dict = pickle.load(f)
      3 data_split_indices_dict.keys()

File ~/.conda/envs/cell2sentence/lib/python3.8/site-packages/IPython/core/interactiveshell.py:284, in _modified_open(file, *args, **kwargs)
    277 if file in {0, 1, 2}:
    278     raise ValueError(
    279         f"IPython won't let you open fd={file} by default "
    280         "as it is likely to crash IPython. If you know what you are doing, "
    281         "you can use builtins' open."
    282     )
--> 284 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: '/home/tl688/.cache/huggingface/hub/models--vandijklab--C2S-Pythia-410m-diverse-single-and-multi-cell-tasks/snapshots/data_split_indices_dict.pkl'

Thanks a lot.

SyedA5688 commented 1 day ago

Hi there,

Thank you for your question! In tutorial 5, the checkpoint that is loaded is a cell generation model that was finetuned with tutorial notebook 3 to do conditional cell generation on the tutorial data. Here are some additional details and steps for reproducing the checkpoint:

Tutorial notebook 3 (https://github.com/vandijklab/cell2sentence/blob/master/tutorials/c2s_tutorial_3_finetuning_on_new_datasets.ipynb) shows an example of finetuning a C2S model for cell type prediction. For tutorial notebook 5, we used tutorial notebook 3 to finetune for cell generation conditioned on cell type instead. This was done by:

  1. Downloading the base C2S-Pythia-410m cell generation model (available here: https://huggingface.co/vandijklab/C2S-Pythia-410m-cell-type-conditioned-cell-generation) onto disk
  2. Setting the 'cell_type_prediction_model_path' variable in tutorial notebook 3 to the path of the downloaded base C2S model
  3. Setting the 'training_task' in tutorial notebook 3 to 'cell_type_generation'
  4. Running the notebook to finetune on the training set of the tutorial data for cell generation. This saves the finetuned C2S model to disk in the output directory, along with the 'data_split_indices_dict.pkl' file whose absence caused your FileNotFoundError.
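
For reference, steps 1 to 3 could be sketched roughly as below. This is only an illustration, not code from the tutorial: the helper functions, the settings dictionary, and the local directory name are hypothetical, and it assumes the `huggingface_hub` package is installed for the download step. Only the repo ID and the two variable names come from the steps above.

```python
from pathlib import Path

# Repo ID taken from the Hugging Face link in step 1.
BASE_REPO_ID = "vandijklab/C2S-Pythia-410m-cell-type-conditioned-cell-generation"


def download_base_model(local_dir):
    """Step 1: download the base model onto disk (needs network access)."""
    # Imported lazily so the rest of the sketch runs without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=BASE_REPO_ID, local_dir=str(local_dir))


def notebook3_settings(model_dir):
    """Steps 2 and 3: values to assign to the variables in tutorial notebook 3."""
    return {
        "cell_type_prediction_model_path": str(model_dir),
        "training_task": "cell_type_generation",
    }


# Hypothetical local directory; pick any location with enough disk space.
model_dir = Path("./c2s_base_model")
settings = notebook3_settings(model_dir)
for name, value in settings.items():
    print(f"{name} = {value!r}")
```

Call `download_base_model(model_dir)` once before running notebook 3, then copy the two printed values into the corresponding variables in the notebook.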

After those steps, you can use the finetuned cell generation model in tutorial notebook 5. Hopefully this helps, let me know if you have any additional questions!
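
If it helps, here is a small guard you could put in front of the pickle load in tutorial notebook 5, so that a wrong `base_path` produces a clearer message. It is a sketch under assumptions: the function name `load_split_indices` is hypothetical, and it assumes `base_path` should point at the output directory written by the finetuning run rather than at the Hugging Face cache.

```python
import os
import pickle


def load_split_indices(output_dir, filename="data_split_indices_dict.pkl"):
    """Load the split-indices pickle, failing with a descriptive message."""
    path = os.path.join(output_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found. This file is written by the finetuning run in "
            "tutorial notebook 3, so output_dir should be that run's output "
            "directory, not a downloaded model snapshot."
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

In the notebook this would replace the `with open(...)` block, for example `data_split_indices_dict = load_split_indices(base_path)`.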

HelloWorldLTY commented 12 hours ago

Thanks a lot. It is good to know that.