carversh closed this issue 8 months ago
I realize it's because many samples only have a single instance.
Actually, in order to predict on a new dataset, do I need multiple samples of the same perturbation in my input dataset? I wouldn't think so if I'm just predicting on the already trained model.
Yes, having only a single sample of a specific perturbation type does throw unexpected errors. I believe this is linked to the section of the dataloader that computes differentially expressed genes.
If you are using these single-sample perturbations for training or validation, that isn't really recommended anyway. If you only want to use them during inference, you do not need to include their post-perturbation expression in the dataloader when training the model. You can directly predict the perturbation effect for that perturbation.
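One way to find the affected perturbations before building the dataloader is to count cells per perturbation label. This is a minimal sketch in plain Python, not part of the GEARS API: `split_by_cell_count` is a hypothetical helper, and the `ctrl` / `A+ctrl` label format is assumed from the usual GEARS input convention.

```python
from collections import Counter


def split_by_cell_count(conditions, min_cells=2, control="ctrl"):
    """Partition perturbation labels by how many cells carry each label.

    Hypothetical helper (not a GEARS function). `conditions` is any iterable
    of per-cell labels; `control` is the assumed label for control cells.
    """
    counts = Counter(conditions)
    keep, singletons = [], []
    for label, n in sorted(counts.items()):
        if label == control:
            continue  # control cells are not a perturbation
        (keep if n >= min_cells else singletons).append(label)
    return keep, singletons


# Toy labels: "B+ctrl" appears in only one cell.
labels = ["ctrl", "ctrl", "A+ctrl", "A+ctrl", "B+ctrl", "A+B", "A+B", "A+B"]
keep, singletons = split_by_cell_count(labels)
print("train/validate on:", keep)
print("inference only:", singletons)
```

With an AnnData object you could pass in the per-cell labels from the `.obs` dataframe (for example `adata.obs["condition"]`, assuming that is the column holding the perturbation name), then drop the cells whose label is in `singletons` before training.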
> in order to predict on a new dataset
At the moment, training and prediction are done in the context of the same dataset. We have not designed GEARS for cross-dataset prediction.
Hi,
I am trying to input my own data into GEARS and am encountering an issue, even though I thought I had formatted my scanpy object correctly. I also tried setting the Ensembl ID as the index of the .var dataframe, but this still triggered the same error. Any solutions to this? FYI, I deleted my .raw attribute because I couldn't save my h5ad file without deleting it. The formatting in .raw is different from the formatting you require in the .obs and .var dataframes.
Here is the line of code that is triggering an error:
Here is the error being triggered:
Here is the structure of my count matrix:
Here is the structure of adata_final.var:
Here is the structure of adata_final.obs: