sellerslab / gemini

GEMINI: A variational Bayesian approach to identify genetic interactions from combinatorial CRISPR screens
BSD 3-Clause "New" or "Revised" License

Model not converging in gemini_inference #4

Open JH1606-code opened 1 year ago

JH1606-code commented 1 year ago

Hi GEMINI team,

I am using your tool to analyse a combinatorial CRISPR screen. I ran the dataset from the tutorial (http://www.bioconductor.org/packages/devel/bioc/vignettes/gemini/inst/doc/gemini-quickstart.html) and everything worked fine. When I ran the same workflow on my own data, however, I got an error in the gemini_inference step:

Error in gemini_inference(., cores = 2, verbose = TRUE) : Model not converging! Please adjust parameters and try again.

I double-checked all my inputs (counts, guide.annotation and sample.replicate.annotation) as well as the Model created by gemini_initialize, and everything looks fine to me. Can you help me work out what the problem could be?

Best, Jasmine

ssj1739 commented 1 year ago

Hello Jasmine,

Thanks for your message. Could you tell me a little more about your library size/design? Is it symmetric or asymmetric? Is each guide paired with all other guides or only a subset? Are all genes paired with other genes? How many samples are you analyzing, and how many replicates per sample?

I ask because it appears that Gemini isn’t able to learn the appropriate parameters sufficiently to achieve convergence, and it’s possible this is due to insufficient shared information across guide pairs.
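The design questions above can be checked directly from a table of guide pairs. The following is a minimal sketch in base R and is not part of GEMINI: the data frame `pairs` and its columns `guide1` and `guide2` are hypothetical, and "symmetric" is taken here to mean that every pair also occurs in the reverse orientation, which is an assumption about the intended meaning.

```r
# Toy table of guide pairs; the column names are hypothetical, not GEMINI's format.
pairs <- data.frame(
  guide1 = c("A_1", "A_1", "B_1"),
  guide2 = c("B_2", "C_2", "C_2"),
  stringsAsFactors = FALSE
)

# Treat the design as symmetric if each (guide1, guide2) pair
# also appears as (guide2, guide1).
forward <- paste(pairs$guide1, pairs$guide2, sep = ";")
reverse <- paste(pairs$guide2, pairs$guide1, sep = ";")
is_symmetric <- all(reverse %in% forward)

# Number of distinct partners each first-position guide is paired with.
partners_per_guide <- tapply(pairs$guide2, pairs$guide1,
                             function(x) length(unique(x)))

print(is_symmetric)
print(partners_per_guide)
```

Guides with very few partners are the ones most likely to contribute little shared information across pairs.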

Thanks, Sidharth

JH1606-code commented 1 year ago

Hi Sidharth,

thank you for your fast answer!

We have 30 gRNAs for the Non-Target-Controls (NTC) and 6 gRNAs for each of 6 genes, so our library contains 66 gRNAs in total. In our library design, 15 of the NTC gRNAs are in the "first" position/column and the other 15 NTC gRNAs are in the "second" position/column. Likewise, 3 gRNAs per gene are in the first position/column and 3 gRNAs per gene are in the second position/column. For example:

| 1. position | 2. position |
| --- | --- |
| NTC_1 | NTC_16 |
| NTC_2 | NTC_17 |
| NTC_3 | NTC_18 |
| ... | ... |
| NTC_15 | NTC_30 |
| gene1_1 | gene1_4 |
| gene1_2 | gene1_5 |
| gene1_3 | gene1_6 |
| gene2_1 | gene2_4 |
| gene2_2 | gene2_5 |
| gene2_3 | gene2_6 |
| ... | ... |
| gene6_3 | gene6_6 |

What we would like to analyse now are the combinations of every guide in the first position with every guide in the second position. We do not have the reverse combinations, i.e. guides from the second position paired with guides from the first position. For example, we have all combinations of NTC_1 with the guides listed in the second column (2. position): NTC_1 with (NTC_16, NTC_17, ... gene6_4, gene6_5, gene6_6). But we do not have combinations of NTC_16 with (NTC_1, NTC_2, ... gene6_1, gene6_2, gene6_3), etc.

We have our plasmid counts, and one untreated and one treated sample where each sample has two replicates.

I hope this answers all your questions.

Best, Jasmine

ssj1739 commented 1 year ago

Hi Jasmine,

To clarify, in your example you showed that you have guide pairs targeting the same gene (e.g. gene2_1 and gene2_4), but do you also have the combinations (e.g. gene2_1 being paired with gene1_4)? In other words, how many unique guide pairs exist in your library?

Also, would you be willing to share data so I can help debug? You can save your established gemini Input object as an RDS file: saveRDS(Input, "exampleInput.Rds"), and email it to me: sidharthsjain@gmail.com.
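Before sending the file, it can be worth confirming that the object survives a round trip through `saveRDS()` and `readRDS()`. The sketch below uses a placeholder list in place of a real gemini Input object and writes to a temporary file; both are assumptions made purely for illustration.

```r
# Placeholder standing in for the real gemini Input object (hypothetical contents).
Input <- list(counts = matrix(1:4, nrow = 2), note = "placeholder")

# Write to a temporary file rather than the working directory.
rds_path <- file.path(tempdir(), "exampleInput.Rds")
saveRDS(Input, rds_path)

# Read the file back and confirm that nothing changed.
restored <- readRDS(rds_path)
round_trip_ok <- identical(Input, restored)
print(round_trip_ok)
```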

Thanks, Sidharth

JH1606-code commented 1 year ago

Hi Sidharth,

thank you for your fast response!

> To clarify, in your example you showed that you have guide pairs targeting the same gene (e.g. gene2_1 and gene2_4)

That is correct!

> but do you also have the combinations (e.g. gene2_1 being paired with gene1_4)

No, I don't have this combination.

We have 66 unique guides and in total 1089 (33x33) unique guide pairs. I sent you an email with more information.
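As a quick arithmetic check, these counts can be reproduced in a few lines of base R. This is only a sketch: the guide names follow the naming pattern from the example table rather than the real guide IDs, and it assumes, as the 33x33 figure suggests, that every first-position guide is paired with every second-position guide.

```r
# Guide names follow the pattern in the example table (assumed, not the real IDs).
genes <- paste0("gene", 1:6)
first_pos  <- c(paste0("NTC_", 1:15),
                paste0(rep(genes, each = 3), "_", 1:3))
second_pos <- c(paste0("NTC_", 16:30),
                paste0(rep(genes, each = 3), "_", 4:6))

# Pair every first-position guide with every second-position guide.
guide_pairs <- expand.grid(first = first_pos, second = second_pos,
                           stringsAsFactors = FALSE)

n_guides <- length(unique(c(first_pos, second_pos)))
n_pairs  <- nrow(guide_pairs)
cat(n_guides, "guides,", n_pairs, "guide pairs\n")
```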

Best, Jasmine