iDEP-SDSU / idep

Integrated Differential Expression and Pathway analysis
http://ge-lab.org/idep

Background excluded in ShinyGO #165

Closed Jelinek-J closed 2 years ago

Jelinek-J commented 2 years ago

If I understand correctly, there is a bug in ShinyGO: background genes from the input form are excluded instead of preserved on line 1038 https://github.com/iDEP-SDSU/idep/blob/0670e008c4e4a813bac912e68ebedce2897d42b1/shinyapps/go75/server.R#L1038 (and there are two other occurrences of the same code in the file).

This can be demonstrated by comparing the Coding sequence length plots in the Plots tab generated by ShinyGO v0.741 for the following inputs (the same problem is present in the current v0.75, but its database files have not been published yet, so I cannot generate equally demonstrative examples with it):

  1. No customized background, and the following IDs as the list of genes: ENSG00000262730,ENSG00000263465,ENSG00000263465,ENSG00000282665,ENSG00000236737,ENSG00000279552,ENSG00000288258,ENSG00000282302,ENSG00000282302,ENSG00000281593,ENSG00000288715,ENSG00000152207,ENSG00000152207,ENSG00000282107,ENSG00000189127,ENSG00000288608,ENSG00000197405,ENSG00000283877,ENSG00000254732,ENSG00000135116 (the first 20 protein_coding genes in Human__hsapiens_gene_ensembl_GeneInfo.csv from geneInfo.tar.gz, sorted by cds_length);
  2. No customized background, and the following IDs as the list of genes: ENSG00000173821,ENSG00000205277,ENSG00000151914,ENSG00000127603,ENSG00000221843,ENSG00000281123,ENSG00000167548,ENSG00000112159,ENSG00000288121,ENSG00000143341,ENSG00000215182,ENSG00000283158,ENSG00000117983,ENSG00000185567,ENSG00000277585,ENSG00000054654,ENSG00000175820,ENSG00000154358,ENSG00000183091,ENSG00000155657 (the last 20 protein_coding genes in Human__hsapiens_gene_ensembl_GeneInfo.csv from geneInfo.tar.gz, sorted by cds_length);
  3. The IDs from the first case as the list of genes, and the IDs from the second case as the customized background.

As I understand it, the Background density in case 3 should equal the List density in case 2 (because it is the same list). Instead, the Background density in case 3 is similar to the Background density in case 1, with one difference: the x-range is shorter in case 3, and the reduction corresponds to the peak of the List density in case 2.

One related problem: when no customized background is used, genes from the list are excluded from the default background (the whole genome), as mentioned in your article:

"T-tests are carried out to identify any significant differences between the query genes and all other background genes on the genome."

But when a customized background is used, genes from the list are not excluded from the background, which seems non-intuitive and inconsistent.
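
To make the inconsistency concrete, here is a minimal R sketch of the two ways a background could be paired with a query list. It is only an illustration: the gene IDs, the lengths, and the variable names (`cds`, `query`, `custom_background`) are made up and are not taken from ShinyGO's code.

```r
# Toy coding-sequence lengths keyed by hypothetical gene IDs.
cds <- c(g1 = 300, g2 = 450, g3 = 900, g4 = 1200, g5 = 5000)

query             <- c("g1", "g2")
custom_background <- c("g1", "g2", "g3", "g4", "g5")

# Option A: query genes stay in the background (what is described above
# for a customized background).
background_incl <- custom_background

# Option B: query genes are removed from the background (what the article
# describes for the default whole-genome background).
background_excl <- setdiff(custom_background, query)

# The same t-test can give different results depending on the option.
p_incl <- t.test(cds[query], cds[background_incl])$p.value
p_excl <- t.test(cds[query], cds[background_excl])$p.value
print(c(included = p_incl, excluded = p_excl))
```

Whichever option is preferred, applying the same rule to the default and the customized background would remove the inconsistency.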

gexijin commented 2 years ago

Thank you for raising this issue. It seems serious. I am investigating.

gexijin commented 2 years ago

You are correct. It should be the opposite. It should be

xB <- xB[ xB$Set == "List", ]

It is an embarrassing error. I double-checked the enrichment P-value calculation and didn't find the same kind of error. I really appreciate you pointing this out. The open-source community is great. If you have any other suggestions or need any assistance, please let us know.
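
For readers following along, the effect of the corrected filter can be sketched on a toy table. This is an assumption-laden illustration, not the code from server.R: the original line is not quoted in this thread, so the negated form below is only a guess at "the opposite", and the `label_set` helper, the `"Genome"` label, and the `gene_id` and `cds_length` columns are hypothetical.

```r
# Toy stand-in for a gene-info table; only the `Set` column name comes from the thread.
gene_info <- data.frame(
  gene_id    = c("g1", "g2", "g3", "g4", "g5", "g6"),
  cds_length = c(300, 450, 900, 1200, 5000, 8000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows whose ID was supplied by the user are labelled "List",
# all remaining rows get the (assumed) label "Genome".
label_set <- function(info, ids) {
  info$Set <- ifelse(info$gene_id %in% ids, "List", "Genome")
  info
}

custom_background <- c("g4", "g5", "g6")
xB <- label_set(gene_info, custom_background)

# Assumed form of the bug: the negated filter drops the user's background genes
# and keeps the rest of the genome.
xB_buggy <- xB[xB$Set != "List", ]

# Corrected filter from the reply: keep only the user's background genes.
xB_fixed <- xB[xB$Set == "List", ]

print(xB_buggy$gene_id)
print(xB_fixed$gene_id)
```

Under these assumptions, the negated filter leaves the genome minus the customized background, which would match the shortened x-range reported for case 3.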