Closed Jelinek-J closed 2 years ago
Thank you for raising this issue. It seems serious. I am investigating.
You are correct. It should be the opposite. It should be `xB <- xB[ xB$Set == "List", ]`. It is an embarrassing error. I double-checked the enrichment P-value calculation and didn't find the same kind of error. I really appreciate you pointing this out. The open-source community is great. If you have any other suggestions or need any assistance, please let us know.
If I understand correctly, there is a bug in ShinyGO: background genes from the input form are excluded instead of preserved on line 1038 https://github.com/iDEP-SDSU/idep/blob/0670e008c4e4a813bac912e68ebedce2897d42b1/shinyapps/go75/server.R#L1038 (and there are two other occurrences of the same code in the file).
It can be demonstrated by comparing the Coding sequence length plots in the Plots tab generated by ShinyGO v0.741 for the following inputs (the same problem is present in the current v0.75, but its database files have not been published yet, so I cannot generate equally demonstrative examples with it):
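To illustrate the suspected inversion, here is a minimal sketch in R. It is not the code from server.R: the toy table, the gene IDs, and the `custom_background` vector are made up, and it assumes that the `Set` column is "List" for genes matching the submitted IDs and "Genome" for everything else.

```r
# Toy gene-info table; IDs and lengths are made up for illustration.
gene_info <- data.frame(
  ensembl_gene_id = c("G1", "G2", "G3", "G4", "G5"),
  cds_length      = c(300, 450, 1200, 2400, 9000),
  stringsAsFactors = FALSE
)

# Hypothetical custom background as submitted in the input form.
custom_background <- c("G2", "G4")

# Assumed labelling: "List" = matched the submitted IDs, "Genome" = all others.
xB <- gene_info
xB$Set <- ifelse(xB$ensembl_gene_id %in% custom_background, "List", "Genome")

# Suspected behavior: the submitted background genes are dropped.
excluded <- xB[xB$Set != "List", ]

# Expected behavior: only the submitted background genes are kept.
preserved <- xB[xB$Set == "List", ]
```

With the inverted condition, the background plotted is everything except the genes that were actually submitted, which would match what I see in the plots.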
ENSG00000262730,ENSG00000263465,ENSG00000263465,ENSG00000282665,ENSG00000236737,ENSG00000279552,ENSG00000288258,ENSG00000282302,ENSG00000282302,ENSG00000281593,ENSG00000288715,ENSG00000152207,ENSG00000152207,ENSG00000282107,ENSG00000189127,ENSG00000288608,ENSG00000197405,ENSG00000283877,ENSG00000254732,ENSG00000135116
(the first 20 protein_coding genes in Human__hsapiens_gene_ensembl_GeneInfo.csv from geneInfo.tar.gz sorted by cds_length);
ENSG00000173821,ENSG00000205277,ENSG00000151914,ENSG00000127603,ENSG00000221843,ENSG00000281123,ENSG00000167548,ENSG00000112159,ENSG00000288121,ENSG00000143341,ENSG00000215182,ENSG00000283158,ENSG00000117983,ENSG00000185567,ENSG00000277585,ENSG00000054654,ENSG00000175820,ENSG00000154358,ENSG00000183091,ENSG00000155657
(the last 20 protein_coding genes in Human__hsapiens_gene_ensembl_GeneInfo.csv from geneInfo.tar.gz sorted by cds_length).
As I understand it, the Background density in case 3 should be equal to the List density in case 2 (because it is the same list). Instead, the Background density in case 3 is similar to the Background density in case 1, with one difference: the x-range is shorter in case 3, and the reduction corresponds to the peak of the List density in case 2.
And one related problem: when no customized background is used, genes from the list are excluded from the default background (the whole genome), as mentioned in your article.
But when a customized background is used, genes from the list are not excluded from the background, which seems non-intuitive and inconsistent.
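For illustration only, consistent handling of the two cases could look like the sketch below. The function `background_for_plots` and its arguments are hypothetical names of mine, not anything from ShinyGO, and whether list genes should be removed from a customized background at all is a design decision for the authors.

```r
# Hypothetical helper: choose the background pool, then treat list genes
# the same way regardless of where the pool came from.
background_for_plots <- function(genome_ids, list_ids, custom_background = NULL) {
  # Use the customized background if one was supplied, else the whole genome.
  pool <- if (is.null(custom_background)) genome_ids else custom_background
  # Exclude the list genes from the pool in both cases
  # (note that setdiff() also drops duplicated IDs).
  setdiff(pool, list_ids)
}

# Default background: list gene "a" is removed from the genome.
default_bg <- background_for_plots(c("a", "b", "c", "d"), list_ids = "a")

# Customized background: list gene "a" is removed here as well.
custom_bg <- background_for_plots(c("a", "b", "c", "d"), list_ids = "a",
                                  custom_background = c("a", "d"))
```

The opposite convention (never excluding list genes) would be equally consistent; the point is only that both branches follow the same rule.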