nsalomonis / altanalyze

AltAnalyze is a multi-functional and easy-to-use software package for automated single-cell and bulk gene and splicing analyses. Easy-to-use precompiled graphical user-interface versions are available from our website.
http://www.altanalyze.org
Apache License 2.0

Missing exon array probesets from database #13

Closed GoogleCodeExporter closed 4 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Look at probeset 2352152, overlapping with ENST00000271277
2. Look at probeset 3161639 with the transcript cluster annotation 3161566

What is the expected output? What do you see instead?
Both Exon 1.0 probesets should be included in the AltAnalyze human EnsMart62
database, but both are missing. In the first case, a second gene appears to
overlap this probeset region, possibly resulting in exclusion. The entire
overlapping exon (ENSE00001450658) is properly annotated in
ensembl/Hs_Ensembl_exon.txt, so the issue arises in the module
ExonArrayEnsemblRules.py (probably from overlapping transcript cluster
annotations). In the second case, 3161639 aligns to an intron of
ENSG00000107077 and should be considered a "full" annotation. It shares the
same transcript cluster ID as the exon-aligning probesets for this gene, but
other annotated intron-aligning probesets have more than one transcript
cluster, so it is probably excluded due to multiple overlapping transcript
clusters. Because 3161639 and the exon-aligning probesets for this gene share
the same transcript cluster, it should be included.
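
The inclusion rule argued for above could be sketched roughly as follows. This only illustrates the proposed logic and is not the code in ExonArrayEnsemblRules.py; the function name `should_include` and the use of sets of transcript cluster IDs as inputs are assumptions made for the example.

```python
def should_include(probeset_clusters, gene_exon_clusters):
    """Return True if an intron-aligning probeset should be kept for a gene.

    probeset_clusters: set of transcript cluster IDs annotated for the probeset
    gene_exon_clusters: set of transcript cluster IDs used by the gene's
        exon-aligning probesets
    Both arguments are hypothetical inputs for this sketch.
    """
    # Keep the probeset only when it has exactly one transcript cluster and
    # that cluster is already used by the gene's exon-aligning probesets.
    return len(probeset_clusters) == 1 and probeset_clusters <= gene_exon_clusters


# 3161639 is annotated with transcript cluster 3161566, which it shares with
# the exon-aligning probesets of ENSG00000107077, so it would be kept.
print(should_include({"3161566"}, {"3161566"}))

# A made-up probeset annotated with two clusters would still be rejected.
print(should_include({"3161566", "hypothetical_cluster"}, {"3161566"}))
```

Under this rule a probeset is rejected only when its own cluster annotation is ambiguous, not because other intron-aligning probesets of the same gene have multiple clusters.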

Testing during the next Ensembl build (EnsMart 63) should be conducted with 
these probesets to assess where they are excluded and resolve if possible.
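
A quick check for the EnsMart 63 test might look like the sketch below. It assumes the database file of interest is tab-delimited with the probeset ID in the first column; that layout, the function name `find_missing_probesets`, and the `id_column` parameter are assumptions and would need adjusting to the actual file.

```python
import csv


def find_missing_probesets(lines, probeset_ids, id_column=0):
    """Return the probeset IDs that never appear in the given column.

    lines: any iterable of tab-delimited text lines, e.g. an open file handle
    probeset_ids: probeset IDs (strings) expected to be present
    id_column: zero-based column holding the probeset ID (assumed layout)
    """
    remaining = set(probeset_ids)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) > id_column:
            remaining.discard(row[id_column])
        if not remaining:
            break  # every requested probeset has been found
    return remaining
```

Running it on an open handle to each intermediate file in turn, with the IDs `["2352152", "3161639"]`, would show the first file from which a probeset disappears and so narrow down where the exclusion happens.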

Original issue reported on code.google.com by nsalomo...@gmail.com on 12 Jun 2011 at 7:57

nsalomonis commented 4 years ago

Deprecated.