Open sxwcasd opened 2 years ago
Hmm, that is an unusual situation with a gene listed on multiple chromosomes -- interesting catch. Happy to entertain a pull request if you have a better validation scheme.
On Thu, Jun 23, 2022 at 4:37 PM Linghao Song @.***> wrote:
I am trying to running the annotation (svaba-annotate.R) on GENCODE db. However, the UCSC db records pulled by these 2 lines.
Are having duplicates. example:
kgID mRNA geneSymbol spID refSeq chrom txStart txEnd strand
1: ENST00000244174.11 NM_002186 IL9R Q01113 NM_002186 chrX 155997695 156010817 + 2: ENST00000244174.11 NM_002186 IL9R Q01113 NM_002186 chrY 57184215 57197337 +
This is making sense to me, that the sex chromosomes have different position and share the some mRNA. But this will hit error at the following line: Error: !any(duplicated(genes$kgID)) is not TRUE https://github.com/walaj/svaba/blob/0f60e366c300bbefbba762bcc6d2b661bd2ae74a/R/svaba-annotate.R#L67
Maybe we have have a better validation check at here?
— Reply to this email directly, view it on GitHub https://github.com/walaj/svaba/issues/115, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABUZ7CG5OHOZAOFME3GMGODVQTDI7ANCNFSM5ZVTSDKQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Because we need both gencode and exonframe information, we loaded wgEncodeGencodeCompV36
table instead of knownGene
or kgXref
. And validate by start and end sites. I m not sure if that make sense to your original design (because we don't need to consider refseq id mapping). But it seems accomplished our goals and get around this issue.
I am trying to running the annotation (svaba-annotate.R) on GENCODE db. However, the UCSC db records pulled by these 2 lines. https://github.com/walaj/svaba/blob/0f60e366c300bbefbba762bcc6d2b661bd2ae74a/R/svaba-annotate.R#L59-L60
Are having duplicates. example:
This is making sense to me, that the sex chromosomes have different position and share the some mRNA. But this will hit error at the following line:
Error: !any(duplicated(genes$kgID)) is not TRUE
Maybe we have have a better validation check at here?