Closed jje42 closed 2 years ago
Yeah, I think this basically comes down to the relatively simple representation in the TSV (just the two edges of the junction - the last base of the exon on the left, the first base of the exon on the right) vs the BED12 record.
To compare BED12 format to that TSV you need to consider the coordinates in columns 1 (chrom), 2 (chromStart), 3 (chromEnd) AND the information in column 10 (blockCount), 11 (blockSizes), 12 (blockStarts).
Try loading your BED file in IGV. Then compare to the coordinates shown in the TSV manually. Does it make sense then?
On Tue, Mar 1, 2022 at 11:35 PM Jonathan Ellis @.***> wrote:
I'm using regtools v0.5.2, and I've run the command
regtools cis-splice-effects identify -o out.tsv -v out.vcf -j out.bed -s 1 in.vcf.gz in.bam Homo_sapiens.GRCh37.dna.primary_assembly.fa Homo_sapiens.GRCh37.87.gtf
but when I look at the junctions in the TSV and BED file, I notice that the start/ends differ. For example, TSV
chrom start end name score strand splice_site acceptors_skipped exons_skipped donors_skipped anchor known_donor known_acceptor known_junction gene_names gene_ids transcripts variant_info 12 402334 404742 JUNC00000004 1 ? CC-TC 0 0 0 N 0 0 0 NA NA NA 12:404740-404741 12 402335 404739 JUNC00000005 50 - GT-AG 0 0 0 DA 1 1 1 KDM5A ENSG00000073614 ENST00000382815,ENST00000399788 12:404740-404741 12 404959 406207 JUNC00000006 117 - GT-AG 0 0 0 DA 1 1 1 KDM5A ENSG00000073614 ENST00000382815,ENST00000399788 12:404740-404741 12 405505 446017 JUNC00000007 1 - CT-AC 20 15 21 N 0 0 0 NA NA NA 12:404740-404741 12 406366 416112 JUNC00000008 24 - GT-AG 0 0 1 DA 1 1 1 KDM5A ENSG00000073614 ENST00000382815,ENST00000399788 12:404740-404741
BED
12 402254 404761 JUNC00000004 1 ? 402254 404761 255,0,0 2 80,20 0,2487 12 402238 404833 JUNC00000005 50 - 402238 404833 255,0,0 2 97,95 0,2500 12 404866 406305 JUNC00000006 117 - 404866 406305 255,0,0 2 93,99 0,1340 12 405481 446091 JUNC00000007 1 - 405481 446091 255,0,0 2 24,75 0,40535 12 406266 416211 JUNC00000008 24 - 406266 416211 255,0,0 2 100,100 0,9845
The junctions have the same names and number of supporting reads, so I would expect them to have the same start/end positions. Am I misinterpreting what these files represent or is there a reason the start/ends to not match?
Thanks.
— Reply to this email directly, view it on GitHub https://github.com/griffithlab/regtools/issues/159, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGRFGGTO54KCEEADR6MY63U534YJANCNFSM5PWIV5JQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you are subscribed to this thread.Message ID: @.***>
Yes, I see what you mean. Thanks for clarifying.
I'm using regtools v0.5.2, and I've run the command
but when I look at the junctions in the TSV and BED file, I notice that the start/ends differ. For example,
TSV
BED
The junctions have the same names and number of supporting reads, so I would expect them to have the same start/end positions. Am I misinterpreting what these files represent or is there a reason the start/ends to not match?
Thanks.