Closed by gm-nyc 4 years ago
Hi! So to clarify, are you looking at cases where you have novel splice junctions in a known gene, and then trying to see how far they are from known junctions? I don't think we have an existing formal utility that outputs the splice junctions, but it wouldn't be too difficult to make one!
Yes! I am trying to quantify the distance from the reference splice junctions to the novel junctions I'm seeing in my samples, which have splicing aberrations. The mis-splicing can be transcriptome-wide so I am trying to generate an exon-based matrix for my cells. Does that make sense?
Yes, absolutely! My colleague and I have put together a utility to extract splice junctions/exon positions as well as the transcripts that contain them. I'm going to run some tests on it and finalize the details, and then once I'm comfortable all is going as intended, I'll let you know so you can try it out!
Hi, I am very happy to have found this tool for my analysis. I tried my first run today and ran into a problem. The error message was "SAM transcript xxx lacks an MD tag". My samples are DirectRNA Nanopore-seq reads mapped with Minimap2. By the way, will you develop a tool like MISO or rMATS to help detect changes in alternative splicing?
Hi iam2b, You should be able to fix this issue by running Minimap2 with the --MD flag (see issue #45). Currently we are not in the business of developing our own downstream alt splicing tool, but you might consider trying this one https://bioconductor.org/packages/release/bioc/html/IsoformSwitchAnalyzeR.html. The developer has added support for TALON abundance files.
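For anyone who wants to confirm that a SAM file carries MD tags before handing it to TALON, here is a minimal sketch in plain Python. The function name `reads_missing_md` is hypothetical and not part of TALON or Minimap2. It assumes standard tab-separated SAM records, where optional fields start at column 12, and it skips unmapped reads (flag bit 0x4), since those carry no MD tag.

```python
def reads_missing_md(sam_lines):
    """Return names of mapped SAM records that have no MD:Z: optional field.

    Hypothetical helper for illustration; not part of TALON.
    """
    missing = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & 0x4:
            continue  # unmapped read: no MD tag expected
        optional_fields = fields[11:]
        if not any(tag.startswith("MD:Z:") for tag in optional_fields):
            missing.append(fields[0])
    return missing
```

Calling it as `reads_missing_md(open(path))` on a file mapped without `--MD` should list every mapped read. After re-mapping with `--MD`, the list should come back empty.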
Thank you very much. I have solved this problem. Merry Christmas!
Hi dewyman,
I wanted to clarify my question a little. The reason I was asking for the distance from the canonical splice junction is that I am trying to identify (and quantify) alternate 3' and 5' splice site usage and thought that the positional information for each junction would be useful since it could be compared with the reference. Thanks for your help and any thoughts/suggestions would be welcome! Hope you're having a good new year.
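To make the quantity being asked about concrete, here is a minimal sketch of the distance calculation in plain Python. It is not TALON code. The function name and the example coordinates are made up for illustration, and it assumes you already have a sorted list of reference splice-site positions for the same chromosome and strand as the novel site.

```python
from bisect import bisect_left


def nearest_reference_offset(novel_pos, reference_positions):
    """Return the signed offset (novel minus nearest reference) in bases.

    reference_positions must be sorted and come from the same chromosome
    and strand as novel_pos. Hypothetical helper, not part of TALON.
    """
    if not reference_positions:
        raise ValueError("no reference splice sites supplied")
    i = bisect_left(reference_positions, novel_pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = reference_positions[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda ref: abs(novel_pos - ref))
    return novel_pos - nearest


# Made-up coordinates for illustration only.
reference_donors = sorted([1000, 2500, 4000])
print(nearest_reference_offset(1006, reference_donors))  # 6
print(nearest_reference_offset(2497, reference_donors))  # -3
```

The sign is in genomic coordinates, so for minus-strand genes it would need to be flipped to read as upstream or downstream of the annotated site.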
Hi! Don't worry, your question makes total sense. We've been working on a utility to help address it. It's technically complete and has passed our tests, but it runs slowly, so we were hoping to do a bit more work on it to speed it up. In the meantime though, you're welcome to try it out:
usage: talon_get_sjs [-h] [--gtf GTF] [--db DB] [--ref REF_GTF] [--mode MODE]
                     [--outprefix OUTPREFIX]

Extracts the locations, novelty, and transcript assignments of exons/introns
in a TALON database or GTF file. All positions are 1-based.

optional arguments:
  -h, --help            show this help message and exit
  --gtf GTF             TALON GTF file from which to extract exons/introns
  --db DB               TALON database from which to extract exons/introns
  --ref REF_GTF         GTF reference file (ie GENCODE). Will be used to label
                        novelty.
  --mode MODE           Choices are 'intron' or 'exon' (default is 'intron').
                        Determines whether to include introns or exons in the
                        output
  --outprefix OUTPREFIX
                        Prefix for output file
As a side note, when you run this script in 'intron' mode, the start/end positions currently include the exon base that flanks the intron on each side.
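If you need the coordinates of the intron itself, the flanking bases can be trimmed off afterwards. A minimal sketch, assuming the reported start is smaller than the reported end (genomic orientation, which is an assumption about the output rather than something stated above) and that both are 1-based as in the help text. The function name is hypothetical.

```python
def intron_only_coords(start, end):
    """Convert flank-inclusive 1-based coordinates to the first and last intronic base.

    Assumes start < end and that each reported position is the exon base
    adjacent to the intron. Hypothetical helper, not part of TALON.
    """
    if end - start < 2:
        raise ValueError("no intronic bases between the flanking exon bases")
    return start + 1, end - 1


# Made-up example: upstream exon ends at 100, downstream exon starts at 201,
# so the intron itself spans bases 101 through 200.
print(intron_only_coords(100, 201))  # (101, 200)
```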
Another approach you might try for extracting splice junctions from a TALON GTF file would be to use the TranscriptClean utility described here. Outputs from this script follow the STAR splice junction output format, which is described in the STAR manual (section 4.4) here.
I hope this helps, but don't hesitate to reach out if you have more questions! Best, Dana
Hi Dana,
Thanks for your help. I am trying to run this script and it's either extremely slow or getting stuck. I subsetted my gtf by chromosome and took the smallest one (chrM in my case, with 48 total lines in the gtf) and the script is still running. Is that the expected speed or do you think there is another issue?
Here is the command I'm running:
~/talon/talon-4.4.2/python/bin/talon_get_sjs --gtf ${file} --ref ~/gencode.v31.annotation.gtf --mode intron --outprefix intron
Thanks for letting us know. We'll look into it some more.
It's taking so long with your current command because your --ref file is the entire annotation. If you want to run just chrM, consider subsetting the reference GTF as well.
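One way to subset the reference is to keep only the records whose first (seqname) column matches the chromosome of interest. A minimal sketch in plain Python; the function names and the output file name are hypothetical, and it assumes a standard tab-separated GTF with `#` comment lines at the top.

```python
def subset_gtf_lines(lines, chrom):
    """Yield comment/header lines plus records whose seqname column equals chrom."""
    for line in lines:
        # Exact match on the first column, so "chrM" does not also match "chrMT".
        if line.startswith("#") or line.split("\t", 1)[0] == chrom:
            yield line


def subset_gtf_file(in_path, out_path, chrom):
    """Stream in_path to out_path, keeping only the records on chrom."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(subset_gtf_lines(src, chrom))
```

For the case above this would be something like `subset_gtf_file("gencode.v31.annotation.gtf", "gencode.v31.chrM.gtf", "chrM")`, with the resulting file then passed to `--ref`.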
Hey, we just fixed the slow runtime. It should run MUCH faster now, and you should be able to run it on full GTFs. Let us know if it's working for you!
Hi, I noticed that there was mention of an exon-based comparison tool but cannot find that option in the talon commands. I am trying to quantify differences in 3' and 5' exon boundaries compared with known isoforms and this would be extremely helpful. Thank you!