Open camillaugolini-iit opened 4 years ago
StringTie should have recognized the tags produced by minimap2 for spliced alignments -- there should not be a need to "manually" add the XS tags (there was a release note for a previous version that we added that feature). Anyway, about the transcript assemblies that you see in the output having the undetermined strand '.' -- are they single exon transcripts ? I am afraid StringTie cannot determine the strand for single-exon transcript assemblies (unless they happen to overlap some reference transcripts given with the -G option), and it won't even look for the XS tag for non-intron read alignments. The --rf or --fr options can be used to enforce a strand for single-exon assemblies but that should only be used for stranded libraries.
Correction: I mistakenly stated above that StringTie:
won't even look for the XS tag for non-intron read alignments.
.. but that is not currently the case, StringTie should consider that tag for every single alignment now.
Maybe you can send me a small example bundle where you added the XS tag to all read alignments and yet StringTie outputs transcript assemblies with unknown strand?
Thank you, here's the head of the alignment file (every read in the sam file has the flag for reverse or sense alignment) :
Posting SAM records like this directly as part of the text is not very functional. Based on your example assembly which seems to be at location 1:1059710-1059913
, can you attach a small SAM (or better yet, a BAM file) with just the long reads from that region?
Here is the selection of reads from chromosome 1 that have the end coordinate < 1059913.
Hope it is useful.
Da: Geo Pertea notifications@github.com Inviato: lunedì 24 febbraio 2020 23:45:37 A: gpertea/stringtie Cc: Camilla Ugolini; Author Oggetto: Re: [gpertea/stringtie] stringtie does not assign the strand for sam files with setted XS tag (#262)
Posting SAM records like this directly as part of the text is not very functional. Based on your example assembly which seems to be at location 1:1059710-1059913, can you attach a small SAM (or better yet, a BAM file) with just the long reads from that region?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/gpertea/stringtie/issues/262?email_source=notifications&email_token=ANYVRMWQATC3DQBHUAGBILDREREZDA5CNFSM4KZEOPVKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMZ2NMI#issuecomment-590587569, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ANYVRMSEW3IC526WVYTB63LREREZDANCNFSM4KZEOPVA.
It seems the attachment is missing here (didn't work?), if the file is too large to attach here in github could you perhaps upload it to a file sharing service (Google Drive or even Firefox Send if it's below 2.5GB). Compressing it (if it's SAM and not BAM) might help to keep the file size small. Thanks!
Does this work?
Yes it worked, thanks (but github had a big hiccup just minutes ago)
I just ran stringtie on that selected.bam
file (with 1999 alignments which do not have XS tags, but have minimap's own ts
tags which StringTie recognizes):
../stringtie -v -L -o out.gtf selected.bam
The resulting GTF has 20 transcripts and they all have the transcription strand assigned (7 on reverse strand, 13 on the forward strand). So I am not sure what is happening on your side, why you are not getting a strand assigned there for the assembled transcripts? Perhaps you assigned the XS strand incorrectly ? (I guess you know that the ts
to XS
conversion is not straight forward -- i.e. you cannot just copy the +
or -
character, because minimap2 uses a different assignment there for alignments on the reverse genomic strand).
Sorry for the inconvenience, I'm sending the right file to your email.
I got the full data and I see that indeed there are cases (72 out of 2262 transcripts) where the strand information is "lost" as you described. All these generated assemblies with undetermined strand are single-exon -- even though the input consists of long reads aligned with at least two "exons" and thus they all have a transcription strand assigned by minimap2 (the ts
tag is there).
I looked at a few of these cases and it seems that StringTie refuses to consider some of those spliced alignments as valid in cases where the splice sites are very inconsistent (i.e. farther than a few bases apart) between multiple spliced alignments overlapping across that region. As you know long reads are quite noisy so when StringTie encounters a low-coverage bundle where the splice sites between two exons are inconsistent between the read alignments in that region it might split/trim the assembly there, invalidate the splice sites and as a result the output may show a single (terminal) exon transfrag with no strand information (because the strand was considered to have been assigned by the splice sites, but if those are found to not be valid.. the strand is invalidated as well).
It's not easy to fix that outcome when alignments are "messy" like that. The only way to correct this output is to provided better, more consistent spliced alignments (if those are possible, of course). Given the error rate, if that intron were real, the corresponding splice sites in each alignment should not be that far apart from each other (they were tens of bases apart in the examples that I reviewed, essentially invalidating each-other).
Hi, I have some samfiles which are output of minimap2 used for long nanopore reads (RNA seq). Because minimap does not have the tag XS, I added it manually to each read. However stringtie continue to give some reads with strand ' . ' even if the flag in the sam file is present both for "+" reads and "-" ones. What parameter does stringtie use to set the strand?