Closed oneillkza closed 1 year ago
OK, digging into the source code of importIsoformExpression
, it's looking for a file called t_data.ctab
. This is apparently BallGown output, which is optional with the current version of StringTie.
So I generated that output, and placed it in separate directories for each sample.
Now I have this issue:
> test_import <- importIsoformExpression(parentDir = '.')
Step 1 of 3: Identifying which algorithm was used...
The quantification algorithm used was: StringTie
Error in importIsoformExpression(parentDir = ".") :
When importing StringTie results the 'readLength' argument must be specified.
This argument must be set to the number of base pairs sequenced (e.g. if the
quantified data is 75 bp paired ends 'readLength' should be set to 75.
My data is Nanopore sequence data -- the read length is highly variable. I've passed it the median read length, and it seems to work, but what impact is this likely to have on the analysis?
I still can't import the merged gtf from stringtie2. But it looks like that may be due to some entried with '*' for the strand. (makeTxDbFromGFF
doesn't like this). I took a look at those entries, and they all look erroneous, and have no expression in the actual samples, so this may be a stringtie2 bug.
Thanks for eaching out :-)
Just for clarity StringTie (and StringTie2 - they are now merged so only one exists) can create the quantification files by setting the -eB
flag.
With regards to the read length, I guess the median is as good as it gets. An alternative would be to quantify the StringTie generated transcriptome with a tool with a better quantification engine such as Salmon or Kallisto.
With regards to the GTF file that sounds strange - which version of IsoformSwitchAnalyzeR are you using? Also I've never seen un-stranded transcripts from StringTie - is that common?
Thanks @kvittingseerup . I am running version 1.16.0, which I believe is the latest one from Bioconductor.
I'm fairly sure the issue with strandless transcripts is a bug in Stringtie. Here's an example entry:
chr1 StringTie transcript 24556443 24557027 1000 . . gene_id "MSTRG.440"; transcript_id "MSTRG.440.1";
chr1 StringTie exon 24556443 24557027 1000 . . gene_id "MSTRG.440"; transcript_id "MSTRG.440.1"; exon_number "1";
There are only a handful of these, all are for "novel" genes, and none of them register as being expressed in either of the samples I'm comparing when I re-run stringtie using this annotation. That said, I did try filtering those entries out,
and importGTF
still throws this error:
Error in getSeqlevelsReplacementMode(value, seqlevels(x)) :
the supplied 'seqlevels' must be a character vector with no NAs and no duplicates
I was able to get it to work in a very round about way by doing the following:
1) Import using rtracklayer::import.gff
.
2) Filter the resulting granges to only the exon entries.
3) Remove the 'type' column, since importRdata thinks its presence means there are still non-exon entries.
4) Copy the gene_id column into the ref_gene_id column.
5) Set fixStringTieAnnotationProblem = FALSE
when running imporRdata, or it crashes somewhere while trying to handle the ref_gene_id column.
But it might be good if importGTF would just work. Please let me know if you'd like me to send you the annotation gtf for debugging.
I agree. Could you email me the GTF so I can update importGTF
?
I would naturally keep it confidential and only use it for debugging.
Thanks @kvittingseerup . I've emailed you at your DTU address with a link to the file.
Could you try and use IsoformSwitchAnalyzeR v 1.16.03 or newer?
Closing due to no activity. If you respond I will open it again :)
I'm not sure what format old Stringtie used, but Stringtie2 outputs a gtf file containing both the transcript locations and the quantities.
When I try the method suggested in the vignette, using importIsoformExpression, it doesn't recognise the StringtTie output:
When I try to import an individual GTF using importGTF, it throws the following error:
Note: the same GTF imports fine using
rtracklayer::import.gtf()
, so I'll probably hack around this for now. But support for current tools would be good.