tacorna / taco

Multi-sample transcriptome assembly from RNA-Seq
http://tacorna.github.io

TACO is running fine for Cufflinks output, but not for StringTie gtf file #15

Closed mukarram-ak closed 6 years ago

mukarram-ak commented 6 years ago

Hi, I'm getting an error when running TACO on StringTie assemblies. Strangely, it runs fine on Cufflinks assemblies. My guess is that this has something to do with the ordering of the attributes column, or with the FPKM attribute missing from the exon lines of the StringTie output.

Could you please tell me how to solve this so that I can run TACO on StringTie assemblies?

Below is my error:

2017-09-21 11:19:42,720 pid=18286 INFO - Aggregating in parallel using 1 processes
Process Process-1:
Traceback (most recent call last):
  File "/home/abdul/.pyenv/versions/2.7.6/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/abdul/.pyenv/versions/2.7.6/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/abdul/.pyenv/versions/2.7.6/lib/python2.7/site-packages/taco/lib/aggregate.py", line 168, in aggregate_worker
    stats_fh=stats_fh)
  File "/home/abdul/.pyenv/versions/2.7.6/lib/python2.7/site-packages/taco/lib/aggregate.py", line 93, in aggregate_sample
    transcripts, total_expr = parse_gtf(fh, sample._id, gtf_expr_attr, is_ref)
  File "/home/abdul/.pyenv/versions/2.7.6/lib/python2.7/site-packages/taco/lib/aggregate.py", line 49, in parse_gtf
    f = GTF.Feature.from_str(gtf_line)
  File "/home/abdul/.pyenv/versions/2.7.6/lib/python2.7/site-packages/taco/lib/gtf.py", line 122, in from_str
    k, v = a.strip().split()
ValueError: too many values to unpack

The gtf files look like this (please note that these are derived from different bam files):

Stringtie

1   StringTie   transcript  1014    2123    1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "3.800383"; FPKM "1.073069"; TPM "1.643435";
1   StringTie   exon    1014    2123    1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "3.800383";
1   StringTie   transcript  2188    3154    1000    -   .   gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "2.732752"; FPKM "0.771615"; TPM "1.181750";
1   StringTie   exon    2188    3154    1000    -   .   gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "1"; cov "2.732752";

Cufflinks

1   Cufflinks   transcript  113 353 1000    -   .   gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "9.7137137310"; frac "1.000000"; conf_lo "3.341671"; conf_hi "16.085756"; cov "15.216267"; full_read_support "yes";
1   Cufflinks   exon    113 353 1000    -   .   gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "9.7137137310"; frac "1.000000"; conf_lo "3.341671"; conf_hi "16.085756"; cov "15.216267";
1   Cufflinks   transcript  467 1046    1000    -   .   gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "2.1270434282"; frac "1.000000"; conf_lo "1.057062"; conf_hi "3.197025"; cov "3.331955"; full_read_support "yes";
1   Cufflinks   exon    467 1046    1000    -   .   gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1"; FPKM "2.1270434282"; frac "1.000000"; conf_lo "1.057062"; conf_hi "3.197025"; cov "3.331955";

yniknafs commented 6 years ago

@mukarram-ak Interesting bug, thanks for sharing. Can you either upload here a couple of the StringTie GTFs that were erroring out, or send them to me directly at yniknafs@umich.edu?

I've run TACO successfully on StringTie output before, so we should be able to sort this out. Also, what version of StringTie are you using?

mukarram-ak commented 6 years ago

@yniknafs Thanks for your response. I sent a couple of GTFs (generated by StringTie version 1.3.3) to your email, but basically all of them look like the snippet above.

yniknafs commented 6 years ago

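Hello @mukarram-ak. I have released a new version, v0.7.3, that works. The issue was that the reference annotation you used when running StringTie had spaces in some of its GTF attributes, which is not something we'd encountered thus far. Those spaces made the whitespace split in the attribute parser (the k, v = a.strip().split() line in your traceback) return more than two values.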

You should be all set. Let me know if you need anything else.
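
For readers who hit the same traceback, here is a minimal sketch of why a space inside an attribute value triggers the ValueError, and of one way to parse such attributes more tolerantly. It is not the code from TACO v0.7.3. The function names are hypothetical, and the ref_gene_name value containing a space is invented for illustration, since the offending reference GTF was not posted in this thread.

```python
def naive_parse(attr_field):
    """Parse a GTF attribute column with one split() per attribute.

    Mirrors the pattern on the failing line of the traceback; it is not
    a copy of TACO's parser.
    """
    attrs = {}
    for a in attr_field.strip().split(';'):
        if not a.strip():
            continue
        # Raises ValueError if the value itself contains whitespace,
        # because split() then returns more than two tokens.
        k, v = a.strip().split()
        attrs[k] = v.strip('"')
    return attrs


def tolerant_parse(attr_field):
    """Split each attribute on the first run of whitespace only.

    Assumption: values never contain ';'. A quoted value that does
    would need a real tokenizer.
    """
    attrs = {}
    for a in attr_field.strip().split(';'):
        a = a.strip()
        if not a:
            continue
        parts = a.split(None, 1)  # at most [key, rest-of-attribute]
        key = parts[0]
        value = parts[1].strip().strip('"') if len(parts) == 2 else ''
        attrs[key] = value
    return attrs


def keys_with_spaces(attr_field):
    """Return the attribute keys whose values contain whitespace."""
    return [k for k, v in tolerant_parse(attr_field).items()
            if any(ch.isspace() for ch in v)]


# Attribute column in the style of the StringTie lines quoted above.
ok = 'gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "3.800383";'
# Hypothetical extra attribute whose value contains a space.
bad = ok + ' ref_gene_name "some gene";'

if __name__ == "__main__":
    print(naive_parse(ok))
    try:
        naive_parse(bad)
    except ValueError as err:
        print("naive parser failed:", err)
    print(tolerant_parse(bad))
    print("attributes with spaces:", keys_with_spaces(bad))
```

Running keys_with_spaces over the ninth column of each line in a GTF is a quick way to check whether a file would trip a strict parser before upgrading.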