kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annotate and Visualize Isoform Switches with Functional Consequences (from RNA-seq data)

readLength option from importIsoformExpression for stringtie on ONT long read sequencing #195

Open eltonjrv opened 1 year ago

eltonjrv commented 1 year ago

Dear Kristoffer,

Thanks for developing and maintaining this tool. I just want to ask a quick question about what to set as "readLength" in the importIsoformExpression function for StringTie outputs from an ONT dRNA-seq experiment. As you can imagine, long-read sequencing doesn't have a fixed read length the way Illumina sequencing does. The "readLength" option is mandatory for the stringtie input type only, so I'm not able to move forward.

Looking forward to your answer.

Thanks very much,
Elton

PS: Of course, I used the "-L" option (for long reads) in my StringTie runs.

kvittingseerup commented 1 year ago

Could you post the first 6-10 lines of a result file?

eltonjrv commented 1 year ago

Do you mean a stringtie "t_data.ctab" output? If so, here it goes:

    t_id  chr              strand  start  end   t_name     num_exons  length  gene_id  gene_name  cov          FPKM
    1     211000022278090  .       488    1045  MSTRG.1.1  1          558     MSTRG.1  .          210.353043   257.475128
    2     211000022278140  .       664    1106  MSTRG.2.1  1          443     MSTRG.2  .          856.555298   1048.435913
    3     211000022278143  .       505    750   MSTRG.3.1  1          246     MSTRG.3  .          248.105698   303.684906
    4     211000022278157  .       68     662   MSTRG.4.1  1          595     MSTRG.4  .          20.611765    25.229094
    5     211000022278158  .       593    1034  MSTRG.5.1  1          442     MSTRG.5  .          0.000000     0.000000
    6     211000022278167  .       1      463   MSTRG.6.1  1          463     MSTRG.6  .          1022.166321  1251.146118
    7     211000022278182  .       1      685   MSTRG.7.1  1          685     MSTRG.7  .          3206.557617  3924.872314
    8     211000022278199  .       2      577   MSTRG.8.1  1          576     MSTRG.8  .          3501.916748  4286.395996
    9     211000022278200  .       1      539   MSTRG.9.1  1          539     MSTRG.9  .          3642.079834  4457.957520

eltonjrv commented 1 year ago

Dear Kristoffer,

This is to let you know that I was able to move forward by setting an arbitrary readLength of 2550 (the average length of my reads). The curious issue is that I can't get any significant results: all "isoform_switch_q_value" and "gene_switch_q_value" values are equal to 1. I also tried reducing readLength to 1000 and got the same result.

Any idea what could be causing this?

Thanks,

PS: My regular gene-level DESeq2 analysis does yield significant DEGs.
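Some background on why readLength is needed at all: StringTie's t_data.ctab reports per-transcript coverage (the cov column) rather than read counts, so counts have to be back-calculated. The Python sketch below assumes the same approximation used by StringTie's prepDE.py helper script, count ≈ cov × length / read length; IsoformSwitchAnalyzeR's own conversion may differ in detail (for example in rounding). It uses three rows from the t_data.ctab excerpt posted earlier in this thread, and the helper name coverage_to_counts is hypothetical.

```python
import csv
import io

# Three rows copied from the t_data.ctab excerpt in this thread (tab-separated).
CTAB = (
    "t_id\tchr\tstrand\tstart\tend\tt_name\tnum_exons\tlength\tgene_id\tgene_name\tcov\tFPKM\n"
    "1\t211000022278090\t.\t488\t1045\tMSTRG.1.1\t1\t558\tMSTRG.1\t.\t210.353043\t257.475128\n"
    "4\t211000022278157\t.\t68\t662\tMSTRG.4.1\t1\t595\tMSTRG.4\t.\t20.611765\t25.229094\n"
    "5\t211000022278158\t.\t593\t1034\tMSTRG.5.1\t1\t442\tMSTRG.5\t.\t0.000000\t0.000000\n"
)


def coverage_to_counts(ctab_text, read_length):
    """Approximate read counts from StringTie coverage.

    ASSUMPTION: prepDE.py-style approximation count ~ cov * length / read_length.
    No rounding is applied here.
    """
    reader = csv.DictReader(io.StringIO(ctab_text), delimiter="\t")
    return {
        row["t_name"]: float(row["cov"]) * int(row["length"]) / read_length
        for row in reader
    }


# Compare the two readLength values tried in this thread.
for read_length in (1000, 2550):
    counts = coverage_to_counts(CTAB, read_length)
    print(read_length, {name: round(value, 1) for name, value in counts.items()})
```

Because the read length enters as a single constant, switching between 1000 and 2550 multiplies every transcript's count by the same factor (2.55) and leaves the relative isoform proportions unchanged. With long reads of variable length no single value is exactly right, which is one reason this coverage-to-count conversion is awkward for ONT data.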

kvittingseerup commented 1 year ago

Sorry for the late answer. I must admit I don't have a good solution for StringTie-based results.

Tbh, I've started recommending quantifying the stringtie-defined transcripts with another tool that actually outputs count data...

I'm not sure why you would not get any DE results. Could you try doing a histogram of the dIF values?
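The dIF histogram suggested above can also be sketched outside R. This minimal Python example assumes IsoformSwitchAnalyzeR's definitions: an isoform fraction (IF) is the isoform's expression divided by the total expression of its gene, and dIF is IF in condition 2 minus IF in condition 1. The isoform names, expression values and helper functions are hypothetical; in practice you would histogram the dIF values the package has already computed.

```python
from collections import defaultdict


def isoform_fractions(expr, gene_of):
    """IF = isoform expression / total expression of its gene (NaN if the gene total is 0)."""
    gene_totals = defaultdict(float)
    for isoform, value in expr.items():
        gene_totals[gene_of[isoform]] += value
    fractions = {}
    for isoform, value in expr.items():
        total = gene_totals[gene_of[isoform]]
        fractions[isoform] = value / total if total > 0 else float("nan")
    return fractions


def dif_values(expr1, expr2, gene_of):
    """dIF = IF in condition 2 minus IF in condition 1 (NaN propagates)."""
    if1 = isoform_fractions(expr1, gene_of)
    if2 = isoform_fractions(expr2, gene_of)
    return {isoform: if2[isoform] - if1[isoform] for isoform in expr1}


def text_histogram(values, n_bins=8):
    """Count finite values in equal-width bins over [-1, 1] and print one bar per bin."""
    width = 2.0 / n_bins
    counts = [0] * n_bins
    for v in values:
        if v != v:  # NaN is the only value not equal to itself; skip it
            continue
        index = min(int((v + 1.0) / width), n_bins - 1)  # v == 1 goes in the last bin
        counts[index] += 1
    for i, c in enumerate(counts):
        low = -1.0 + i * width
        print(f"[{low:+.2f}, {low + width:+.2f}) {'#' * c}")
    return counts


# Hypothetical toy data: two genes with two isoforms each, plus one gene
# that is not expressed at all in condition 1.
gene_of = {"G1.a": "G1", "G1.b": "G1", "G2.a": "G2", "G2.b": "G2", "G3.a": "G3", "G3.b": "G3"}
cond1 = {"G1.a": 90, "G1.b": 10, "G2.a": 50, "G2.b": 50, "G3.a": 0, "G3.b": 0}
cond2 = {"G1.a": 35, "G1.b": 65, "G2.a": 50, "G2.b": 50, "G3.a": 5, "G3.b": 5}

difs = dif_values(cond1, cond2, gene_of)
counts = text_histogram(difs.values())  # counts -> [0, 1, 0, 0, 2, 0, 1, 0]
```

If nearly all real dIF values pile up around 0, there may simply be little isoform-usage change for the test to detect; a broad spread combined with q-values of 1 might instead suggest looking more closely at the imported counts.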

eltonjrv commented 1 year ago

Hi Kristoffer,

Yes, I ended up giving up on StringTie and am now relying on NanoCount abundance.tsv files. I had to slightly edit the headers to make them identical to kallisto's, and now I'm stuck on this error, which apparently comes from the importIsoformExpression function (via tximport):

    Step 1 of 3: Identifying which algorithm was used...
    The quantification algorithm used was: Kallisto
    Step 2 of 3: Reading data...
    Note: importing abundance.h5 is typically faster than abundance.tsv
    reading in files with read_tsv
    1 2
    Error in tximport::tximport(files = localFiles, type = tolower(dataAnalyed$orign),  :
      all(txId == raw[[txIdCol]]) is not TRUE
    Calls: importIsoformExpression -> <Anonymous> -> stopifnot
    In addition: Warning messages:
    1: One or more parsing issues, call problems() on your data frame for details, e.g.:
      dat <- vroom(...)
      problems(dat)
    2: One or more parsing issues, call problems() on your data frame for details, e.g.:
      dat <- vroom(...)
      problems(dat)
    Execution halted

I'd appreciate it if you could shed some light on this.

Thanks
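The header edit mentioned above can be scripted so that every NanoCount file gets exactly the same treatment. This Python sketch assumes NanoCount names its columns transcript_name, est_count and tpm, and maps them onto the target_id, est_counts and tpm columns of kallisto's abundance.tsv; check both sets of names against your own files. It only renames header fields: kallisto's file also has length and eff_length columns, which this sketch does not create. The helper names are hypothetical.

```python
# ASSUMPTION: NanoCount column names (left) and the kallisto abundance.tsv
# names they should become (right). Verify both against your own files.
NANOCOUNT_TO_KALLISTO = {
    "transcript_name": "target_id",
    "est_count": "est_counts",
    "tpm": "tpm",
}


def rename_header_text(tsv_text, mapping=NANOCOUNT_TO_KALLISTO):
    """Rename header fields found in `mapping`; data rows are copied unchanged."""
    lines = tsv_text.splitlines(keepends=True)
    if not lines:
        return tsv_text
    header = lines[0].rstrip("\r\n").split("\t")
    new_header = "\t".join(mapping.get(col, col) for col in header) + "\n"
    return new_header + "".join(lines[1:])


def rename_header_file(in_path, out_path, mapping=NANOCOUNT_TO_KALLISTO):
    """File-to-file wrapper around rename_header_text."""
    with open(in_path) as src:
        text = src.read()
    with open(out_path, "w") as dst:
        dst.write(rename_header_text(text, mapping))
```

Columns not listed in the mapping keep their original names, so an unexpected header shows up unchanged rather than silently disappearing.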

eltonjrv commented 11 months ago

Dear Kristoffer,

Just to let you know that I was able to solve that silly error: it was caused by the unsorted quantification tables generated by NanoCount. Once I put the transcripts in the same order in every abundance.tsv file, my whole IsoformSwitchAnalyzeR script ran smoothly and identified 122 significant dIFs.

You may now close this issue.

Thanks
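The fix described in the last comment can be automated. tximport takes the transcript IDs from the first file and then asserts, for every other file, that its ID column matches exactly; that is the failing all(txId == raw[[txIdCol]]) check in the error log. The Python sketch below (hypothetical helper names, toy two-sample data) first reports which samples disagree with the first one, then sorts each table's data rows by transcript ID while keeping the header on top. It assumes the transcript ID is the first column, as target_id is in kallisto-style files.

```python
def transcript_ids(tsv_text):
    """IDs from the first column, skipping the header row (assumed to be target_id)."""
    return [line.split("\t", 1)[0] for line in tsv_text.splitlines()[1:] if line]


def order_problems(tables):
    """Compare each sample's ID column with the first sample's, as tximport does."""
    samples = list(tables)
    reference = transcript_ids(tables[samples[0]])
    problems = {}
    for name in samples[1:]:
        ids = transcript_ids(tables[name])
        if ids != reference:
            same_set = sorted(ids) == sorted(reference)
            problems[name] = "same IDs, different order" if same_set else "different IDs"
    return problems


def sort_by_id(tsv_text):
    """Return the table with its data rows sorted by transcript ID; header stays first."""
    lines = tsv_text.splitlines()
    header, rows = lines[0], [line for line in lines[1:] if line]
    rows.sort(key=lambda line: line.split("\t", 1)[0])
    return "\n".join([header] + rows) + "\n"


# Hypothetical two-sample example: same transcripts, different row order.
tables = {
    "sample1": "target_id\test_counts\ttpm\ntx1\t3\t30\ntx2\t7\t70\n",
    "sample2": "target_id\test_counts\ttpm\ntx2\t5\t50\ntx1\t5\t50\n",
}
print(order_problems(tables))  # {'sample2': 'same IDs, different order'}
fixed = {name: sort_by_id(text) for name, text in tables.items()}
print(order_problems(fixed))   # {}
```

Any consistent order works, since tximport only requires it to be identical across files. Sorting only lines the files up when every file contains the same set of transcripts; a sample reported as having "different IDs" would first need its missing transcripts added (for example with zero counts).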