readLength option from importIsoformExpression for stringtie on ONT long read sequencing

eltonjrv commented 1 year ago

Dear Kristoffer,

Thanks for developing and maintaining this tool. I have a quick question about what to set as "readLength" in the importIsoformExpression function for StringTie outputs from an ONT dRNA-Seq strategy. As you can imagine, long-read sequencing doesn't have a fixed read length the way Illumina sequencing does. The "readLength" option is mandatory for the StringTie input type only, so I'm not able to move forward.

Looking forward to your answer.

Thanks very much,
Elton

PS: Of course, I used the "-L" option (for long reads) in my StringTie runs.

kvittingseerup commented 1 year ago

Could you post the first 6-10 lines of a result file?

eltonjrv commented 1 year ago

Do you mean a stringtie "t_data.ctab" output? If so, here it goes:

t_id  chr              strand  start  end   t_name     num_exons  length  gene_id  gene_name  cov          FPKM
1     211000022278090  .       488    1045  MSTRG.1.1  1          558     MSTRG.1  .          210.353043   257.475128
2     211000022278140  .       664    1106  MSTRG.2.1  1          443     MSTRG.2  .          856.555298   1048.435913
3     211000022278143  .       505    750   MSTRG.3.1  1          246     MSTRG.3  .          248.105698   303.684906
4     211000022278157  .       68     662   MSTRG.4.1  1          595     MSTRG.4  .          20.611765    25.229094
5     211000022278158  .       593    1034  MSTRG.5.1  1          442     MSTRG.5  .          0.000000     0.000000
6     211000022278167  .       1      463   MSTRG.6.1  1          463     MSTRG.6  .          1022.166321  1251.146118
7     211000022278182  .       1      685   MSTRG.7.1  1          685     MSTRG.7  .          3206.557617  3924.872314
8     211000022278199  .       2      577   MSTRG.8.1  1          576     MSTRG.8  .          3501.916748  4286.395996
9     211000022278200  .       1      539   MSTRG.9.1  1          539     MSTRG.9  .          3642.079834  4457.957520

eltonjrv commented 1 year ago

Dear Kristoffer,

This is to let you know that I was able to move forward by setting readLength to an arbitrary value of 2550 (the average of my read lengths). The curious issue is that I can't get any significant results: every "isoform_switch_q_value" and "gene_switch_q_value" equals 1. I also tried reducing readLength to 1000 and got the same result.

Any clue as to the reason for that result?

Thanks

PS: My regular DESeq2 run at the gene level does produce significant DEGs.

kvittingseerup commented 1 year ago

Sorry for the late answer. I must admit I don't have a good solution for StringTie-based results.

Tbh, I've started recommending quantifying the stringtie-defined transcripts with another tool that actually outputs count data...

I'm not sure why you would not get any DE results. Could you try doing a histogram of the dIF values?
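A minimal sketch of the histogram suggested above, assuming the analysis object is a switchAnalyzeRlist whose `isoformFeatures` table carries a `dIF` column. The `switchList` object below is a hypothetical stand-in with made-up values so the snippet runs without IsoformSwitchAnalyzeR loaded; substitute your own object.

```r
# Hypothetical stand-in for a switchAnalyzeRlist; the values are made up for illustration.
switchList <- list(
  isoformFeatures = data.frame(
    isoform_id = paste0("MSTRG.", 1:6, ".1"),
    dIF        = c(-0.42, -0.05, 0.00, 0.03, 0.18, NA)
  )
)

# Drop missing values before binning.
dIF <- switchList$isoformFeatures$dIF
dIF <- dIF[!is.na(dIF)]

# dIF is a difference between two isoform fractions, so it lies in [-1, 1].
dIFhist <- hist(dIF, breaks = seq(-1, 1, by = 0.1), plot = FALSE)

# Counts per bin; a distribution squeezed around 0 would mean very small effect sizes.
print(data.frame(binMid = dIFhist$mids, count = dIFhist$counts))
```

Calling `plot(dIFhist)` afterwards draws the histogram itself.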

eltonjrv commented 1 year ago

Hi Kristoffer,

Yes, I ended up giving up on StringTie and am now relying on NanoCount abundance.tsv files. I had to slightly edit the headers to make them identical to kallisto's, and now I'm stuck on this error, which apparently comes from the "importRdata" function.

Step 1 of 3: Identifying which algorithm was used...
    The quantification algorithm used was: Kallisto
Step 2 of 3: Reading data...
Note: importing abundance.h5 is typically faster than abundance.tsv
reading in files with read_tsv
1 2
Error in tximport::tximport(files = localFiles, type = tolower(dataAnalyed$orign), :
  all(txId == raw[[txIdCol]]) is not TRUE
Calls: importIsoformExpression -> -> stopifnot
In addition: Warning messages:
1: One or more parsing issues, call problems() on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
2: One or more parsing issues, call problems() on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
Execution halted

I'd appreciate it if you could shed some light on this. Thanks

eltonjrv commented 1 year ago

Dear Kristoffer,

Just to let you know that I was able to solve that silly error, which was due to unsorted quant tables generated by NanoCount. Once I put all transcripts in the same order in each abundance.tsv file, my whole IsoformSwitchAnalyzeR code ran through smoothly, identifying 122 significant dIFs.

You may now please close this issue.

Thanks
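For anyone hitting the same `all(txId == raw[[txIdCol]]) is not TRUE` error, here is a minimal sketch of the sorting step described above. It is not Elton's actual script: the function name `sortQuantTable`, the `idCol` default, and the demo file names are all hypothetical, and it assumes every sample file lists the same set of transcripts.

```r
# Sort one quantification table by its transcript ID column and write it back out.
# 'idCol' is an assumption: use whichever column holds the transcript IDs in your files.
sortQuantTable <- function(inFile, outFile, idCol = "target_id") {
  # Read everything as text so the numeric values are written back unchanged.
  quant <- read.delim(inFile, colClasses = "character", check.names = FALSE)
  # method = "radix" sorts in C-locale order, independent of the machine's locale.
  quant <- quant[order(quant[[idCol]], method = "radix"), , drop = FALSE]
  write.table(quant, outFile, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(quant[[idCol]])
}

# Demo: two tiny made-up tables listing the same transcripts in different orders.
tmpDir <- tempfile("quant_")
dir.create(tmpDir)
header <- paste("target_id", "est_counts", sep = "\t")
fileA <- file.path(tmpDir, "sampleA.tsv")
fileB <- file.path(tmpDir, "sampleB.tsv")
writeLines(c(header, "tx2\t5", "tx1\t7", "tx3\t0"), fileA)
writeLines(c(header, "tx3\t1", "tx1\t2", "tx2\t9"), fileB)

idsA <- sortQuantTable(fileA, file.path(tmpDir, "sampleA_ordered.tsv"))
idsB <- sortQuantTable(fileB, file.path(tmpDir, "sampleB_ordered.tsv"))

# After sorting, the ID columns must match row for row across samples.
stopifnot(identical(idsA, idsB))
```

If the final `stopifnot()` fails on real data, the files do not contain the same transcripts, and sorting alone will not make them line up.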

dfv commented 2 months ago

Dear Eltonjrv,

Could you explain how you got the NanoCount output to import into IsoformSwitchAnalyzeR? I am getting the error below:

Thanks a lot.

Regards Ankit

"Error in importIsoformExpression(parentDir = system.file("PAPa_KD_counts.tsv", : The 'parentDir' argument does not lead anywhere (acutally you just suppled "" to the argument). Did you try to use the system.file("your/quant/dir/", package="IsoformSwitchAnalyzeR") to import your own data? The system.file() should only be used to access the example data stored in the IsoformSwitchAnalyzeR package. To access your own data simply provide the string to the directory with the data as: "path/to/quantification/".

eltonjrv commented 2 months ago

Hi,

In my case, if I recall correctly, it was just a matter of sorting the transcript IDs so that they are in the same order in all individual NanoCount outputs. I hope it helps.

Best,
Elton


-- Elton J. R. Vasconcelos DVM, PhD

dfv commented 2 months ago

Hi,

Thanks a lot for replying. I tried sorting the transcript IDs, but it still shows the same error. Below is the header of the input file:

transcript_name   raw                     est_count           tpm                 transcript_length
SMEST000013001.1  3.0489882739503065e-05  17.000000000002807  30.489882739503066  3424
SMEST000025001.1  1.7935225140884156e-06  1.0000000000001652  1.7935225140884157  1867

Command used in R:

salmonQuant <- importIsoformExpression(
  parentDir = system.file("PAPa_KD_counts_nw.tsv", package="IsoformSwitchAnalyzeR")
)

Thanks.

eltonjrv commented 2 months ago

I don't think you need "system.file", as the error message states.
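A small sketch of why `system.file()` produces the 'parentDir' error quoted above: it only looks inside an installed package and returns an empty string when the file is not there. The `stats` package is used here only because it ships with every R installation, and the directory in the second half is a placeholder, not a real location.

```r
# system.file() searches inside an installed package and returns "" when nothing matches.
notInPackage <- system.file("PAPa_KD_counts_nw.tsv", package = "stats")
print(notInPackage)  # empty string: the file is not part of the package

# For your own data, build an ordinary path instead. The directory below is a placeholder.
quantDir  <- file.path("path", "to", "quantification")
quantFile <- file.path(quantDir, "PAPa_KD_counts_nw.tsv")

# Confirm the path resolves before handing it to an import function.
print(file.exists(quantFile))
```

Once `file.exists()` returns `TRUE` for the real path, that plain string (or a vector of such strings) is what should be passed on, as the error message itself suggests.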

Below is how I imported my nanocount individual files, which have an ordered.tsv suffix:

###
nanoFiles = NULL
for(i in 1:length(dir(pattern="ordered.tsv"))){
  nanoFiles = c(nanoFiles, dir(pattern="*ordered.tsv")[i])
}
nanoQuant <- importIsoformExpression(sampleVector=nanoFiles, addIsofomIdAsColumn=TRUE, calculateCountsFromAbundance=TRUE)
###

I hope it helps.

dfv commented 2 months ago

Thanks a lot for replying. I tried your suggestion as shown below:

nanoFiles = NULL
for(i in 1:length(dir(pattern="PAPa_KD_counts_nw.tsv"))){
  nanoFiles = c(nanoFiles, dir(pattern="PAPa_KD_counts_nw.tsv")[i])
}
nanoQuant <- importIsoformExpression(sampleVector=nanoFiles, addIsofomIdAsColumn=TRUE, calculateCountsFromAbundance=TRUE)

Step 1 of 3: Identifying which algorithm was used...
Error in importIsoformExpression(sampleVector = nanoFiles, addIsofomIdAsColumn = TRUE, :
  Some of the files pointed to are not quantification files from Kallisto/Salmon/RSEM/StringTie. They did no contain the column names typically generated by Kallisto/Salmon/RSEM/StringTie. Are you sure it is the right files?

How did you change the column headers of the NanoCount output?

Thanks.

eltonjrv commented 2 months ago

Yep, my columns' names are:

target_id length eff_length est_counts tpm
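A hedged sketch of one way to get from the NanoCount header posted earlier in this thread (transcript_name, raw, est_count, tpm, transcript_length) to the five column names listed above. The function `toKallistoStyle` and the mapping are hypothetical, not Elton's actual edit. In particular, the thread does not say how `eff_length` was filled, so reusing the transcript length is purely an assumption.

```r
# Column names reported above for the files that imported successfully.
expectedCols <- c("target_id", "length", "eff_length", "est_counts", "tpm")

# Hypothetical mapping from a NanoCount-style table to those column names.
toKallistoStyle <- function(nano) {
  data.frame(
    target_id  = nano$transcript_name,
    length     = nano$transcript_length,
    eff_length = nano$transcript_length,  # ASSUMPTION: no effective length available, reuse length
    est_counts = nano$est_count,
    tpm        = nano$tpm,
    stringsAsFactors = FALSE
  )
}

# Two rows copied from the NanoCount table posted earlier in the thread.
nano <- data.frame(
  transcript_name   = c("SMEST000013001.1", "SMEST000025001.1"),
  raw               = c(3.0489882739503065e-05, 1.7935225140884156e-06),
  est_count         = c(17.000000000002807, 1.0000000000001652),
  tpm               = c(30.489882739503066, 1.7935225140884157),
  transcript_length = c(3424, 1867),
  stringsAsFactors  = FALSE
)

kallistoStyle <- toKallistoStyle(nano)

# Report any expected column that is still missing before attempting the import.
missingCols <- setdiff(expectedCols, names(kallistoStyle))
print(missingCols)
```

Running the same `setdiff()` check on the header of a file that fails to import shows quickly whether a column is absent or misspelled.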


dfv commented 2 months ago

Thanks a lot for replying.

I tried your suggestion and used the column names above, but it still shows the same error:

target_id         eff_length  est_counts          tpm
SMEST017801001.1  473         11820.012959086229  21199.459358934484
SMEST067683001.1  334         10690.60914265117   19173.848186661162

Step 1 of 3: Identifying which algorithm was used...
Error in importIsoformExpression(sampleVector = nanoFiles, addIsofomIdAsColumn = TRUE, :
  Some of the files pointed to are not quantification files from Kallisto/Salmon/RSEM/StringTie. They did no contain the column names typically generated by Kallisto/Salmon/RSEM/StringTie. Are you sure it is the right files?

Any idea what could be causing it?

Thanks.

kvittingseerup / IsoformSwitchAnalyzeR

readLength option from importIsoformExpression for stringtie on ONT long read sequencing #195