sagnikbanerjee15 / Finder

A fully automated gene annotator from RNA-Seq expression data
MIT License

Finder on PacBio data #79

Open WietseHR opened 11 months ago

WietseHR commented 11 months ago

Hello, I am currently trying to run Finder on three whole-genome samples:

  1. Sequenced with Illumina HiSeq X Ten
  2. Sequenced with Illumina NovaSeq 6000
  3. Sequenced with PacBio SMRT

Samples 1 and 2 are running fine so far, but sample 3 produces the following error from STAR:

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@SRR12124361.1
GCGTCGGATAAGCCTGTCATAAGTCATAAATTACACAATACACATCAGCCATTTTGGAAGACCCGATGATTGGTTTGTTTGACCATACCATCTTCATCGCGGAAGATCTCCATCATCGCATGTCCCAACCAAAATTCCGATCCTCCGGCAACCTCGTGTAGCCCCCTCTTGGAATAAAACCTAGTTACAGGAGAAGCGGCCGGCATGGTCCATTTCCGATCAAAGCTCACCGCTCTCACATGGACGGGAATATCGCAGTGTTCCGGTTTGCCTGTATATAGCTTCTGTTATGTAGCGGTAACTGTGAGGGAAATGTCGCATGACGATATAACGAAAGCTTACCTTGCCTTACGCGAAGGGGTAGTGTGCGAGACTGTGAAGGTAGGCTGACGTGGACTACGCCAAGTAGCCATCGATAGCGACAGCCCATGTATATAGGTATAAACTAAGCCATATTACTATATCCAATCTCGCGTTGAACATCTTGGTGAGCGAAATGAGTCTTCCGCCGTACATAATGGGATGTCAGCGAGAGTCATCTGTGCGAGAGCACAGGGTAAAATCTCCAAGCCAAATAGGAATACATTTTGTTACAGGGATCAGACGTCGTCCTTCACTTCGGGGGGACAAAACCAGTCCTGTGAGGCAAA
SOLUTION: fix your fastq file

Jul 12 09:43:06 ...... FATAL ERROR, exiting
Segmentation fault (core dumped)

When I look up this read ID in the FASTQ file, the quality string and the sequence have the same length (1979). I think the problem is related to the long reads produced by PacBio sequencing: the sequence shown in the error is only a small part of the original read. Is there a workaround that would let Finder work with long-read data? Thanks in advance!
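To check whether a sample exceeds what the short-read STAR build accepts, a small script can report the longest read, how many reads are longer than a given limit, and whether any record really has mismatched sequence and quality lengths. This is a minimal Python sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines). The 650-base limit is our understanding of the default STAR build's compile-time maximum read length (STARlong is built with a much larger one), and the function names are hypothetical, not part of Finder or STAR.

```python
import gzip
from typing import Iterator, Tuple

# Assumed compile-time read-length limit of the standard (short-read) STAR build.
# STARlong is compiled with a much larger limit.
STAR_SHORT_READ_LIMIT = 650


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) for each 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\r\n")
            # Read ID is the header without '@', up to the first whitespace.
            yield header[1:].split()[0], seq, qual


def summarize_fastq(path: str, limit: int = STAR_SHORT_READ_LIMIT) -> dict:
    """Report the longest read, how many reads exceed `limit`, and the IDs of
    records whose quality string length differs from the sequence length."""
    max_len = 0
    too_long = 0
    mismatched = []
    for read_id, seq, qual in read_fastq(path):
        max_len = max(max_len, len(seq))
        if len(seq) > limit:
            too_long += 1
        if len(seq) != len(qual):
            mismatched.append(read_id)
    return {"max_len": max_len, "too_long": too_long, "mismatched": mismatched}
```

For example, if `summarize_fastq("sample3.fastq")` (hypothetical filename) reports a large `too_long` count and an empty `mismatched` list, read length rather than a malformed file is the more likely cause of the STAR error.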

sagnikbanerjee15 commented 11 months ago

Hello @WietseHR,

Thank you for your patience, and I appreciate your interest in Finder. The current version of Finder cannot handle long reads: we use STAR to perform the alignment, and STAR is designed to work only with short reads. We are developing a new version of Finder that will be able to work with long reads.

Thank you, Sagnik

RacheliHadjez commented 9 months ago

Hi, I also wanted to use Finder for PacBio data. Is that still the case, that it won't work for long reads?

Thank you, Rachel

Maxim-Karpov commented 1 month ago

@RacheliHadjez @WietseHR

It is possible to modify the code to run STARlong, a variant of STAR built for aligning long reads. However, according to https://academic.oup.com/bioinformatics/article/34/5/748/4562330, its performance in this use case is not the best among the available open-source aligners. Nonetheless, it should work.
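As a rough illustration of that tweak, the sketch below picks `STARlong` instead of `STAR` when a sample's longest read exceeds the short-read limit, and relaxes a few STAR filters whose defaults assume short, accurate reads (for example, `--outFilterMismatchNmax` defaults to 10 mismatches per read and `--seedPerReadNmax` to 1000 seeds). It is a hypothetical Python helper, not Finder's actual code; it assumes a `STARlong` binary is on `PATH`, that the genome index was built with STAR, and that the short-read limit is 650 bases. The relaxed parameter values are example starting points, not tuned settings.

```python
import shlex
from typing import List

# Assumed read-length limit of the standard (short-read) STAR build.
STAR_SHORT_READ_LIMIT = 650


def choose_aligner(max_read_len: int, limit: int = STAR_SHORT_READ_LIMIT) -> str:
    """Pick STARlong when any read is longer than the short-read limit."""
    return "STARlong" if max_read_len > limit else "STAR"


def build_star_command(genome_dir: str, fastq: str, out_prefix: str,
                       max_read_len: int, threads: int = 8) -> List[str]:
    """Return an argv list for STAR or STARlong (hypothetical helper)."""
    cmd = [
        choose_aligner(max_read_len),
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]
    if fastq.endswith(".gz"):
        # Decompress gzipped input on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    if cmd[0] == "STARlong":
        # Relax filters tuned for short reads; example values only, not tuned.
        cmd += [
            "--outFilterScoreMinOverLread", "0",
            "--outFilterMismatchNmax", "1000",
            "--seedPerReadNmax", "100000",
        ]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; 1979 is the read length reported in this issue.
    print(shlex.join(build_star_command("genome_index", "sample3.fastq.gz",
                                        "sample3_", 1979)))
```

The command is returned as an argument list so it can be passed directly to `subprocess.run` without shell-quoting issues; `shlex.join` is only used to print it.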