google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

QV for each CCS read #61

Closed: Wenfei-Xian closed this issue 1 year ago

Wenfei-Xian commented 1 year ago

Hi,

Many thanks for the amazing tool! Is it possible to output the QV for each DeepConsensus CCS read? A pbccs header looks like this: @m64079_220112_082130/103153878/ccs np=19 rq=0.999962, so it carries the np and rq values for each read. The DeepConsensus header, however, is just @m64079_220112_082130/103153878/ccs: it contains nothing except the read name.

I want to obtain the Q20, Q30, and Q40 reads separately. Can you point me to a way of doing this without running DeepConsensus three times?

Best,
Wenfei

kishwarshafin commented 1 year ago

Hello @Wenfei-Xian ,

Really sorry for such a late reply. We had a long chat about what exactly to do to support this. In the end, we decided to provide a command-line option to filter reads at any given Q-value threshold.

With the latest DeepConsensus release of 1.2.0 you can run:

deepconsensus filter_reads --help

Filter reads based on the base qualities.
Flags:
    --input_seq            Path to input fastq or bam file.
    --output_fastq         Path to output fastq file.
    --quality_threshold    A quality threshold value applied per read.  Reads with read quality below this will be filtered out.

The help output lists the option for filtering your FASTQ at any given Q-value. Please give it a try.
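For anyone who wants to see what a per-read QV cutoff means in practice, here is a minimal Python sketch. It is not the DeepConsensus implementation: it assumes a read's QV is the Phred-scaled mean of the per-base error probabilities, and that the FASTQ uses the Phred+33 encoding with exactly four lines per record. The function names (`accuracy_to_qv`, `read_qv`, `parse_fastq`, `bin_by_qv`) are made up for illustration. `accuracy_to_qv` applies the standard Phred formula to a pbccs-style `rq` value such as the one in the header above.

```python
import math
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple


def accuracy_to_qv(rq: float) -> float:
    """Convert a predicted read accuracy (e.g. a pbccs rq value) to a Phred QV."""
    return -10 * math.log10(1 - rq)


def read_qv(quality_string: str, offset: int = 33) -> float:
    """Read-level QV, ASSUMED here to be the Phred-scaled mean base error."""
    if not quality_string:
        raise ValueError("empty quality string")
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    mean_error = sum(error_probs) / len(error_probs)
    return -10 * math.log10(mean_error)


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, qualities) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").strip()
        yield header[1:].split()[0], seq, qual


def bin_by_qv(
    records: Iterable[Tuple[str, str, str]],
    thresholds: Sequence[int] = (20, 30, 40),
) -> Dict[int, List[str]]:
    """Map each threshold to the names of reads whose QV is at or above it."""
    bins: Dict[int, List[str]] = {t: [] for t in thresholds}
    for name, _seq, qual in records:
        qv = read_qv(qual)
        for t in thresholds:
            if qv >= t:
                bins[t].append(name)
    return bins
```

A single pass over the file with `bin_by_qv(parse_fastq(open("reads.fastq")))` (the file name is a placeholder) gives the Q20, Q30, and Q40 read names together. The bins are cumulative, so a Q40 read also appears in the Q20 and Q30 lists.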

Wenfei-Xian commented 1 year ago

Many thanks !!!!!