nanoporetech / pinfish

Tools to annotate genomes using long read transcriptomics data

Racon failed during polish_clusters run #10

Closed: michieitel closed this issue 5 years ago

michieitel commented 5 years ago

Hi!

I get an error when running polish_clusters (spliced_bam2gff and cluster_gff worked fine):

polish_clusters: 10:41:12 Polishing cluster 8066429e-3347-4dbd-b47b-594085368984 of size 29
polish_clusters: 10:41:12 Failed running command: racon -t 16 -q -1 /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_8066429e-3347-4dbd-b47b-594085368984_702280846/reads.fq /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_8066429e-3347-4dbd-b47b-594085368984_702280846/alignments.sam /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_8066429e-3347-4dbd-b47b-594085368984_702280846/reference.fq > /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_8066429e-3347-4dbd-b47b-594085368984_702280846/consensus.fq - exit status 134
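The "exit status 134" in this log already hints at where the failure happens. By the usual shell convention, a status above 128 means the child process was terminated by signal number (status minus 128), so 134 points to signal 6, which is SIGABRT on Linux. That is what a C++ program such as racon ends with when an exception is not caught (the runtime calls std::terminate, which calls abort()). This assumes pinfish launches racon through a shell, which the `>` redirection in the logged command suggests. A tiny sketch of the decoding; the helper name is hypothetical:

```shell
# describe_exit_status: hypothetical helper, not part of pinfish or racon.
# Shell convention: a status above 128 means "killed by signal (status - 128)".
# Example: 134 - 128 = 6, which is SIGABRT on Linux.
describe_exit_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        echo "terminated by signal $((code - 128))"
    else
        echo "exited with status $code"
    fi
}
```

For example, `describe_exit_status 134` reports signal 6, i.e. racon itself aborted rather than pinfish failing on its own.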

My polish_clusters command line:

polish_clusters -d /home/cgarcia/analysis/pinfish/corrected2/gmap/tmp -a CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2.sorted_clusters.tsv \
 -o CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2.sorted_clustered_consensus_transcripts.fasta \
 -t 16 CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2_fixmate_sorted.bam 2> CBAS_PIN_C-2_G_polish_clusters.log

Before running the pinfish tools, I used samtools to sort the GMAP SAM file by read name, remove secondary alignments and unmapped reads (fixmate -r), and then sort again by coordinate (the default):

samtools sort -n -@ 16 CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2.sam | samtools fixmate --reference CBAS_MASURCA-2_final.genome.scf.fasta -r -@ 16 - CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2_fixmate.bam
samtools sort -@ 16 CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2_fixmate.bam > CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2_fixmate_sorted.bam

I'm not sure whether the polish_clusters error comes from racon or from my samtools processing. Any ideas?

Thanks,
Michael

bsipos commented 5 years ago

Could you check if racon runs on your system at all?

michieitel commented 5 years ago

I ran just the racon command that failed again, and this error appeared:

terminate called after throwing an instance of 'std::invalid_argument'
  what(): [bioparser::FastqParser] error: invalid file format!

This makes sense, since I used corrected reads in FASTA format rather than raw reads in FASTQ. Does pinfish only work with raw sequence data? Wouldn't the output be more accurate with error-corrected reads (given that all reads were kept during correction to preserve coverage)?

bsipos commented 5 years ago

The tools should work on corrected reads as well. Please specify the temporary directory using -d, then, after the crash, run racon manually on the input and let me know what the error message is.

michieitel commented 5 years ago

Again, it has a problem with the missing quality information, which tells me I cannot use the BAM of corrected (FASTA) reads for polishing, although racon itself allows this!?

Can you please clarify whether the input for the mapping step, before starting the pinfish pipeline, can be FASTA or has to be FASTQ?

I ran this command:

polish_clusters -d /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp -a CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2.sorted_clusters.tsv \
 -o CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2.sorted_clustered_consensus_transcripts.fasta \
 -t 16 CBAS_MASURCA-2_final.genome.scf._ONT_cdna_gmap_combined_100bp_correction-2_fixmate_sorted.bam 2> CBAS_PIN_C-2_G_polish_clusters.log

which gave this error:

polish_clusters: 16:37:38 Polishing cluster b921fd52-3599-4645-b586-10e584dc5ccc of size 43
polish_clusters: 16:37:38 Failed running command: racon -t 16 -q -1 /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_b921fd52-3599-4645-b586-10e584dc5ccc_627859808/reads.fq /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_b921fd52-3599-4645-b586-10e584dc5ccc_627859808/alignments.sam /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_b921fd52-3599-4645-b586-10e584dc5ccc_627859808/reference.fq > /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_b921fd52-3599-4645-b586-10e584dc5ccc_627859808/consensus.fq - exit status 134

Then I ran the racon command (as given in the error log) manually:

racon -t 16 -q -1  \
/home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_b921fd52-3599-4645-b586-10e584dc5ccc_627859808/reads.fq \
/home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_b921fd52-3599-4645-b586-10e584dc5ccc_627859808/alignments.sam \
/home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_b921fd52-3599-4645-b586-10e584dc5ccc_627859808/reference.fq \
> /home/meitel/data/cbas/pinfish/corrected2/gmap/tmp/pinfish_b921fd52-3599-4645-b586-10e584dc5ccc_627859808/consensus.fq

This resulted in the same error as in my last comment:

terminate called after throwing an instance of 'std::invalid_argument'
  what(): [bioparser::FastqParser] error: invalid file format!
Aborted

bsipos commented 5 years ago

Okay. Can you post the head of reads.fq here? Also, which version of racon are you using? And yes, reads.fq should have quality values. If your original input is FASTA, you have to convert it to FASTQ by adding arbitrary quality values before using the tools.
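Before posting the head of reads.fq, a quick local check can show why a FASTQ parser might reject it. The sketch below is a hypothetical helper, not part of pinfish or racon; it assumes the common 4-line FASTQ layout (no wrapped sequence or quality lines) and only inspects the first record, checking the `@` and `+` markers and that the quality string is as long as the sequence:

```shell
# check_first_fastq_record: hypothetical helper, not part of pinfish or racon.
# Assumes the usual 4-line FASTQ layout (sequence and quality not wrapped)
# and inspects only the first record of the file given as $1.
# Prints a short verdict; exit status is 0 only if the record looks valid.
check_first_fastq_record() {
    awk '
        NR == 1 && substr($0, 1, 1) != "@" { msg = "first line does not start with @"; exit }
        NR == 2 { seq = $0 }
        NR == 3 && substr($0, 1, 1) != "+" { msg = "third line does not start with +"; exit }
        NR == 4 {
            if (length($0) != length(seq))
                msg = "quality length " length($0) " != sequence length " length(seq)
            else
                ok = 1
            exit
        }
        END {
            if (ok) { print "ok"; exit 0 }
            if (msg == "") msg = "fewer than 4 lines"
            print "not valid FASTQ: " msg
            exit 1
        }
    ' "$1"
}
```

Run it as `check_first_fastq_record reads.fq` inside the pinfish temporary directory. A record whose quality line is empty or a lone placeholder character, as can happen when the reads carried no qualities, fails the length check.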

michieitel commented 5 years ago

OK, that's why it did not work. I'm just not sure what the point of a fake quality value is... I'll try it anyhow. Thanks for your help,
Michael
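On the question of what a fake quality value means: a FASTQ quality is a Phred score Q that encodes an estimated per-base error probability. A constant placeholder such as Q = 40 therefore carries no per-base information; it mainly satisfies parsers that require a quality string. Whether and how racon uses the values beyond that is not stated in this thread. The arithmetic for Q = 40 in the common Phred+33 (Sanger) encoding:

```latex
Q = -10 \log_{10} P_{\mathrm{err}}
\;\Longrightarrow\;
P_{\mathrm{err}} = 10^{-Q/10},
\qquad
Q = 40 \;\Rightarrow\; P_{\mathrm{err}} = 10^{-4},
\qquad
\text{character} = \mathrm{ASCII}(Q + 33) = \mathrm{ASCII}(73) = \texttt{I}
```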

bsipos commented 5 years ago

Just set the fake quality values to 40.
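A minimal sketch of that conversion, assuming Phred+33 encoding, in which quality 40 is the character `I`. The function name is hypothetical; it handles wrapped (multi-line) FASTA sequences and gives every base the same quality:

```shell
# fasta_to_fastq: hypothetical helper name, not part of pinfish or racon.
# Reads FASTA (single- or multi-line sequences) on stdin and writes FASTQ on
# stdout, giving every base the constant Phred quality 40, encoded as 'I'
# in the Phred+33 (Sanger) scheme because 40 + 33 = 73 = ASCII 'I'.
fasta_to_fastq() {
    awk '
        # Emit the record collected so far, if there is one.
        function flush() {
            if (have_record) {
                qual = seq
                gsub(/./, "I", qual)    # one quality character per base
                printf "@%s\n%s\n+\n%s\n", name, seq, qual
            }
        }
        /^>/ {
            flush()
            name = substr($0, 2)        # header text after ">"
            seq = ""
            have_record = 1
            next
        }
        { seq = seq $0 }                # join wrapped sequence lines
        END { flush() }
    '
}
```

Usage, with hypothetical file names: `fasta_to_fastq < corrected_reads.fasta > corrected_reads.fastq`. Convert before mapping, so that the aligner (assuming it copies FASTQ qualities into the SAM QUAL field) writes qualities into the BAM that pinfish later reads.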

michieitel commented 5 years ago

Thanks, I will give it a try.