steinbrl closed this issue 1 year ago.
Hi Lars,
I'm not sure what's going on here. The medaka_consensus script was only ever really intended as a convenience and to demonstrate the various steps of using medaka. If you are putting medaka into a large bioinformatics workflow, I would suggest studying the script and putting each of its steps in your pipeline as a discrete task.
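As a rough illustration of "each step as a discrete task", the sketch below wraps the three stages that medaka_consensus chains together (align reads to the draft, run the network, stitch the result) as separate shell functions. It is a sketch under assumptions, not the script itself: the sub-commands and flags (`mini_align -i/-r/-m/-p/-t`, `medaka consensus --model/--threads`, `medaka stitch`) are written as they appear in medaka 1.x releases as far as I recall, and the argument order of `medaka stitch` has changed between releases, so check the medaka_consensus script shipped with your installed version. `run`, `DRY_RUN` and the step function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: medaka_consensus split into three discrete pipeline tasks.
# ASSUMPTION: sub-commands/flags as in medaka 1.x; verify them against the
# medaka_consensus script of your installed version before relying on this.
set -euo pipefail

# Hypothetical helper: print the command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Step 1: align reads to the draft (mini_align ships with medaka).
# Writes <prefix>.bam plus its index.
align_step() {      # args: reads.fastq draft.fasta prefix threads
  run mini_align -i "$1" -r "$2" -m -p "$3" -t "$4"
}

# Step 2: run the consensus network over the alignments, writing an HDF file.
consensus_step() {  # args: aligned.bam probs.hdf model threads
  run medaka consensus "$1" "$2" --model "$3" --threads "$4"
}

# Step 3: stitch the per-region results into a consensus FASTA.
# ASSUMPTION: newer releases take the draft as an argument; older ones did not.
stitch_step() {     # args: probs.hdf draft.fasta consensus.fasta
  run medaka stitch "$1" "$2" "$3"
}
```

Because each stage is its own task, a pipeline can schedule and resource them independently, e.g. give `align_step reads.fastq draft.fasta calls_to_draft 32` many threads while running `consensus_step calls_to_draft.bam consensus_probs.hdf MODEL 2` with few (`MODEL` is a placeholder for the model name matching your basecaller).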
Hi,
I'm not using the full medaka pipeline, only the consensus step, as a final polishing step after 4 rounds of racon. It is a discrete call:
medaka_consensus -i "$sample"_filtered_long.fastq -d racon_polish4.fasta -o medaka -t $cpu
This produces the error...
OK, the problem was the --threads argument. I tried it with 2 and 4 threads, and that works. That's a pity if you have a lot of computing power available...
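Until the root cause is known, a simple workaround in a pipeline is to cap the value passed to `-t`. The sketch below is an assumption-laden example: the default cap of 4 only reflects the values reported to work in this thread (2 and 4) and is not a documented medaka limit; `cap_threads` and `MEDAKA_MAX_THREADS` are hypothetical names.

```shell
#!/usr/bin/env bash
# Workaround sketch: cap the thread count handed to medaka_consensus -t.
# ASSUMPTION: the default cap of 4 comes from the values reported to work
# in this thread (2 and 4); it is not a documented medaka limit.
set -euo pipefail

MEDAKA_MAX_THREADS=${MEDAKA_MAX_THREADS:-4}

# Print the smaller of the requested thread count and the cap.
cap_threads() {   # args: requested [max]
  local requested=$1
  local max=${2:-$MEDAKA_MAX_THREADS}
  if [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}
```

Used with the original call, this becomes `medaka_consensus -i "$sample"_filtered_long.fastq -d racon_polish4.fasta -o medaka -t "$(cap_threads "$cpu")"`. Note that a single `-t` applies to every step the script runs, so the alignment is capped as well; splitting the steps out of the script avoids that.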
As stated above, if you are embedding medaka in a larger pipeline you should study the medaka_consensus script and extract the distinct steps into your own pipeline.
a last polishing step after 4 rounds of racon
There is no need to run racon before medaka.
I found this strategy in a paper about hybrid assembly of bacterial genomes, in which they benchmarked several workflows and assembly/polishing tools.
Running racon four times has not been our recommended approach for a number of years. The original "4 rounds of racon" approach derives from it being the procedure used to train some of the early medaka models. The inference models in medaka are now trained to correct the direct output from Flye.
Our recommended approach is therefore to use the most recent version of the Guppy basecaller, assemble with Flye, and then run medaka.
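A minimal sketch of that order (basecall, assemble with Flye, polish with medaka, no racon rounds in between) is shown below. Assumptions: the reads are already basecalled with a recent Guppy (basecalling is not shown), `--nano-raw` is the appropriate Flye read mode for your data, and the output layout under `results/` plus the `run`/`assemble_and_polish` helpers are hypothetical. Flye writes its final assembly to `assembly.fasta` inside `--out-dir`, which is what medaka polishes here; you may also want to pass the medaka model matching your basecaller with `-m`.

```shell
#!/usr/bin/env bash
# Sketch: (Guppy basecalling, not shown) -> Flye -> medaka.
# ASSUMPTIONS: reads are already basecalled; --nano-raw suits the data;
# directory names and helper functions are hypothetical.
set -euo pipefail

# Hypothetical helper: print the command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

assemble_and_polish() {   # args: reads.fastq outdir threads
  local reads=$1 outdir=$2 threads=$3
  # Assemble directly with Flye; no racon rounds before medaka.
  run flye --nano-raw "$reads" --out-dir "$outdir/flye" --threads "$threads"
  # Polish Flye's assembly.fasta with medaka.
  run medaka_consensus -i "$reads" -d "$outdir/flye/assembly.fasta" \
      -o "$outdir/medaka" -t "$threads"
}
```

Setting `DRY_RUN=1` prints the two commands without running them, which is a cheap way to check paths before submitting a job.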
Hi,
I'm trying to migrate my pipeline to an HPC/SLURM cluster, but the medaka polishing step produces an error:
samtools sort: couldn't allocate memory for bam_mem
Alignment pipeline failed.
Failed to run alignment of reads to draft.
The samtools problem occurs only inside medaka. If I run a samtools view | samtools sort pipe manually, it works flawlessly, with no memory issues. There are more than enough resources on the HPC: I ran the script with 64 cores and 256 GB of memory, and the long-read file is only 870 MB, so running out of RAM seems impossible. It seems that the problem lies in medaka's internal samtools call. The script runs perfectly on a workstation (2x Xeon, 40 threads, 128 GB RAM, Ubuntu), which has less memory. Do you have any ideas?
Greetings,
Lars Steinbrück
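One thing worth checking, though it is not confirmed anywhere in this thread: `samtools sort -m` sets the approximate maximum memory *per thread*, so the sort's total buffer grows roughly with the thread count given to `-@`. Under SLURM, the limit that matters is the job's allocation (`--mem` or `--mem-per-cpu`), not the node's physical RAM, so a high thread count combined with a per-thread `-m` can exceed the job limit even on a 256 GB node. I have not verified which `-@`/`-m` values medaka's internal alignment passes to samtools; reading the medaka_consensus and mini_align scripts would confirm. The sketch below sizes `-m` from a memory budget for a manual sort. Assumptions: `SLURM_MEM_PER_NODE` (in MB) is only set when the job was submitted with `--mem`, and the 75% headroom and the defaults of 4 threads/4096 MB are arbitrary; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: choose samtools sort's per-thread memory (-m) from a total budget.
# ASSUMPTIONS: SLURM_MEM_PER_NODE (MB) exists only when --mem was requested;
# the 75% headroom and the 4-thread / 4096 MB defaults are arbitrary.
set -euo pipefail

# Print a per-thread -m value in MB so that threads * m stays near 75% of budget.
sort_mem_per_thread() {   # args: total_budget_mb threads
  local budget_mb=$1 threads=$2
  local per_thread=$(( budget_mb * 3 / 4 / threads ))
  if [ "$per_thread" -lt 1 ]; then
    per_thread=1
  fi
  echo "${per_thread}M"
}

# Print (not run) a samtools sort command sized from the SLURM allocation.
sort_command() {          # args: in.bam out.bam
  local threads=${SLURM_CPUS_PER_TASK:-4}
  local budget_mb=${SLURM_MEM_PER_NODE:-4096}
  echo "samtools sort -@ $threads -m $(sort_mem_per_thread "$budget_mb" "$threads") -o $2 $1"
}
```

For example, a job with 64 CPUs and `--mem=256000` would get `-m 3000M` per thread (64 x 3000 MB = 192000 MB, about 75% of the allocation).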