alelim-bio closed 5 years ago
Sorry but I don't think it is feasible to use nanopolish for this project. Try medaka, which is much faster.
Jared
Hello Jared,
Thank you for your answer. As a follow-up question: we have access to three different supercomputers, so would it be feasible with a multi-node job? I believe the largest node we can access has 704 threads, or we can use multiple Skylake nodes.
Additionally, if we reduced the number of .fastq files used for polishing, for instance to half of them, would polishing with nanopolish become feasible? We would like to get the best assembly possible, and I believe medaka doesn't polish as well as nanopolish because it doesn't incorporate signal-level information.
Kind Regards,
Alex
How many nodes do you have access to?
Hello Jared,
We have access to around 65 nodes, each with 32 Skylake Xeon cores.
Kind Regards,
Alex
Ok, it is worth trying then. I suggest using 4 threads per process, and as many processes as you are able to run.
Jared
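As a rough illustration of the suggestion above, the sketch below works out how many 4-thread processes fit on the cluster described in this thread (about 65 nodes with 32 cores each) and splits the 343,222 contigs into one chunk per process. The function names `plan_processes` and `split_contigs` are hypothetical helpers, not part of nanopolish, and the split balances by contig count only. Balancing by contig length would likely be fairer in practice.

```python
def plan_processes(nodes, cores_per_node, threads_per_process=4):
    """Return (processes per node, total processes) for a fixed thread count."""
    per_node = cores_per_node // threads_per_process
    return per_node, per_node * nodes


def split_contigs(n_contigs, n_chunks):
    """Split contig indices 0..n_contigs-1 into half-open (start, end) ranges.

    Chunk sizes differ by at most one contig. This ignores contig length,
    which is a simplifying assumption.
    """
    base, extra = divmod(n_contigs, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Figures taken from the thread: ~65 nodes, 32 cores each, 343,222 contigs.
per_node, total = plan_processes(nodes=65, cores_per_node=32)
chunks = split_contigs(343_222, total)
print(f"{per_node} processes per node, {total} processes in total")
print(f"first chunk: contigs {chunks[0][0]} to {chunks[0][1] - 1}")
```

Each range could then be handed to a separate nanopolish process started with 4 threads. How the ranges are mapped to contig names and submitted to a scheduler depends on the cluster and is left out here.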
Hello Nanopolish,
I was wondering if I could get your assistance with a computing dilemma we are having. To give some background, we currently have a very large plant genome assembly totaling 343,222 contigs and covering approximately 13.5 GB. Additionally, our basecalled .fastq file is approximately 487 GB in size. We wish to polish our genome. However, even with a test set of just two flow cells (29 GB in total), polishing takes longer than 1 day to finish 1000 contigs on a dual Intel “Skylake” 6130 node. At this rate, I feel it will be infeasible once we incorporate the whole dataset.
I was wondering if you could offer some advice on solving this problem, or whether you have any experience with a polishing project this large.
Thank you for your time. Kind Regards,
Alex
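To put the numbers in this post in perspective, here is a back-of-the-envelope estimate. It assumes that runtime grows linearly with the number of contigs and with the amount of read data, and that work divides evenly across nodes. Neither assumption is confirmed in the thread. Because 1000 contigs took longer than a day, the results are lower bounds. `estimate_node_days` is a hypothetical helper name.

```python
def estimate_node_days(total_contigs, contigs_per_day, data_scale=1.0):
    """Estimate node-days, assuming linear scaling in contigs and read data."""
    return total_contigs / contigs_per_day * data_scale


# Figures from the post: 343,222 contigs, at most 1000 contigs per day on one
# node with the 29 GB test set, and 487 GB of reads in the full dataset.
test_set_days = estimate_node_days(343_222, 1000)
full_set_days = estimate_node_days(343_222, 1000, data_scale=487 / 29)

# Assumed even split across the roughly 65 nodes mentioned in the replies.
wall_clock_days = full_set_days / 65

print(f"test set, one node:  at least {test_set_days:.0f} days")
print(f"full set, one node:  at least {full_set_days:.0f} days")
print(f"full set, 65 nodes:  at least {wall_clock_days:.0f} days")
```

Under these assumptions a single node is clearly not enough, which is consistent with the advice in the replies to spread the work over as many processes as possible.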