Open koujiaodahan opened 4 years ago
Change -t 8 to -t 64, perhaps.
Thanks, I have started the 55-thread script without stopping the 8-thread process. How long do you think it will take to run both scripts?
Did you change the output directory? I have no idea how long it might take. It depends on the coverage of the long and short reads.
Sure, I set a new output dir.
It has been running Minia for over 24 hours. Is that normal?
Minia is very fast, but genome size and coverage influence its runtime, and probably so do the choice of k-mer length and other similar settings.
So, are there any recommended parameters for running a human genome assembly?
Are you trying out the assembler with someone else's data or do you have a new human genome assembly that you would like to make with your own data? I would think that it would have finished by now (~5 days running). Again, you haven't specified the coverage of the Illumina or I guess Oxford Nanopore data that you are using. You can also read the paper describing HASLR for perhaps more information on the program.
Sorry, I'm trying to assemble a human genome. The coverage of both the short reads and the long reads is 120X.
I would recommend you try either GraphAligner (https://github.com/maickrau/GraphAligner) or Ratatosk (https://github.com/DecodeGenetics/Ratatosk) to error correct your Nanopore reads with your Illumina reads, then assemble with Flye (https://github.com/fenderglass/Flye) using the --nano-corr option. Ratatosk even has a faster reference-based method for correcting the reads (I haven't used this method, so I don't know the details). For Flye, I really don't think you need 120x Nanopore coverage, especially if you can correct the reads. See here for running Ratatosk or here for running GraphAligner.
Edit: I guess you could use 120x Nanopore reads for a Human assembly (https://github.com/fenderglass/Flye/blob/3ee5b3390a5f88c36d0869d0382c75aba3b1f5cc/README.md#flye-benchmarks), although these data come from CHM13 (homozygous cell line). Also note the 4000 CPU hours (divide 4000 by number of available cores and you get approximately how many wall hours the assembly would take).
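As a rough illustration of the arithmetic in the note above, here is a minimal Python sketch. The 4000 CPU-hour figure is the one quoted from the Flye benchmarks, and the thread counts are the ones mentioned in this thread. The function name is made up for illustration, and the estimate assumes perfect scaling across cores, which a real assembly will not reach, so treat the result as a lower bound.

```python
def estimate_wall_hours(cpu_hours: float, cores: int) -> float:
    """Rough lower bound on wall-clock hours, assuming perfect parallel scaling."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_hours / cores


if __name__ == "__main__":
    # 4000 CPU hours is the figure quoted from the Flye benchmark table.
    for cores in (8, 55, 64):
        hours = estimate_wall_hours(4000, cores)
        print(f"{cores} cores: about {hours:.1f} wall hours")
```

On a 64-core machine this works out to roughly 62.5 hours at best, and to about 500 hours with only 8 threads.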
Thanks, jelber2. So HASLR is not advised? Why?
In my experience, HASLR will generate very good statistics (N50, etc) for assembly using raw long reads and accurate short reads, but the error rate (indels and substitutions) of the final assembly is similar to the error rate of the long reads and not the short reads. One can improve the error rates by using long reads corrected by the short reads, and using the corrected long reads as input, but then the assembly statistics suffer. This is based off of simulation of course, and simulations are sometimes useful but can never fully capture the intricacies of real data.
Hi @koujiaodahan and thanks for trying HASLR. I'm surprised that Minia is taking so long to finish. In my experience, on short read datasets from the human genome with about 40x coverage, it takes about 5 hours to finish. Are you sure that the Minia assembly was the step that took a long time to finish? If yes, one solution could be subsampling the short reads to about 40-50x coverage. You can use the fastutils command that comes with HASLR for that purpose. So assuming you have a paired-end dataset, you do the following:
fastutils interleave -q sr_1.fastq sr_2.fastq | fastutils subsample -q -g 3g -d 40 > sr_40x.fastq
With regards to the error rate of the final assembly that was raised by @jelber2, if you eventually want to perform polishing for your assembly, our results show that polished HASLR assemblies are as accurate as polished assemblies from other tools.
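The subsampling suggestion above comes down to simple coverage arithmetic, sketched below in Python. This is not how fastutils is implemented; it only illustrates what a 3 Gbp genome size and a 40x target depth imply, assuming that -g and -d in the command above mean genome size and target depth. The function names are hypothetical.

```python
def coverage(total_bases: int, genome_size: int) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def bases_for_depth(genome_size: int, target_depth: float) -> float:
    """Number of bases that must be kept to reach the target depth."""
    return genome_size * target_depth


def keep_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep; never more than all of them."""
    return min(1.0, target_depth / current_depth)


if __name__ == "__main__":
    genome_size = 3_000_000_000  # 3 Gbp, the value passed as "3g" in this thread
    current = 120                # short-read coverage reported in this thread
    target = 40                  # suggested subsampling depth
    print(f"bases to keep: {bases_for_depth(genome_size, target):.3g}")
    print(f"fraction of reads to keep: {keep_fraction(current, target):.3f}")
```

Going from 120x to 40x means keeping about one third of the read pairs.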
Yeah, I agree that the coverage is too high, so I downsampled and got an error, which I reported in #20. I also want to know whether your polishing method means running wtdbg2.pl after running HASLR?
Hi, I'm running the software to assemble a human genome. It has been running for one day and is still going, so how can I speed it up? Generally speaking, how much memory does it need per thread? If I have sufficient memory, can I set a higher thread count? My machine has 64 cores and 500G of memory. Here is my script:
~/backup_data/anaconda3/haslr/bin/haslr.py -t 8 -o ~/USER/lizhichao/Assembly/outdir/Assemblyoutput -g 3g -l ~/USER/lizhichao/Assembly/outdir/fastq/NA24385_ONT.fastq.gz -x nanopore -s ~/USER/lizhichao/Assembly/outdir/fastq/NA24385_T7.clean_1.fq.gz ~/USER/lizhichao/Assembly/outdir/fastq/NA24385_T7.clean_2.fq.gz &&\ echo "haslr finished