Open junaruga opened 2 years ago
According to the full log of running mitohifi.py with the example files, many contigs are processed. Perhaps we can reduce the number of contigs in the file to reduce the total running time?
https://gist.github.com/junaruga/b7ebdc41df63a3b041c5ae53797a1a29#file-mitohifi_with_example_data_reads_file-log-L42-L422
Looking at the log above with the example files, I think the performance tuning point is step 6, the step that runs until step 7 starts. https://gist.github.com/junaruga/b7ebdc41df63a3b041c5ae53797a1a29#file-mitohifi_with_example_data_reads_file-log-L42-L1561
That step took 67 of the 75 minutes of total running time.
2022-10-18 11:22:29 [INFO] 6. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s).
...
Gene CYTB contains frameshift
2022-10-18 12:30:20 [INFO] 7. Now the rotated contigs will be aligned
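The 67-minute figure can be checked from the two timestamps in the excerpt above: it is the gap between the line announcing step 6 and the line announcing step 7. A minimal sketch of that arithmetic, where the helper name `elapsed` is only illustrative and not part of MitoHiFi:

```python
from datetime import datetime

# Timestamp layout used at the start of each MitoHiFi log line.
FMT = "%Y-%m-%d %H:%M:%S"


def elapsed(start, end):
    """Return the time between two log timestamps as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


step6_start = "2022-10-18 11:22:29"  # "[INFO] 6. Now we are going to circularize, ..."
step7_start = "2022-10-18 12:30:20"  # "[INFO] 7. Now the rotated contigs will be aligned"

delta = elapsed(step6_start, step7_start)
minutes, seconds = divmod(int(delta.total_seconds()), 60)
print(f"{minutes} min {seconds} s")  # 67 min 51 s
```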
I created a small reads FASTA file, ilDeiPorc1.reads.small.fa, that is just the first 20045 lines of ilDeiPorc1.reads.fa. The data is about 10% of the size of the current example file. It produces only 1 contig, and the run took 5 minutes 31 seconds. The full log is here.
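The post does not say which command produced the smaller file. One way to take the first N lines of a reads file could look like the sketch below; `head_lines` is a hypothetical helper, and the demo records are made up so the snippet runs without the real data.

```python
import os
import tempfile
from itertools import islice


def head_lines(src_path, dst_path, n_lines):
    """Copy the first n_lines lines of src_path to dst_path.

    Returns the number of lines actually written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in islice(src, n_lines):
            dst.write(line)
            written += 1
    return written


# Tiny stand-in for a reads file (made-up records, for demonstration only).
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "demo.reads.fa")
demo_dst = os.path.join(demo_dir, "demo.reads.small.fa")
with open(demo_src, "w") as fh:
    fh.write(">read1\nACGT\n>read2\nGGCC\n>read3\nTTAA\n")

n_written = head_lines(demo_src, demo_dst, 4)
print(n_written)  # 4 lines, i.e. the first two demo records

# For the files in this issue the call would be:
# head_lines("ilDeiPorc1.reads.fa", "ilDeiPorc1.reads.small.fa", 20045)
```

Cutting by line count can split the last record if the cut does not fall on a record boundary, so it is worth checking the tail of the output file before using it as example data.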
$ wc -l ilDeiPorc1.reads.*
244682 ilDeiPorc1.reads.fa
20045 ilDeiPorc1.reads.small.fa
264727 total
$ time docker run --rm -w /data/ -v /home/jaruga/tmp/mitohifi/exampleFiles/:/data/ -t docker.io/biocontainers/mitohifi:2.2_cv1 mitohifi.py -r /data/ilDeiPorc1.reads.small.fa -f /data/MW539688.1.fasta -g /data/MW539688.1.gb -t 4 -o 2
2022-10-25 16:33:56 [INFO] Welcome to MitoHifi v2. Starting pipeline...
2022-10-25 16:33:56 [INFO] Length of related mitogenome is: 15354 bp
2022-10-25 16:33:56 [INFO] Number of genes on related mitogenome: 37
...
2022-10-25 16:34:06 [INFO] 6. Now we are going to circularize, annotate and rotate each filtered contig. Those are potential mitogenome(s).
2022-10-25 16:34:06 [INFO] Working with contig ptg000001l
2022-10-25 16:34:06 [INFO] Started ptg000001l circularization
2022-10-25 16:34:07 [INFO] ptg000001l circularization done. Circularization info saved on ./potential_contigs/ptg000001l/ptg000001l.circularisationCheck.txt
2022-10-25 16:34:07 [INFO] Started ptg000001l (MitoFinder) annotation
2022-10-25 16:36:52 [INFO] ptg000001l annotation done. Annotation log saved on ./potential_contigs/ptg000001l/ptg000001l.annotation_MitoFinder.log
2022-10-25 16:36:52 [INFO] Started ptg000001l rotation.
2022-10-25 16:36:52 [INFO] Rotation of ptg000001l done. Rotated is at ptg000001l.mitogenome.rotated.fa
...
2022-10-25 16:39:24 [INFO] Pipeline finished!
2022-10-25 16:39:24 [INFO] Run time: 328.83 seconds
real 5m31.986s
user 0m0.048s
sys 0m0.032s
I also tested the contigs file case (the -c option). The running time was short as well. The full log is here.
$ time docker run --rm -w /data/ -v /home/jaruga/tmp/mitohifi/exampleFiles/:/data/ -t docker.io/biocontainers/mitohifi:2.2_cv1 mitohifi.py -c /data/test.fa -f /data/NC_016067.1.fasta -g /data/NC_016067.1.gb -t 4 -o 2
...
2022-10-25 20:06:23 [INFO] Pipeline finished!
2022-10-25 20:06:23 [INFO] Run time: 338.59 seconds
real 5m41.835s
user 0m0.044s
sys 0m0.032s
This issue follows from https://github.com/marcelauliano/MitoHiFi/issues/26#issuecomment-1282343446 . It is a proposal for an enhancement.
I ran mitohifi.py -r <reads file> ... with this repository's example files, and it took 4500.39 seconds (about 75 minutes). It would be great if this repository had small example data files so that the command finishes in a short time.
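For reference, the two "Run time" values reported in this issue (4500.39 seconds for the full example reads, 328.83 seconds for the small reads file) can be converted and compared as in the sketch below; `to_min_sec` is an illustrative helper name only.

```python
def to_min_sec(seconds):
    """Split a duration in seconds into (whole minutes, remaining seconds)."""
    minutes, rem = divmod(seconds, 60)
    return int(minutes), round(rem, 2)


full_run = 4500.39   # run time with the repository's example reads file
small_run = 328.83   # run time with ilDeiPorc1.reads.small.fa

print(to_min_sec(full_run))            # (75, 0.39)
print(to_min_sec(small_run))           # (5, 28.83)
print(round(full_run / small_run, 1))  # 13.7, i.e. the small run is roughly 14x faster
```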