Closed. ezherman closed this issue 1 year ago.
Further details:
The failure did not occur when I ran homopolish using a local NCBI database containing 53 Pseudomonas aeruginosa complete assemblies. However, quast shows the same results before and after running homopolish. Is this issue specific to my assembly? I would have expected some indels to be removed by homopolish.
Here is the output after running homopolish with the local database:
[2022/12/08 15:44] INFO: RUN-ID: contig_1
contig_1
/mnt/c/Users/elh605/assemble-cf-isolates/data/long-read-seq-sup/barcode24/flye_medaka_homopolish_assembly/debug
[M::mm_idx_gen::0.182*0.74] collected minimizers
[M::mm_idx_gen::0.204*1.02] sorted minimizers
[M::main::0.204*1.02] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.211*1.02] mid_occ = 100
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.217*1.02] distinct minimizers: 618983 (98.93% are singletons); average occurrences: 1.015; average spacing: 9.994
[M::worker_pipeline::41.389*3.25] mapped 56 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -cx asm5 --cs=long -t 4 /mnt/c/Users/elh605/assemble-cf-isolates/data/long-read-seq-sup/barcode24/flye_medaka_homopolish_assembly/debug/contig_1/contig_1.fasta /mnt/c/Users/elh605/assemble-cf-isolates/data/long-read-seq-sup/barcode24/flye_medaka_homopolish_assembly/debug/contig_1/All_homologous_sequences.fna.gz
[M::main] Real time: 41.398 sec; CPU: 134.460 sec; Peak RSS: 1.188 GB
41.644439935684204
[2022/12/08 15:45] INFO: Stage: Homologous retrieval
TIME Homologous retrieval: 6 MINS 55 SECS.
[2022/12/08 15:52] INFO: Stage: Prediction
TIME Prediction: 0 MINS 2 SECS.
[2022/12/08 15:52] INFO: Stage: Polish
TIME Polish: 0 MINS 1 SECS.
TIME Total: 7 MINS 42 SECS.
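As a quick check that does not depend on quast, the assembly before and after polishing can be compared directly to see whether any contig changed at all. The following is a minimal sketch, not part of homopolish: the function names are made up for illustration, and it assumes both files are uncompressed FASTA and that contig names are the same in both files (contigs whose names differ are simply left out of the report).

```python
from pathlib import Path


def read_fasta(path):
    """Return {contig_name: sequence} from an uncompressed FASTA file."""
    records, name, chunks = {}, None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use only the first whitespace-delimited token as the contig name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            # Upper-case so that soft-masking does not count as a difference.
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def compare_assemblies(before_path, after_path):
    """Per shared contig: length change and whether the sequence is identical."""
    before, after = read_fasta(before_path), read_fasta(after_path)
    report = {}
    for name in sorted(before.keys() & after.keys()):
        report[name] = {
            "length_change": len(after[name]) - len(before[name]),
            "identical": before[name] == after[name],
        }
    return report


# Example call (both paths are placeholders, not real homopolish output names):
# print(compare_assemblies("medaka_consensus.fasta", "polished.fasta"))
```

If every contig comes back as identical, the polishing step made no edits, which matches an unchanged quast report.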
@ezherman We have polished quite a few Pseudomonas aeruginosa genomes successfully using the default NCBI database. Can you provide the problematic contig for us to debug?
Sure @ythuang0522, thank you for offering that! You can find the assembly here.
The reads come from an R9.4.1 pore and were basecalled with the super-accuracy model in Guppy v6.3.9. The reads were filtered with filtlong v0.2.1, assembled with flye v2.9.1 and polished with medaka v1.7.2. Do you require any further details/data?
Hi @ythuang0522, do you happen to have any updates on this? I am getting the same error with other Pseudomonas aeruginosa genomes from the same sequencing run. The download link expires tomorrow, but I can create a new one if needed. Thanks!
We just found that you added the wrong argument to your command. You shouldn't add --genus in your script, because it completely ignores the similarity of related genomes. You should just let the program search for the most closely related genomes automatically. My student confirmed that your genome can be polished as expected.
homopolish polish -a yourgenome.fasta -s bacteria.msh -m R9.4.pkl -o youroutput
Thanks so much @ythuang0522, to your student as well!
I initially chose to use --genus instead of -s because the process was being killed on my local machine with -s (probably a memory issue?). I tried using -s on my university's cluster, which worked as expected 🎉.
For reproducibility, it would be good to solve the issue on my local machine with the -s option. I will open a separate issue for this.
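To test the memory guess, one option is to run the command through a small wrapper that reports how the process ended and how much memory it peaked at. This is a rough sketch under stated assumptions: it uses only the Python standard library, works on Unix-like systems only (including WSL), and the helper names are invented here. A negative return code means the process was terminated by a signal, and signal 9 (SIGKILL) is what the Linux out-of-memory killer sends.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(argv):
    """Run argv to completion; return (return_code, peak_rss).

    peak_rss is the largest resident set size among child processes that
    have been waited for so far. It is reported in kilobytes on Linux
    (and in bytes on macOS).
    """
    completed = subprocess.run(argv)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


def describe_exit(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Pass the command to measure as arguments, for example the command
    # suggested earlier in this thread:
    #   homopolish polish -a yourgenome.fasta -s bacteria.msh -m R9.4.pkl -o youroutput
    code, peak = run_and_report_peak_memory(sys.argv[1:] or [sys.executable, "-c", "pass"])
    print(describe_exit(code), "| peak RSS:", peak)
```

A result of "killed by signal 9" together with a peak close to the machine's available RAM would support the memory explanation.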
Hi,
Thank you for releasing this awesome package! I'm excited to use it for my data. I was hoping you could help me with the following message, after which the program stops. I was expecting there to be prediction and polishing steps, in addition to homologous retrieval.
Below you can find the command that I ran from within the homopolish directory. I am running homopolish v0.4.1.
Below are the contents of issue.log:
Thanks in advance!