ythuang0522 / homopolish

High-quality Nanopore-only genome polisher
GNU General Public License v3.0
65 stars 12 forks source link

No output after successful run #40

Closed schorlton closed 2 years ago

schorlton commented 2 years ago

Thanks again for the great tool. I ran Homopolish 0.3.2 using the command python3 homopolish.py polish -a consensus.fasta -m R9.4.pkl -s refseq.msh -o output. Oddly, it doesn't produce any output (ie no FASTA file in output) but looks like it succeeded?

I'm attaching consensus.fasta.gz, and the mash screen can be found here.

The last couple lines of the log file look like:

Closely-related genomes Ani less than 80, or sequence too short , only polish by Mash 
TIME Download closely-related genomes time: 0 MINS 0 SECS.
This contig's npz file is empty.
[2021/09/06 20:22] INFO: RUN-ID: contig_1074
[2021/09/06 20:22] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 0 MINS 1 SECS.
[2021/09/06 20:22] INFO: Stage: Download closely-related genomes
 INFO: 0 homologous sequence need to download: 
>>>>>>>>>>>>>>>>>>
Reference = []
Query = [/data/homopolish/homopolish/debug/contig_1074/contig_1074.fasta]
Kmer size = 16
Fragment length = 3000
Threads = 1
ANI output file = /data/homopolish/homopolish/debug/contig_1074/ANI.txt
>>>>>>>>>>>>>>>>>>
ERROR, skch::validateInputFiles, Count of query and ref genomes should be non-zero
cat: '/data/homopolish/homopolish/debug/contig_1074/homologous_sequences/*': No such file or directory
[M::mm_idx_gen::0.001*1.78] collected minimizers
[M::mm_idx_gen::0.001*1.58] sorted minimizers
[M::main::0.001*1.57] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.001*1.53] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.001*1.49] distinct minimizers: 322 (100.00% are singletons); average occurrences: 1.000; average spacing: 10.012; total length: 3224
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -cx asm5 --cs=long -t 1 /data/homopolish/homopolish/debug/contig_1074/contig_1074.fasta /data/homopolish/homopolish/debug/contig_1074/All_homologous_sequences.fna.gz
[M::main] Real time: 0.002 sec; CPU: 0.002 sec; Peak RSS: 0.003 GB
Closely-related genomes Ani less than 80, or sequence too short , only polish by Mash 
TIME Download closely-related genomes time: 0 MINS 0 SECS.
This contig's npz file is empty.
[2021/09/06 20:22] INFO: RUN-ID: contig_2196
[2021/09/06 20:22] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 0 MINS 1 SECS.
[2021/09/06 20:22] INFO: Stage: Download closely-related genomes
 INFO: 0 homologous sequence need to download: 
>>>>>>>>>>>>>>>>>>
Reference = []
Query = [/data/homopolish/homopolish/debug/contig_2196/contig_2196.fasta]
Kmer size = 16
Fragment length = 3000
Threads = 1
ANI output file = /data/homopolish/homopolish/debug/contig_2196/ANI.txt
>>>>>>>>>>>>>>>>>>
ERROR, skch::validateInputFiles, Count of query and ref genomes should be non-zero
cat: '/data/homopolish/homopolish/debug/contig_2196/homologous_sequences/*': No such file or directory
[M::mm_idx_gen::0.001*1.97] collected minimizers
[M::mm_idx_gen::0.001*1.71] sorted minimizers
[M::main::0.001*1.69] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.001*1.64] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.001*1.59] distinct minimizers: 150 (100.00% are singletons); average occurrences: 1.000; average spacing: 10.213; total length: 1532
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -cx asm5 --cs=long -t 1 /data/homopolish/homopolish/debug/contig_2196/contig_2196.fasta /data/homopolish/homopolish/debug/contig_2196/All_homologous_sequences.fna.gz
[M::main] Real time: 0.002 sec; CPU: 0.002 sec; Peak RSS: 0.003 GB
Closely-related genomes Ani less than 80, or sequence too short , only polish by Mash 
TIME Download closely-related genomes time: 0 MINS 0 SECS.
This contig's npz file is empty.
TIME Total: 24 MINS 28 SECS.
ythuang0522 commented 2 years ago

I saw 0 homologous sequence need to download: . This is very likely due to no related genomes in your sketch satisfying the 95% ANI threshold with the consensus.fasta. Are you sure they are related species? Though the threshold can be lowered, we rarely do so unless polishing highly-mutated viruses.

--mash_threshold MASH_THRESHOLD
                        Mash output threshold. [0.95]
schorlton commented 2 years ago

No, I am not sure! I think knowing that there is a close genome to your draft assembly for polishing is a very strong assumption. For example, if I want to polish a metagenome or unknown bacterial isolate, then I need to pre-check that my organism(s) are contained in NCBI.

Instead, can I suggest that Homopolish just outputs unpolished contigs if there are no close matches with a warning? I was under the impression that it did this already.

ythuang0522 commented 2 years ago

You are right. We should prompt the user and output this way. This should be of higher priority. Will revise it next week.

ythuang0522 commented 2 years ago

We updated the code on Github which would output the unpolished contigs and prompt the user when no sufficient related genomes (<5) are found. Thanks for your helpful suggestions.

schorlton commented 2 years ago

Amazing! Thanks for being so quick and receptive. I see you bumped the version to 0.3.3 - could I just get you to tag that and perhaps even trigger the push to bioconda? Thanks again!!

ythuang0522 commented 2 years ago

v0.3.3 has been tagged and pushed to bioconda, which should be available soon.

schorlton commented 2 years ago

Thank you!!