Closed JamesYang1209 closed 2 years ago
Hi @JamesYang1209
We have released a new version (0.0.2), which should solve your problem. Please give it a try.
Thanks for the update. It did solve my problem. But it seems -d has become a required argument?
Traceback (most recent call last):
File "/home/james/tools/homopolish-0.2/homopolish.py", line 55, in <module>
main()
File "/home/james/tools/homopolish-0.2/homopolish.py", line 41, in main
FLAGS.output_dir, FLAGS.minimap_args, FLAGS.mash_threshold, FLAGS.download_contig_nums, FLAGS.debug, FLAGS.meta, FLAGS.local_DB_path)
File "/home/james/tools/homopolish-0.2/modules/polish_interface.py", line 326, in polish_genome
shutil.rmtree(contig_output_dir_debug)
NameError: name 'contig_output_dir_debug' is not defined
That option is for debugging mode, which should not be mandatory. We have pushed a fixed version. Please reinstall. Sorry for the inconvenience.
Thanks for the quick fix. However, I now get some warnings with some genomes.
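For readers who hit the same NameError: the traceback above is consistent with a variable that is assigned on only one code path (debug mode) but used on every path. A minimal sketch of the usual guard follows. The names `remove_if_set` and `process_contig` are hypothetical, and this is not the actual homopolish patch; whether the directory should be removed in debug mode at all is a separate decision.

```python
import os
import shutil
import tempfile


def remove_if_set(path):
    """Remove a directory tree only if a path was actually assigned."""
    if path is None:
        return False
    shutil.rmtree(path, ignore_errors=True)
    return True


def process_contig(debug):
    """Hypothetical stand-in for the per-contig step in polish_genome."""
    # Define the name up front so it exists on every code path.
    contig_output_dir_debug = None
    if debug:
        contig_output_dir_debug = tempfile.mkdtemp(prefix="contig_debug_")
    # ... polishing work would happen here ...
    removed = remove_if_set(contig_output_dir_debug)
    still_there = (contig_output_dir_debug is not None
                   and os.path.exists(contig_output_dir_debug))
    return removed, still_there
```

With this pattern a run without -d never touches an undefined name, because the variable always exists and the cleanup is skipped when it is still `None`.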
/usr/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NZ_CP008850.1&rettype=fasta
Can these warnings be safely ignored? Thank you.
Hi James, can you provide some information about your genome (N50, number of contigs)? That module is activated when the program suspects your contig is a plasmid rather than a chromosome; it then retrieves plasmids via the NCBI E-utilities API instead of FTP. We haven't seen this warning before. If it's reproducible, we will need you to provide the contig sequence for debugging.
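For context, a 429 response means the server is rate-limiting the client, and NCBI does limit how often E-utilities may be called. A common mitigation is to retry with exponential backoff. The sketch below illustrates that idea with the standard library only; `fetch_with_backoff` and its parameters are hypothetical and are not part of homopolish.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, opener=urllib.request.urlopen, retries=4,
                       base_delay=1.0, sleep=time.sleep):
    """Fetch url, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only rate-limit errors are retried; anything else propagates.
            if err.code != 429 or attempt == retries:
                raise
            # Use Retry-After when the server sends a plain number of seconds,
            # otherwise wait base_delay, 2*base_delay, 4*base_delay, ...
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

The `opener` and `sleep` arguments exist only so the function can be exercised without network access or real waiting.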
@ythuang0522, I'm hitting this error too, so perhaps I can help. Full log below, and the debug folder for this contig can be found here. Running homopolish 0.3.1. Let me know if you need any other info. Thanks for your help in troubleshooting and great software!
Query = [/data/homopolish_fail/homopolish/debug/contig_18/contig_18.fasta]
Kmer size = 16
Fragment length = 3000
Threads = 1
ANI output file = /data/homopolish_fail/homopolish/debug/contig_18/ANI.txt
>>>>>>>>>>>>>>>>>>
INFO [thread 0], skch::main, Count of threads executing parallel_for : 1
INFO [thread 0], skch::Sketch::build, window size for minimizer sampling = 24
INFO [thread 0], skch::Sketch::build, minimizers picked from reference = 10649502
INFO [thread 0], skch::Sketch::index, unique minimizers = 870210
INFO [thread 0], skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 242459) ... (236, 1)
INFO [thread 0], skch::Sketch::computeFreqHist, consider all minimizers during lookup.
INFO [thread 0], skch::main, Time spent sketching the reference : 9.79632 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1 : 19.1334 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.0104783 sec
INFO [thread 0], skch::main, ready to exit the loop
INFO, skch::main, parallel_for execution finished
[M::mm_idx_gen::0.187*1.00] collected minimizers
[M::mm_idx_gen::0.232*1.00] sorted minimizers
[M::main::0.232*1.00] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.242*1.00] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.251*1.00] distinct minimizers: 670654 (98.91% are singletons); average occurrences: 1.014; average spacing: 9.991; total length: 6792935
[M::worker_pipeline::95.572*1.00] mapped 2071 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -cx asm5 --cs=long -t 1 /data/homopolish_fail/homopolish/debug/contig_18/contig_18.fasta /data/homopolish_fail/homopolish/debug/contig_18/All_homologous_sequences.fna.gz
[M::main] Real time: 95.577 sec; CPU: 95.490 sec; Peak RSS: 0.918 GB
TIME Download closely-related genomes time: 0 MINS 38 SECS.
[2021/09/04 16:17] INFO: Stage: Homologous retrieval
TIME Homologous retrieval: 4 MINS 11 SECS.
[2021/09/04 16:21] INFO: Stage: Prediction
Traceback (most recent call last):
File "/homopolish/homopolish.py", line 58, in <module>
main()
File "/homopolish/homopolish.py", line 42, in main
FLAGS.output_dir, FLAGS.minimap_args, FLAGS.mash_threshold, FLAGS.download_contig_nums, FLAGS.debug, FLAGS.meta, FLAGS.local_DB_path)
File "/homopolish/modules/polish_interface.py", line 329, in polish_genome
out = without_genus(out, assembly_name, output_dir_debug, mash_screen, assembly, model_path, sketch_path, genus_species, threads, output_dir, minimap_args, mash_threshold, download_contig_nums, debug, meta)
File "/homopolish/modules/polish_interface.py", line 275, in without_genus
out.append(check_homopolish(paf, contig_name, contig_output_dir, contig, minimap_args, threads, download_path, model_path))
File "/homopolish/modules/polish_interface.py", line 130, in check_homopolish
finish = homopolish(contig_name, minimap_args, threads, db_path, model_path, contig_output_dir, dataframe)
File "/homopolish/modules/polish_interface.py", line 90, in homopolish
result = prediction.predict(dataframe, model_path, threads, contig_output_dir)
File "/homopolish/modules/prediction.py", line 23, in predict
result_prob = parallel(jobs)
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/joblib/parallel.py", line 1051, in __call__
while self.dispatch_one_batch(iterator):
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
for func, args, kwargs in self.items]
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
for func, args, kwargs in self.items]
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/sklearn/svm/base.py", line 620, in _predict_proba
X = self._validate_for_predict(X)
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/sklearn/svm/base.py", line 454, in _validate_for_predict
accept_large_sparse=False)
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/sklearn/utils/validation.py", line 542, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "/opt/conda/envs/bugseq/lib/python3.7/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Traceback (most recent call last):
File "/bugseq/lib/python/nextflow.py", line 67, in run_cmd
result.check_returncode()
File "/opt/conda/envs/bugseq/lib/python3.7/subprocess.py", line 444, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['python3', '/homopolish/homopolish.py', 'polish', '-a', 'consensus.fasta', '-s', 'refseq.msh', '-m', '/homopolish/R9.4.pkl', '-o', 'homopolish']' returned non-zero exit status 1.
Traceback (most recent call last):
File "command.py", line 55, in <module>
main(input_assembly, mash_sketch, output_dir, metadata)
File "command.py", line 31, in main
output_dir,
File "/bugseq/lib/python/nextflow.py", line 67, in run_cmd
result.check_returncode()
File "/opt/conda/envs/bugseq/lib/python3.7/subprocess.py", line 444, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['python3', '/homopolish/homopolish.py', 'polish', '-a', 'consensus.fasta', '-s', 'refseq.msh', '-m', '/homopolish/R9.4.pkl', '-o', 'homopolish']' returned non-zero exit status 1.
Thanks for providing us the contig. Will get back to you later.
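The ValueError in the log above is raised by scikit-learn's input validation, which rejects any NaN or infinite value in the feature matrix before prediction. When debugging a failure like this, it can help to locate the offending cells first. A minimal sketch follows, assuming the features can be iterated as rows of numbers (for a pandas DataFrame, something like `dataframe.values.tolist()`). The helper name `find_non_finite` is hypothetical, and nothing here reflects how homopolish builds its features.

```python
import math


def find_non_finite(rows):
    """Return (row_index, column_index) pairs whose value is NaN or infinite."""
    bad = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                bad.append((i, j))
    return bad


# Example with made-up numbers: one NaN and one infinity.
example = [[1.0, 2.0], [float("nan"), 3.0], [4.0, float("inf")]]
print(find_non_finite(example))
```

Knowing which rows and columns are affected usually narrows the cause down to a specific feature, for example a ratio computed with a zero denominator.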
@schorlton I ran the program with contig_18.fasta and it finished without any error (see below). However, it looks like the 20 related genomes you retrieved are totally different: e.g., GCF_001545205.1 and GCF_001545185.1 in yours are not among the ones found in my run (e.g., GCF_006364795.1). Are you using the default bacteria.msh for screening related genomes?
python3 homopolish.py polish -a contig_18.fasta -s bacteria.msh -m R9.4.pkl -d -o contig18
[2021/09/05 23:32] INFO: RUN-ID: contig_18
contig_18
/home/ythuang/homopolish/contig18/debug
[2021/09/05 23:32] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 0 MINS 12 SECS.
[2021/09/05 23:33] INFO: Stage: Download closely-related genomes
INFO: 20 homologous sequence need to download:
Downloaded GCF_005154325.1_ASM515432v1_genomic.fna.gz
Downloaded GCF_006364795.1_ASM636479v1_genomic.fna.gz
...
TIME Homologous retrieval: 0 MINS 22 SECS.
[2021/09/05 23:35] INFO: Stage: Prediction
TIME Prediction: 0 MINS 0 SECS.
[2021/09/05 23:35] INFO: Stage: Polish
TIME Polish: 0 MINS 6 SECS.
TIME Total: 2 MINS 59 SECS.
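The comparison of retrieved genomes described above can be done mechanically by pulling assembly accessions out of two logs or directory listings and diffing the sets. A small sketch follows, assuming accessions follow the usual `GCF_`/`GCA_` pattern with a version suffix; the function names are hypothetical.

```python
import re

# Matches assembly accessions such as GCF_006364795.1 or GCA_000005845.2.
ACCESSION_RE = re.compile(r"GC[AF]_\d+\.\d+")


def accessions(text):
    """Extract the set of assembly accessions found in text."""
    return set(ACCESSION_RE.findall(text))


def compare_runs(log_a, log_b):
    """Return accessions unique to each log and those shared by both."""
    a, b = accessions(log_a), accessions(log_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
    }
```

An empty `shared` list, as with the accessions quoted in this thread, is a strong hint that the two runs screened against different sketches.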
I am not. Sorry for this omission. Can you please try with this mash sketch? Thanks!
The bug had already been fixed but reappeared due to a merge error. We have pushed a corrected version to GitHub. Please pull the latest one and it should work with your own sketch. Thanks for reporting this issue.
python3 homopolish.py polish -a contig_18.fasta -s refseq.genomes%2Bplasmid.k21s1000.msh -m R9.4.pkl -d -o contig18
[2021/09/06 10:00] INFO: RUN-ID: contig_18
contig_18
/home/ythuang/homopolish/contig18/debug
[2021/09/06 10:00] INFO: Stage: Select closely-related genomes
TIME Select closely-related genomes: 0 MINS 5 SECS.
...
[2021/09/06 10:03] INFO: Stage: Homologous retrieval
TIME Homologous retrieval: 0 MINS 39 SECS.
[2021/09/06 10:04] INFO: Stage: Prediction
TIME Prediction: 0 MINS 1 SECS.
[2021/09/06 10:04] INFO: Stage: Polish
TIME Polish: 0 MINS 8 SECS.
TIME Total: 3 MINS 17 SECS.
Thanks! Can I suggest that you tag a new minor release with the bug fix?
Done. Tagged as v0.3.2. If there are no further issues, I will close this one.
Awesome, thanks again!
Hi, I am using canu as my assembler and correcting the sequence with racon and medaka. But when I try to use homopolish for the last correction step, I get the error shown below.
I am not sure what causes this error; the input fasta looks normal. Could you kindly help me solve this problem? Thank you.