ythuang0522 / homopolish

High-quality Nanopore-only genome polisher
GNU General Public License v3.0

Output file empty #18

Closed: DDavila10 closed this issue 3 years ago

DDavila10 commented 3 years ago

Hi all,

After running Homopolish, the output file is empty. This is the last message that I received:

[2020/11/17 23:42] INFO: RUN-ID: tig00046071_len=10956_reads=23_class=contig_suggestRepeat=no_suggestBubble=no_suggestCircular=no
[2020/11/17 23:42] INFO: Stage: Select closely-related genomes
[2020/11/17 23:42] INFO: RUN-ID: tig00046075_len=19388_reads=41_class=contig_suggestRepeat=no_suggestBubble=no_suggestCircular=no
[2020/11/17 23:42] INFO: Stage: Select closely-related genomes
[2020/11/17 23:42] INFO: RUN-ID: tig00046076_len=14681_reads=88_class=contig_suggestRepeat=no_suggestBubble=no_suggestCircular=no
[2020/11/17 23:42] INFO: Stage: Select closely-related genomes
[2020/11/17 23:42] INFO: RUN-ID: tig00046120_len=11932_reads=18_class=contig_suggestRepeat=no_suggestBubble=no_suggestCircular=no
[2020/11/17 23:42] INFO: Stage: Select closely-related genomes
TIME Total: 49 MINS 28 SECS.
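Each RUN-ID in this log carries the contig's name, length and read count, so the log itself shows which contigs reached the "Select closely-related genomes" stage and how short they are. The IDs appear to be Canu-style contig headers with the spaces replaced by underscores; that reading, and the helper names `parse_run_id` and `contigs_in_log`, are assumptions for illustration, not part of Homopolish. A minimal Python sketch:

```python
import re

# Matches each "RUN-ID: <id>" token; assumes the ID itself contains no spaces.
RUN_ID = re.compile(r"RUN-ID: (\S+)")


def parse_run_id(run_id):
    """Split an ID like 'tig00046071_len=10956_reads=23_...' into (name, fields).

    Assumes the contig name is everything before the first '_' and that every
    following '_'-separated token is a key=value pair.
    """
    name, _, rest = run_id.partition("_")
    fields = dict(tok.split("=", 1) for tok in rest.split("_") if "=" in tok)
    return name, fields


def contigs_in_log(log_text):
    """Return (name, fields) for every RUN-ID found in a Homopolish log."""
    return [parse_run_id(m) for m in RUN_ID.findall(log_text)]


log = ("[2020/11/17 23:42] INFO: RUN-ID: tig00046071_len=10956_reads=23_class=contig_"
       "suggestRepeat=no_suggestBubble=no_suggestCircular=no "
       "[2020/11/17 23:42] INFO: Stage: Select closely-related genomes")

for name, fields in contigs_in_log(log):
    print(name, int(fields["len"]), int(fields["reads"]))  # tig00046071 10956 23
```

Collecting the `len` values this way gives a quick picture of how fragmented the assembly is.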

Do you have any idea what is happening?

Thanks again for your help

ythuang0522 commented 3 years ago

Can you provide the full command line and one short contig from your genome? It looks to me like your MAGs are indeed highly fragmented (N50 of roughly 10-20 kb). The current version has difficulty identifying closely related genomes for such short contigs (our next release attempts to address this issue). However, these short contigs should still be written to the output folder, even without polishing. It would be helpful if you could provide more info. --Yao-Ting
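N50 is the contig length L such that contigs of length L or longer together cover at least half of the total assembly size. A minimal Python sketch of that definition (the function name `n50` is hypothetical, not a Homopolish utility):

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Sort lengths in descending order and walk down the list until the
    cumulative length reaches at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Only the four contigs visible in the log above, not the whole assembly.
print(n50([10956, 19388, 14681, 11932]))  # 14681
```

Run over every contig length in the assembly (for example, collected from the FASTA headers), this gives the figure Yao-Ting is estimating.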

DDavila10 commented 3 years ago

Hi Yao-Ting, thanks a lot for replying and the help!

This is the full command line that I am using:

python3 homopolish.py polish -a ~/mice-0476/all_data/racon_4x_medaka_merged_mice_0476_Nanopore_metagenome.fasta -s bacteria.msh -m homopolish/R10.3.pkl -t 14 -o racon_4x_medaka_homopolish

And here is one short contig of the genome:

contig_100 GTTTATAGGCAATAATCAATCACGTTATATATAGAATGTCAAATAATAGTCATTATAAATAGTCATTGAATTAAAAAGAATAACTATACAAAAAATAAATTTGATGTATAATAATACATAAGGATAAATTTTTATAATTATAGATATTATCCCTGAAAGTCGTGAAATAGTTACTCAAAACATATTCTGTGAGATAACTCATGGGGTATTATAGGCTGAAACCATTCATTTTACTGGATATATTGTCCTTTTAGTATTGCTTTTTATTGTCTGGATAATGGGGTTAAAACCCTCTCGCCCTAAGGTGAAACCTTAGGTCAACCCCATAGCTTCAAGTTTAGAAAAAAGAAAATAGAAATACTAAAGCTTGAGGTGAACAATATGGAAAATTACAGAAAATCGTCCTATTGCATATATGATATAAAAAATACCATTTGGTATGGATAACAAATACCGTAAAAAAAACCGGTAATAACAGGGCAAATAGCCGTTAGGACAAGAAACCATATCGGAATGATGGACATCATCGAACATATAGGCTCCACATCGGGTATTGATAATGCGATATACTCTTCCACGGATTTGGCTACCGCACAGAAGATTATATCCCTGGCCAGGTACCTCCTTGCCACTAACGGGCAGTCCCTCCCGGGCATCCTGACATGGCAGTATAACCATCCCATTCCCTATGAGGATGGGATCACTGAGCATGTGTACCATAACCTTTTTGTGCAGCTGGGAACCGATGAAAGCCTGCAGCAGAGTTTTTTCAAGCACCGCTGCGCATCCCTTTCTGCCAGTCTGAGAACCAGAATGAAGCCAGATATGGATTTAACAAGGCTCATGACGGTCTGAAAACAATAAAACTGCTAACACTGTATTTCATAGAGAGCAGACAGCCAGTTGCCTTTACAAAGCAGCCTGGAATTCTCCCGCAGATGACATCCATTACAAACGCAATAAACCAGCTTTCAGCACTGGGTGTGCACACAACGTAGATTATTACTGACAATGGTCACTATTCCGAGCAGAACTTCGCAGAACTGCTCCTTGCAGGGTTTGATTTCATCAGATTGCAAAAACCAGCGGTCAAATAAAAGGATCAGGCCAGAAATCGATAAACAACGTGAAGCGCTGGACAATTTCAAAAGCGTATGTCGTTTGATACATCTACCCATGGGGTTTCCGTTCCCTCATGAATGGATTTTCGAAGAATCACAAATATGCCAGCCATAAAAGCGGTGCGCAAAAGGGGGATGCCGAAACCTTTACATGCAGGATTTATCTCAATATATATTTTAATCATTCCGGGGCAGTCAGCAGATAAAGCCGCATTTGAGGCTGACCTGTTTGAACTTAAGACCCTGCTTGAATTTGGCACTCCTGTTGATGAATTATCAGATTTATCGCAGGCAAAGGTAAAGAAATATTTTTCCATCAAAAAATGGGGCGGCAAAACCATTGTGGTTCCCAATAACAAAGCCATCGCAGACGCGAACAAATACCACGGTTACTTTGTCCTTGTGTCCAATAAGGAAAAAGATCCTTTTGAATGCCTGCGCAAGTACCGTAAAAGGGAAACGATAGAATCCTTTTTTGAAGCCGGGAAACAGCATGCTGACGGTAACAGGGTAAGGGTATGGAATACCGACACCCTCCGCGGTCGTATGTTTGTCCAGTTTGTGTCCCTCTGCTATTATGAGTACCCGAATGAAGAAATCCGAAAACTCGAAAAAAGCCTGGGCAAGGAAAACGGCGATGCAGCACATGATACCAAAGCCGTGCTGAACAATGAATCCAAGTTAAAGTCATGGCTTTGCAATACCCCTCTGTATCTACAGCTGCAATGGTTTGATACAGTGGAAAGCGTTGGCATATCGGCAAAGCTCAAGAGCCGCAGATGGACTACGGAGATCACATCGCGCGATGAGCTTTATTTACAAAAACTTGGGGTGACTATCAGTTGATGTTTTGAGTATATATTTTACG
ACTTTTAGGATATTATATGATCTCTCAGCAATTTGAGTAGCCATGCCAGCAAAATTTTTGTAATGCACTGCTACGAAAATAAAGCTCGTAGCAAAGAATTGAACCTTGAAACAAATATTTTCAACAATTATGAAGGCATCACTAAAAAGCAGTCACAATATTGGTTGCCAACAGTGAAATATGGATGATATGAAGATGGACTAACCTTGGACTTCTACATGATAGCCATGTTTCTCTAATTTACGTATGTATCGGGAAAGCTATTTTTGTTCACAGCTTTTGAGTTTGGCTTCAAAACATGATTCATTGAAAAGAGTGCCTTGCTTTAACATGGTGTAAATGATTACAAGAAGTTTTCTTGCAAGGGCAATAATAGCCTTTTAGCCTCCTTTTTCTGTTTGAACTTCCAATACTAGGCGGACAGAAAGGTATTGCGTTTTCCCGCGATTACCCAGGTAATTTCGCAAAGAATACTTCTTATGTAAGGATTGCCTTTTGTGATAGATGTGCTTTTCCGTTTCCCGGCACTTTCATTATGTGCCAAGGCACAGGCCTACCCATGAGCAGATATCCTCCACAGTTTTGAACGGCTTTATATCAATGCCGATCTCAGCAATGATTACACAGGAAGCCGTTGTTCTGATTCCATAAATGTTACTTAGCTGAGATAAGAACGCTACATGCAGGAAGGAGTTCGTGGAGCAGGCTCGAATTTGGAATATTTATCAGTATAGGATAAATCAAGATTCCAAAAGCGCTCAATGATAACCTAAGTGGAATAAGCAAGAACATCTGGATTGGGATAATATTTATGAAGATTAGCAACAACAAACTTTTGGTAGTCAGAATCACTGCCGCAGTTAACAGATAACAAATAAAATCACCTATAATCGAAAAAATATATCTTACAAATAAGCTCGTATAGCACCACATTTTTCGATAGGTTTGCCAAGTGGTTTCGGCAAAAAAGTGGGGGATTTTAATAAAAAAGGAATCAGAAAGCCTTATATATACTAGTATACCGTTTTTTAATTTCTCATATACCAAGCACTCACTATTTGCATCATTGCTGCGTTGTCGTCAGTCAACGATGCCACCGCATCGTTGACTGACGACAACGCAGCAATGATGCAAATATCAAGCATTTTGTAAACGGTGTACTAGCATAAAGATTCCAAGAGATTATTTATTAAAAATGGAGGTTTAAGGATGTATCGTGAAATTTCAAAATTGATTATGTATCAAGACGTAAAACAAGAAAGTATACTTTATAAGATGGGAGAAGTATTTCATTGCTTTCAAGAAGAAAAAAGTTCAAAACAGGAACTAATTAAAAAAAATTTATATACAATTGAGAAGATTATTAGATATAGCAACTGATTATGGTTTCAATTATAATTTATGGCATAATTATTTAACGTTTTATCTTGTTACGAATGAAAATCCATATAGTCTAACATGTGAAAAAGTTGGTGATGAAGGTAGCGCAAATTATTTCGCAAAAATGATTTCCGCGTTTTTAAAAATTTATTTGACTTTGATTTTTTACCGATTGAAAAAGAGTTAGGTATTGATTGTTTTACACAAATTGCTCATTATAAGTCAATTGAAAAAAAAGAATTTATGTATAATAAAAGAGTGTGAGTGATAAAATAATATCATTAAGTAAACGGTTGGAACAGGCACTTGATGAGAATGATTTTTTCAATGAAGTAACGGCATTTTATAAAAATTATGGTGTAGGAATGTTTGGATTAAACAAAGCATTTCGAATCAAAGAACAAACGGATATGGATATGGATTTTATACCTATTAATAATATGGATAAAGTTATGCTTGATGATTTAATAGGATATGAATTGCAAAAACAAAAATTAACTGATAATACAAAAGCTTTTGTTGAAGGACGGAAAGCAAATAATGTACTTCTTTTCGGAGATAGTGGAACAGGTAAATCAACCAGTATCAAAGCAATTGTGAATGCTTTCTATCCGCAAGGA
TTAAGAATGATTGAAATTTACAAACACCAGTTTAAATATTTATCAAATATAATTGCACAAATAAAGAAACGAAATTATAAGTTTATCATTTATATGGATGATTTATCTTTTGAAGAATATGAAATTGAATATAAATTTTTAAAGGCGCTAATTGAAGGAGGAGTAGAAACAAAACCAGATAATATATTGATCTATGCAACATCAAACAGACGACATATTATTAAAGAAACTTGGAATGATAGAAATGATATAGAAACTGAGAAGGGAATGCATCGTTCTGATACAATGGAAGAAAAACTTTCACTTGTTAATCGTTTTGGCGTAACAATCAACTATTCAAAACCATCTCAAAAAGAATATTTTCAAATTGTTATTGCACTTGCACGAAAACAAGGAATTACATTGTCAGACGAAGAATTGTGTAAAGAGGCTAATAAATGGGAATTAAGTCATGGAGGAATATCTGGACGTACAGCTCAGCAGTTTATTAATTATTTAGATGGAAAAGTGGAAGAGAAATAATCTATTAAATGAATTAAGGAGGTTTATTTGAATTCTAAAGAGATATAATGTTTTAGATTGTATACGTGGTTTTGCTTTATTAAATATGATTGCATATCACACTATTTGGGATTTAGTTTATTTGTTTGGGATAGACTGGAAATGGTATCATTCGCAAGGTGCTTACATATGGCAACAAGGAATTTGTTGGATATTTATCTTTTTATCTGGATTTTGCTGGTCTTTAAGTAAGCATCCTTTAAAACGTGGAATTATTGTGTTTTTGTGGAGGTGCTTTAATTTCTTGTGTTACAATACTTTTAATGCCACAAAACAGGGTTTTATTTGGTGTTCTAACATTGATTGGATCCTGCATGATATTAGTAACAATTTTGAATAAAATCTTGCAAAAAATTTATCCATTGACTGGAATGTCTATTTGCTGGATATTATTTATAATAACCAGAAGTATTAATGATGGTTATCTTGGCTTCGAGAAAATTCATCTATTAAAATTACCTAAATATTTATATCAAAATCTTATTTCTTCTTACTTGGGTTTTCCGAAACAGGGATTTTATTCAACAGATTATTTTTCAATTATTCCATGGTTTTTTTTATTTTTAAGTGGATATTATTTGTTTCACTATATGGAAGAAAAGAATAACCTTAATAAATCTGGCCCAAAAAGATTTTTCTGCATTATCGTGGTTTGGCAAACATTCTTTCATTTTATATATGATTCATCAGCCAATAATTTATATTATTTTGGAATTGATATGGAAGATATACAAATATCAGTTTAGATAAAAGACTCAACTAGCAGTAGAGTTTTTTATCTAAATTCTAAAAAACTGTTATACAAAAAATATAAAATTATATTAAAATAAAGTTATAATATTATTTGTTGGTTAAATAAAAGTAGGGAGGATGAATATGTTAGATAATAAAAAAATTGCGGATTTTTTTAATGAAGAAACGAAAAGAATTAGGTTATACACAGGCAGAAATTGCTCAAAAATTAAATGTATCTTTTCAGGCAGTTTCAAAATGGGAAAACGGGACACTTCCCAATATTGAAATTTTAGTAGATTTAGCAAAATTACTTAAAGTTTCTGTAAATGAAATTCTGGTGGGCAGGGAATTAAATGAAGAAAATTTTTCTTATCGAAAAGCAGGTGTAGATAATCTTATACAGATATTATAAAAAAAGAAATGTCCATTTATTTAAAATCTGATAATCAAAGAGTATTAA

ythuang0522 commented 3 years ago

This one looks fine in my test:

python3 homopolish.py polish -m R10.3.pkl -s /var/www/www/mash_sketches/bacteria.msh.gz -a contig100.fa -o racon_4x_medaka_homopolish

The output folder did contain the original contig, although no related genomes were found, so it was not polished:

more racon_4x_medaka_homopolish/contig100_homopolished.fasta
>contig_100
GTTTATAGGCAATAATCAATCACGTTATATATAGAATGTCAAATAATAGTCATTATAAATAGTCATTGAATTAAAAAGAATAACTATACAAAAAATAAATTTGATGTATAATAATACATA...

Could you also try polishing this contig on your side and see if it works?

DDavila10 commented 3 years ago

Hi Yao-Ting, thanks a lot for the prompt response. I tried that test with the same contig and it works. The output directory only contains the polished contig, with no further information. How do I know whether a related genome was found or not? Does this mean that the output contig is the same as the original because there is no related genome to use for polishing?
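Independently of Homopolish's own reporting, one way to check from the files alone is to compare each input contig with the same-named record in the output FASTA: an identical sequence means nothing was changed, and a missing record points back to the empty-output problem. A minimal Python sketch, assuming plain (possibly multi-line) FASTA files and that record names are preserved, as with `>contig_100` in the test above; `read_fasta` and `compare_assemblies` are hypothetical helpers:

```python
def read_fasta(path):
    """Return {record_name: sequence} for a plain, possibly multi-line FASTA file.

    The record name is the first whitespace-separated word of the header.
    """
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks).upper()
                name, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def compare_assemblies(before_path, after_path):
    """Label each input contig as 'unchanged', 'changed' or 'missing' in the output."""
    before, after = read_fasta(before_path), read_fasta(after_path)
    report = {}
    for name, seq in before.items():
        if name not in after:
            report[name] = "missing"
        elif after[name] == seq:
            report[name] = "unchanged"
        else:
            report[name] = "changed"
    return report
```

For example, `compare_assemblies("contig100.fa", "racon_4x_medaka_homopolish/contig100_homopolished.fasta")` should report `contig_100` as unchanged when no related genome was found. The comparison is case-insensitive because some tools write soft-masked (lowercase) bases.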

J-I-P commented 3 years ago

You can use the -d argument to keep the information for every contig after the Mash step, such as the homologous sequences and their identity information.

ythuang0522 commented 3 years ago

Yes. The related genomes can be displayed via -d. But your MAGs are too short (N50 = 10 kb) to find any related genome in the current version. You will have to wait for the next release.

DDavila10 commented 3 years ago

Hi Yao-Ting and Nicole, thanks a lot for your valuable help and time. I am looking forward to the next version of Homopolish! :)