actledge closed this issue 5 months ago.
Hello, I think that if you use the gembase convention (as defined here: https://macsyfinder.readthedocs.io/en/latest/user_guide/gembase_convention.html), you can input an entire metagenome at once, and DefenseFinder (which relies on MacSyFinder) will handle the different contigs independently.
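To illustrate what "handled independently" means, here is a minimal Python sketch of how proteins in a gembase-style file could be grouped into replicons. It assumes, from my reading of the linked page and not verified against the MacSyFinder source, that each protein identifier has the form `<replicon>_<protein number>`, so the replicon name is everything before the last underscore. The helper names `replicon_of` and `group_by_replicon` are hypothetical.

```python
from collections import OrderedDict


def replicon_of(protein_id: str) -> str:
    """Return the replicon part of a gembase-style ID (assumed: text before the last '_')."""
    replicon, _, number = protein_id.rpartition("_")
    if not replicon or not number:
        raise ValueError(f"not a gembase-style identifier: {protein_id!r}")
    return replicon


def group_by_replicon(protein_ids):
    """Group protein IDs by replicon, preserving file order within each replicon."""
    groups = OrderedDict()
    for pid in protein_ids:
        groups.setdefault(replicon_of(pid), []).append(pid)
    return groups


if __name__ == "__main__":
    ids = [
        "MAG01.contig1_00001",
        "MAG01.contig1_00002",
        "MAG01.contig2_00001",
        "MAG02.contig1_00001",
    ]
    for replicon, members in group_by_replicon(ids).items():
        print(replicon, len(members))
```

Under this assumption each contig becomes its own replicon, so gene neighbourhoods are never evaluated across two contigs, even when many MAGs share one input file.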
This issue has been inactive for 60 days and is now marked as stale. It will be closed in 7 days without further activity.
Hi,
Since DefenseFinder takes gene order into account, annotating the proteins of a draft bacterial genome, or of MAGs from a metagenome, runs into the problem that the contigs are not arranged in their true genomic order. It seems unlikely that proteins at the boundaries of two adjacent contigs in the FASTA file would together form a complete defense system. However, when dealing with a large number of draft genomes or MAGs (which are more fragmented), I am unsure whether this will produce a certain percentage of erroneous annotations.
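The concern can be made concrete with a toy model. The Python sketch below is purely illustrative and does not reproduce DefenseFinder's or MacSyFinder's actual clustering rules: it assumes a "system" is found when all of a hypothetical set of required genes occur in one run where consecutive hits are separated by at most `max_gap` other genes, and it compares treating all contigs as one concatenated sequence against treating each contig as a separate replicon.

```python
from typing import Dict, List, Set, Tuple


def has_cluster(genes: List[str], required: Set[str], max_gap: int) -> bool:
    """True if all `required` genes occur in one run where consecutive hits are
    separated by at most `max_gap` non-hit genes (toy rule, not MacSyFinder's)."""
    run: Set[str] = set()
    gap = 0
    for gene in genes:
        if gene in required:
            run.add(gene)
            gap = 0
            if run == required:
                return True
        else:
            gap += 1
            if gap > max_gap:
                # Too far from the previous hit: start a new run.
                run = set()
                gap = 0
    return False


def compare(contigs: Dict[str, List[str]], required: Set[str], max_gap: int) -> Tuple[bool, bool]:
    """Return (found when contigs are concatenated, found when each contig is separate)."""
    concatenated = [g for genes in contigs.values() for g in genes]
    per_contig = any(has_cluster(genes, required, max_gap) for genes in contigs.values())
    return has_cluster(concatenated, required, max_gap), per_contig


if __name__ == "__main__":
    # Hypothetical MAG: geneA ends contig1 and geneB starts contig2.
    mag = {
        "contig1": ["x", "x", "geneA"],
        "contig2": ["geneB", "x", "x"],
    }
    print(compare(mag, {"geneA", "geneB"}, max_gap=2))  # (True, False)
```

In the concatenated case the two boundary genes look adjacent and form a spurious "system"; with one replicon per contig they do not. How often this happens in real data depends on how fragmented the assemblies are, which is the open question here.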
It would be most convenient for me to run each draft genome or MAG individually, as a single file. However, the term "replicon" seems to refer to "a sequence (e.g., each contig in a fasta file) whose protein order can be determined". If each individual FASTA sequence has to be treated as an independent replicon for protein annotation, that may introduce cumbersome pre- and post-processing steps when running on a large set of genomes. Of course, if this is indeed necessary, I will have no choice but to proceed accordingly.
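One way to keep that bookkeeping small, following the gembase suggestion in this thread, is to merge all genomes into a single protein FASTA whose identifiers encode both genome and contig. The sketch below is hypothetical: it assumes Prodigal-style input where each protein ID is `<contig>_<gene number>`, builds new IDs of the form `<genome>.<contig>_<gene number>` (replacing any `_` or `.` inside genome and contig names with `-` so the separators stay unambiguous), and returns a replicon-to-genome table for post-processing. The function names are illustrative, and the exact naming rules should be checked against the gembase documentation before relying on this.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence); the identifier is the first word of the header."""
    ident: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            ident, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)


def _clean(name: str) -> str:
    # Keep '_' and '.' free for use as separators in the new identifiers.
    return name.replace("_", "-").replace(".", "-")


def merge_genomes(genomes: Dict[str, TextIO], out: TextIO) -> Dict[str, str]:
    """Write all proteins to `out` with gembase-style IDs; return replicon -> genome."""
    replicon_to_genome: Dict[str, str] = {}
    for genome, handle in genomes.items():
        for ident, seq in read_fasta(handle):
            contig, _, number = ident.rpartition("_")
            if not contig:
                raise ValueError(f"expected '<contig>_<n>' identifier, got {ident!r}")
            replicon = f"{_clean(genome)}.{_clean(contig)}"
            replicon_to_genome[replicon] = genome
            out.write(f">{replicon}_{number}\n{seq}\n")
    return replicon_to_genome


if __name__ == "__main__":
    mag1 = io.StringIO(">NODE_1_1 # 2 # 400\nMKV\n>NODE_2_1 # 5 # 300\nMST\n")
    combined = io.StringIO()
    table = merge_genomes({"MAG_01": mag1}, combined)
    print(combined.getvalue())
    print(table)
```

In a real batch, the values of `genomes` would be open file handles, for example one per protein FASTA file with the file stem as the genome name. After the run, the returned table maps each replicon name in the results back to its original genome. Note that two genome names differing only in `_` versus `-` would collide after cleaning.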
In short, I would like to ask the authors what approach they recommend for handling large batches of draft genomes and MAGs.
Thanks!