phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

predicted_host_range_overall_name order #72

Closed. haruosuz closed this issue 2 years ago

haruosuz commented 4 years ago

Dear developers,

For plasmids with a predicted_host_range_overall_rank of "multi-phylla", the predicted_host_range_overall_name values list the same phyla in different orders:

Actinobacteria,Bacteroidetes,Proteobacteria,Firmicutes 
Firmicutes,Bacteroidetes,Proteobacteria,Actinobacteria 
Firmicutes,Proteobacteria,Actinobacteria,Bacteroidetes 
Proteobacteria,Firmicutes,Bacteroidetes,Actinobacteria 

Does the order of the names have any meaning? If not, could they be sorted alphabetically, for example:

Actinobacteria,Bacteroidetes,Firmicutes,Proteobacteria

kbessonov1984 commented 4 years ago

Thank you for suggesting this feature. I like the idea of ordering multiple ranks alphabetically for better readability.

jrober84 commented 4 years ago

This is also a bug, in that it should collapse any duplicate names. We can fix that in a later release.
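
For anyone post-processing output from a release that still emits unsorted or duplicated names, the sorting and deduplication discussed above can be sketched in a few lines of Python. This is only an illustrative workaround, not the MOB-suite implementation: the function name `normalize_host_range_names` is hypothetical, and the third example string (with a repeated phylum) is made up to show the deduplication case.

```python
def normalize_host_range_names(value: str, sep: str = ",") -> str:
    """Return the names in `value` deduplicated and sorted alphabetically.

    Hypothetical helper for post-processing a predicted_host_range_overall_name
    field; it is not part of MOB-suite.
    """
    # A set drops duplicates; empty fragments and stray whitespace are ignored.
    names = {name.strip() for name in value.split(sep) if name.strip()}
    return sep.join(sorted(names))


examples = [
    "Actinobacteria,Bacteroidetes,Proteobacteria,Firmicutes",
    "Firmicutes,Bacteroidetes,Proteobacteria,Actinobacteria",
    # Made-up value with a duplicate, to illustrate deduplication.
    "Proteobacteria,Firmicutes,Firmicutes,Actinobacteria,Bacteroidetes",
]

# All three variants collapse to the same canonical string.
normalized = {normalize_host_range_names(e) for e in examples}
print(normalized)
```

Normalizing the field this way makes values that differ only in name order compare equal, which helps when grouping or counting plasmids by predicted host range.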

haruosuz commented 4 years ago

Dear developers,

I would like to use MOB-typer for a partial sequence (e.g. https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP013164.1 ).

I wonder if MOB-typer can report predicted_host_range based on "replicon" and/or "relaxase" as shown in Fig. 6 (https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000435#F6)?

kbessonov1984 commented 4 years ago

@haruosuz You might want to try MOB-typer versions 2.0.5 and 2.1.0, which report the number of hits (i.e. plasmids) at each taxonomy level in the *_refseqhostrange_phylostats.txt file. We removed this extended taxonomy functionality in version 3 because of some technical issues with the ete3 library and in keeping with a minimalistic philosophy. If there is enough interest, we might add back a more detailed taxonomy output, similar to v2.1.0, in upcoming releases.

To generate results similar to Figure 6 of the manuscript, additional coding will be required to produce the stack plot. In these MOB-suite versions it is also possible to query the host range databases directly via the mob_hostrange module by replicon name, relaxase name, relaxase accession, or plasmid cluster ID: --replicon_name, --relaxase_name, --relaxase_accession, --cluster_id.

Let us know if it works. You might run into issues installing the amos package; in that case, install it directly from source. The package can be omitted if you do not use the --run_circlator parameter in mob_recon.

haruosuz commented 4 years ago

@kbessonov1984 It would be great if you could add back a more detailed taxonomy output for predicted_host_range in upcoming releases.

I would like to check the predicted_host_range results for partial sequences and shotgun metagenome data, based on replicon or relaxase biomarkers or the MOB-suite cluster.

jrober84 commented 2 years ago

v. 3.1.0 addresses the issue with multiple phyla not being sorted or deduplicated.