wxli0 / MT-MAG


Requirements

(1) Python >= 3.7.9. In addition to the standard packages in anaconda3, the following packages are required:

(2) Matlab

(3) grep >= 3.1
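The requirements above can be checked before installation with a short script. This is a minimal sketch, not part of MT-MAG: the function names are hypothetical, and it assumes that Matlab is exposed on PATH as an executable named matlab and that grep prints its version number on the first line of "grep --version".

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 7, 9)  # from requirement (1)
MIN_GREP = (3, 1)       # from requirement (3)


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def check_requirements():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append("Python >= 3.7.9 is required")
    # Assumption: the Matlab launcher is called "matlab" and is on PATH.
    if shutil.which("matlab") is None:
        problems.append("matlab was not found on PATH")
    if shutil.which("grep") is None:
        problems.append("grep was not found on PATH")
    else:
        result = subprocess.run(["grep", "--version"], capture_output=True, text=True)
        first_line = result.stdout.splitlines()[0] if result.stdout else ""
        version = parse_version(first_line)
        if version is None or version < MIN_GREP:
            problems.append("grep >= 3.1 is required")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

The script only reports problems. It does not check the additional Python packages, since that depends on the package list for requirement (1).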

Installation

git clone https://github.com/wxli0/MLDSP.git

git clone https://github.com/wxli0/MT-MAG.git

Modify the paths in MT-MAG/config.py if MT-MAG and/or MLDSP are not cloned in the root directory.

Tasks

The Tasks that we present in the paper are:

Data preparation for Task 1 (sparse) and Task 2 (dense)

If you want to prepare the data explicitly, without using the pipeline in the following section, use the following commands:

cd MLDSP/data/preprocess

Alternatively, you can download the datasets directly from MT-MAG-data.

Note that the dataset for Task 2 (dense) is too large to be stored in one zip file. After unzipping order_family_genus_rumen.zip and root_domain_phylum_class.zip, you need to put their contents into one folder, so that the layout matches the unzipped folder for Task 1 (sparse).
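The merge step for the two Task 2 (dense) archives can be scripted. The following is a minimal sketch under stated assumptions: merge_folders is a hypothetical helper, the unzipped folder names are assumed to match the zip file names, and the target folder name is a placeholder to replace with the location your setup expects.

```python
import shutil
from pathlib import Path


def merge_folders(sources, target):
    """Move every top-level entry of each source folder into target."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    for source in sources:
        for entry in sorted(Path(source).iterdir()):
            destination = target / entry.name
            # Stop rather than overwrite if both archives contain the same entry.
            if destination.exists():
                raise FileExistsError(f"{destination} already exists")
            shutil.move(str(entry), str(destination))
    return target


if __name__ == "__main__":
    # Assumed folder names; "task2_dense" is a hypothetical target folder.
    unzipped = [Path("order_family_genus_rumen"), Path("root_domain_phylum_class")]
    if all(folder.is_dir() for folder in unzipped):
        merge_folders(unzipped, "task2_dense")
```

Moving the entries, rather than copying them, avoids holding a second copy of a large dataset on disk.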

MT-MAG commands to run existing tasks

cd MT-MAG

screen -S new

In each JSON file in task_metadata/, five mandatory attributes and two optional attributes are specified:

To run a small example

The test dataset is available as d__Archaea.zip. Download and unzip this file, then put its contents into base_path/test_dir/d__Archaea.
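Placing the test dataset can be sketched with the standard zipfile module. This is an illustration under assumptions, not a repository script: extract_test_dataset is a hypothetical helper, base_path must be supplied by you, and the internal layout of d__Archaea.zip is not assumed, so the sketch handles both an archive that already wraps its files in a d__Archaea/ folder and one that does not.

```python
import zipfile
from pathlib import Path


def extract_test_dataset(zip_path, base_path, name="d__Archaea"):
    """Extract zip_path so that its files end up in base_path/test_dir/<name>."""
    test_dir = Path(base_path) / "test_dir"
    target = test_dir / name
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.namelist()
        # If every member already sits under "<name>/", extract into test_dir;
        # otherwise extract directly into the <name> folder.
        nested = bool(members) and all(m.split("/")[0] == name for m in members)
        destination = test_dir if nested else target
        destination.mkdir(parents=True, exist_ok=True)
        archive.extractall(destination)
    return target
```

A call would look like extract_test_dataset("d__Archaea.zip", base_path), where base_path is the base path used by your MT-MAG setup. Archives that carry extra top-level entries are treated as unwrapped, so check the result before running the example.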

To run Task 1: simulated/sparse dataset

To run Task 2: real/dense dataset

After the "python exec_entire_process.py" command, "bash phase.sh -s …" runs in another screen session. For example, for Task 1 (sparse), the first classification is from the root taxon (root_taxon) to the Phylum level. When it finishes, it triggers the Phylum-to-Class classifications, followed by the Class-to-Order, Order-to-Family, Family-to-Genus, and Genus-to-Species classifications. The program terminates when missing_ranks is empty. In the meantime, you should monitor whether any screen session runs into memory issues. The basic commands to check screen sessions are:

(1) To find the screen session ID: screen -ls

(2) Attach to the screen: screen -d -r [screen ID]
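When many classification sessions are running, the output of "screen -ls" can be summarized programmatically. The sketch below is not part of MT-MAG. The function names are hypothetical, and it assumes the common listing format in which each session appears as "ID.name", followed by a state of "(Attached)" or "(Detached)". Other state labels are ignored.

```python
import re
import subprocess

# Matches lines such as "\t12345.new\t(Detached)", with or without a date column.
SESSION_LINE = re.compile(r"^\s*(\d+)\.(\S+)\s.*\((Attached|Detached)\)\s*$")


def parse_screen_ls(output):
    """Return one dict per session found in the text printed by "screen -ls"."""
    sessions = []
    for line in output.splitlines():
        match = SESSION_LINE.match(line)
        if match:
            sessions.append(
                {"id": match.group(1), "name": match.group(2), "state": match.group(3)}
            )
    return sessions


def list_screen_sessions():
    """Run "screen -ls" and parse its output; the exit status is not checked."""
    result = subprocess.run(["screen", "-ls"], capture_output=True, text=True)
    return parse_screen_ls(result.stdout)
```

Each returned id can be passed to "screen -d -r [screen ID]" to attach to that session and inspect its output.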

Citation

Please cite our work if you find it useful.

@article{li2023mt,
  title={MT-MAG: Accurate and interpretable machine learning for complete or partial taxonomic assignments of metagenome-assembled genomes},
  author={Li, Wanxin and Kari, Lila and Yu, Yaoliang and Hug, Laura A},
  journal={PLoS ONE},
  volume={18},
  number={8},
  pages={e0283536},
  year={2023},
  publisher={Public Library of Science San Francisco, CA USA}
}