ma-compbio / Phylo-HMRF

MIT License
3d-genome comparative-genomics gaussian-process machine-learning

Phylo-HMRF

Phylogenetic Hidden Markov Random Field model

Use the following command to run Phylo-HMRF for evolutionary state estimation:

python phylo_hmrf.py [Options]

The options:

Example:

python phylo_hmrf.py -n 20 -r 1 --reload 0 --chromvec 21,22 --miter 100

This uses Phylo-HMRF to estimate 20 states on the syntenic regions of chromosomes 21 and 22 jointly.
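If you prefer to launch runs from a Python script, the example command can be assembled and run with the standard subprocess module. This is a minimal sketch: it uses only the flags that appear in the example above and passes their values through unchanged. The helpers build_phylo_hmrf_command and run_phylo_hmrf are hypothetical and are not part of Phylo-HMRF.

```python
import subprocess
import sys


def build_phylo_hmrf_command(n_states, chromosomes, max_iter=100,
                             r=1, reload=0, script="phylo_hmrf.py"):
    """Return the argument list for one Phylo-HMRF run (hypothetical helper).

    Only flags shown in the README example are used; each value is
    converted to a string and passed through unchanged.
    """
    return [
        sys.executable, script,
        "-n", str(n_states),
        "-r", str(r),
        "--reload", str(reload),
        # --chromvec takes a comma-separated list of chromosome IDs
        "--chromvec", ",".join(str(c) for c in chromosomes),
        "--miter", str(max_iter),
    ]


def run_phylo_hmrf(cmd, cwd="."):
    """Run the command in cwd, raising CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, cwd=cwd)


if __name__ == "__main__":
    cmd = build_phylo_hmrf_command(20, [21, 22], max_iter=100)
    print(" ".join(cmd))
    # To actually launch the run from the input directory:
    # run_phylo_hmrf(cmd, cwd=".")
```

Building the argument list separately from running it makes it easy to print or log the exact command before launching a long job.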

The input files include edge.1.txt, branch_length.1.txt, species_name.1.txt, chromosomeID.synteny.txt, path_list.txt, a chromosome size file for the reference genome, and the aligned Hi-C contact files of the studied species. Please follow the descriptions of the input files to prepare the input files for your own study. Keep the input files in the directory specified by the argument '-p' (or '--root_path'); by default, this is the current working directory. In the current version of Phylo-HMRF, please use the same file names as shown in the descriptions. Example input files are provided in the folder example_input; please see them for the input format.
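Before launching a run, it can help to check that the fixed-name input files are present in the root path. The sketch below is a hypothetical helper (check_input_files is not part of Phylo-HMRF). It only checks the five file names listed above, because the names of the chromosome size file and the Hi-C contact files depend on your study, and it does not validate file contents.

```python
import os

# File names the README says must be used as-is in the current version.
REQUIRED_INPUTS = (
    "edge.1.txt",
    "branch_length.1.txt",
    "species_name.1.txt",
    "chromosomeID.synteny.txt",
    "path_list.txt",
)


def check_input_files(root_path="."):
    """Return the required input files missing from root_path (hypothetical helper).

    The chromosome size file and the aligned Hi-C contact files are not
    checked, since their names are study-specific.
    """
    return [name for name in REQUIRED_INPUTS
            if not os.path.isfile(os.path.join(root_path, name))]


if __name__ == "__main__":
    missing = check_input_files(".")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All fixed-name input files found.")
```

Pass the same directory you give to '-p' (or '--root_path') so the check matches what Phylo-HMRF will read.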

Please see outputfile_description.txt for a description of the output file. We also provide MATLAB code that can be used to extract the state estimation results from the output file and to visualize the estimated states on the Hi-C contact map of each synteny block as a color image. Please see the code in the folder processing.

Please comment out or modify lines 385-390 in utility.py according to the species studied. In our study, we split the large synteny regions on chr3 and chr6 of hg38 by chromosome arm. This split applies only to hg38; if you use a different reference genome, lines 385-390 need to be changed accordingly.
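The arm-based split described above amounts to cutting any synteny region that spans the centromere into a p-arm piece and a q-arm piece. The sketch below illustrates only that idea. It is not the code at lines 385-390 of utility.py, the function name split_region_by_arm is hypothetical, and the centromere coordinate must come from an annotation of your own reference genome (no real hg38 coordinates are given here).

```python
def split_region_by_arm(start, end, centromere):
    """Split a half-open region [start, end) at a centromere position.

    Hypothetical illustration: returns one interval if the region lies
    entirely on one arm, or two intervals (p-arm piece, q-arm piece) if
    the centromere falls strictly inside the region.
    """
    if not start < end:
        raise ValueError("start must be less than end")
    if start < centromere < end:
        return [(start, centromere), (centromere, end)]
    return [(start, end)]


# Toy coordinates only, not real genome positions.
print(split_region_by_arm(0, 100, 40))  # region spans the centromere
print(split_region_by_arm(50, 100, 40))  # region lies on one arm
```

With the toy coordinates, the first call returns two intervals, (0, 40) and (40, 100), while the second returns the region unchanged.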

The first version of Phylo-HMRF is in the folder phylo_hmrf_v1 and can be applied to state estimation on a single chromosome.


Required pre-installed packages

Phylo-HMRF requires the following packages to be installed:

For convenience, you could install Anaconda (available from https://www.continuum.io/downloads), which provides an open-source collection of widely used data science packages, including Python and NumPy.


If you would like to cite Phylo-HMRF, please consider the following citation:

Yang Yang, Yang Zhang, Bing Ren, Jesse R. Dixon, Jian Ma. (2019). "Comparing 3D genome organization in multiple species using Phylo-HMRF". Cell Systems 8(6):494-505.e14.