This script runs coalHMM for a given set of arguments. It first filters the input maf file and writes a temporary filtered maf file that contains only the species specified in the script call. Using the filtered maf file, autocoalhmm.py then divides the alignment into windows of roughly 1 Mb and runs coalHMM on each of them. Finally, it collects all the results and saves them in a user-friendly HDF5 table with the coordinates of the maf file.
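As an illustration of the windowing step, here is a minimal sketch of how a coordinate range could be cut into windows of about 1 Mb. The function name split_into_windows and the fixed-size cutting strategy are assumptions made for this example; autocoalhmm.py may cut at alignment block boundaries instead, which is why its windows are only roughly 1 Mb long.

```python
def split_into_windows(start, end, window_size=1_000_000):
    """Split the half-open interval [start, end) into consecutive windows.

    Hypothetical helper for illustration only; not taken from autocoalhmm.py.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    windows = []
    pos = start
    while pos < end:
        # The last window is truncated so that it never runs past `end`.
        windows.append((pos, min(pos + window_size, end)))
        pos += window_size
    return windows


if __name__ == "__main__":
    for window in split_into_windows(0, 2_500_000):
        print(window)
```

Each tuple is the (start, end) of one window, so the results of each coalHMM run can be mapped back to the coordinates of the maf file.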
Note that the workflow only runs with a Slurm backend. To run it on your system, the gwf workflows might need to be modified manually.
IMPORTANT: when cloning the GitHub repo, the permissions for all files need to be changed. You can do so by running:
chmod -R 777 ./
The way autocoalhmm.py is invoked:
python autocoalhmm.py sp1 sp2 sp3 sp4 target_seqname maf_path
If the 1-species unclock model needs to be run, then use:
python autocoalhmm.py sp1 sp2 sp3 sp4 target_seqname maf_path error_sp1
If the 2-species unclock model needs to be run, then use:
python autocoalhmm.py sp1 sp2 sp3 sp4 target_seqname maf_path error_sp1 error_sp2
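The three invocation forms differ only in the number of trailing error_sp arguments. The sketch below shows one way to interpret such an argument list. The helper parse_autocoalhmm_args, the dictionary keys and the model labels are hypothetical names chosen for this example and are not taken from autocoalhmm.py.

```python
import sys


def parse_autocoalhmm_args(args):
    """Map the positional arguments to named fields (illustrative only)."""
    if not 6 <= len(args) <= 8:
        raise ValueError("expected 6, 7 or 8 positional arguments")
    sp1, sp2, sp3, sp4, target_seqname, maf_path = args[:6]
    # Zero, one or two trailing arguments select the model to run.
    error_species = list(args[6:])
    model_labels = {
        0: "standard",           # label assumed for the default call
        1: "1-species unclock",
        2: "2-species unclock",
    }
    return {
        "species": [sp1, sp2, sp3],
        "outgroup": sp4,
        "target_seqname": target_seqname,
        "maf_path": maf_path,
        "error_species": error_species,
        "model": model_labels[len(error_species)],
    }


if __name__ == "__main__":
    print(parse_autocoalhmm_args(sys.argv[1:]))
```

Passing seven arguments, for example, yields the 1-species unclock configuration with the seventh argument as the only entry in error_species.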
The arguments are:
sp1, sp2 and sp3 are the species of the analyzed branch.
sp4 is the outgroup species.
target_seqname is the reference sequence, in the form species.chr.
maf_path is the path to the unfiltered maf file.
The workflow steps are executed as follows:
The workflow behaves differently depending on whether or not target_seqname belongs to one of the four analyzed species (the trio plus the outgroup):