jkimlab / PAPipe


PAPipe: a comprehensive pipeline for population genetic analysis

Main workflow

  1. Read trimming (by Trim Galore)
  2. Read mapping (by BWA or Bowtie 2)
  3. Genetic variant calling (by GATK3, GATK4 or BCFtools)
  4. Data filtering and format conversion (by PLINK v 1.9)
  5. Population genetic analyses
    • Principal component analysis (by GCTA or PLINK v 2.0)
    • Phylogenetic tree analysis (by SNPhylo)
    • Population tree analysis (by TreeMix)
    • Population structure analysis (by ADMIXTURE)
    • Linkage disequilibrium decay analysis (by PopLDdecay)
    • Selective sweep analysis (by SweepFinder2)
    • Population admixture analysis (by AdmixTools)
    • Pairwise sequentially Markovian coalescent analysis (by psmc)
    • Multiple sequentially Markovian coalescent analysis (by msmc2)
    • Fixation index analysis (by VCFtools)

Install Docker Engine (requires root permission)

Skip this step if Docker Engine is already installed on your machine (see the Docker installation document).

curl -fsSL https://get.docker.com/ | sudo sh

Add your user account to the docker group (requires root permission)

Skip this step if your account is already in the docker group.

sudo usermod -aG docker $USER   
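To decide whether the step above can be skipped, group membership can be checked first. The following is a minimal sketch; `in_group` is a hypothetical helper written for this example, not part of Docker or PAPipe. Note that after running `usermod`, a new login session is usually needed before the change applies to shells that are already open.

```shell
# in_group USER GROUP: hypothetical helper, not part of PAPipe or Docker.
# Succeeds if USER is listed as a member of GROUP by `id`.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF "$2"
}

# Check the current user against the docker group.
if in_group "$(id -un)" docker; then
    echo "already in the docker group; the usermod step can be skipped"
else
    echo "not in the docker group yet"
fi
```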

Install the PAPipe Docker image

wget http://bioinfo.konkuk.ac.kr/PAPipe/PAPipe.tar.gz    # Download the Docker image file
docker load -i ./PAPipe.tar.gz    # Load the Docker image file
docker image ls    # Check that the image loaded correctly ("REPOSITORY: pap_docker, TAG: latest" must be shown)
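The image check can also be scripted rather than read by eye. This sketch assumes the image name and tag are exactly `pap_docker:latest`, as stated above; `has_papipe_image` is a hypothetical helper introduced only for illustration.

```shell
# has_papipe_image: hypothetical helper, not part of PAPipe.
# Succeeds if an image named pap_docker with tag latest is loaded.
has_papipe_image() {
    docker image ls --format '{{.Repository}}:{{.Tag}}' | grep -qxF 'pap_docker:latest'
}

if ! command -v docker >/dev/null 2>&1; then
    echo "docker is not installed or not on PATH"
elif has_papipe_image; then
    echo "pap_docker:latest is loaded"
else
    echo "pap_docker:latest not found; try running docker load again"
fi
```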

Run PAPipe

Setting local input directories (Caution: do not change the names and the directory structure)

mkdir RUN_DOCKER/
cd RUN_DOCKER/

mkdir data/
cd data/

mkdir ref/
mkdir input/
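The same layout can be created with a single command and then listed to confirm it. This is a minimal sketch equivalent to the steps above, assuming it is run from the directory that should contain "RUN_DOCKER"; unlike the steps above, it does not change the working directory.

```shell
# Create the expected layout in one step; -p creates parent directories
# as needed and makes the command safe to re-run.
mkdir -p RUN_DOCKER/data/ref RUN_DOCKER/data/input

# List the resulting directories to confirm the structure.
find RUN_DOCKER -type d | sort
```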

Preparing parameter files

PAPipe requires the following three parameter files:

  • main_param.txt
  • main_input.txt
  • main_sample.txt

These three files must be placed in the "RUN_DOCKER" directory created above.

You can easily generate the parameter files using our parameter file generator.

Check out more details about the parameter file generator here.
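Before starting the container, it can help to confirm that the parameter files are in place. The sketch below assumes the file names used in the run command later in this document (main_param.txt, main_input.txt, main_sample.txt); `check_params` is a hypothetical helper, not a PAPipe command, and it only checks that the files exist and are non-empty, not that their contents are valid.

```shell
# check_params DIR: hypothetical helper, not part of PAPipe.
# Succeeds only if all three parameter files exist in DIR and are non-empty.
check_params() {
    dir="$1"
    missing=0
    for f in main_param.txt main_input.txt main_sample.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the RUN_DOCKER directory in the current location.
if check_params ./RUN_DOCKER 2>/dev/null; then
    echo "all three parameter files found"
else
    echo "parameter files are incomplete"
fi
```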

Creating a Docker container that mounts the above "RUN_DOCKER" directory

docker run -v [absolute path of the "RUN_DOCKER" directory]:/RUN_DOCKER/  -it pap_docker:latest
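The bracketed placeholder must be replaced with an absolute path. As a minimal sketch, the path can be resolved from a relative one and the full command printed for review before running it; `papipe_docker_cmd` is a hypothetical helper, and the printed command is not quoted, so this assumes the path contains no spaces.

```shell
# papipe_docker_cmd DIR: hypothetical helper, not part of PAPipe.
# Resolves DIR to an absolute path and prints the docker run command
# from this document with that path filled in. Fails if DIR does not exist.
papipe_docker_cmd() {
    run_dir="$(cd "$1" && pwd)" || return 1
    printf 'docker run -v %s:/RUN_DOCKER/ -it pap_docker:latest\n' "$run_dir"
}

# Example: print the command for a RUN_DOCKER directory in the current location.
if [ -d ./RUN_DOCKER ]; then
    papipe_docker_cmd ./RUN_DOCKER
fi
```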

Running PAPipe inside the Docker container

# Run in the docker container
cd /RUN_DOCKER/
python3 /PAPipe/bin/main.py  -P ./main_param.txt  -I ./main_input.txt -A ./main_sample.txt &> ./log

Analysis results will be generated in the output directory specified in the "main_param.txt" file.

Check out more details about the analysis results here.

Generating HTML pages for browsing analysis results

You can check all analysis results in the output directory specified in the "main_param.txt" file.

However, PAPipe also supports the generation of HTML pages for easily browsing the analysis results.

# Run in the docker container
perl /PAPipe/bin/html/webEnvSet.pl ./out &> webenvset.log    # Suppose "out" is the output directory set in the "main_param.txt" file
cd ./out/web/
perl /PAPipe/bin/html/prep_html.pl ./ &> ./webgen.log

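After the above commands finish successfully, follow the two steps below to open the HTML pages.

  1. Download the "web" directory to your personal computer.
  2. Open the "index.html" file in the "web" directory with a web browser.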

If your machine supports a graphical user interface, you can go directly into the "web" directory and open the "index.html" file without downloading the "web" directory to your personal computer.

Check out more details about the generated HTML pages here.

Check out the HTML pages generated from the test data here.

Run PAPipe with test data

You can test PAPipe using a small test dataset. Check out more details here.