The AlloPipe tool is a computational workflow which imputes
(i) directional amino acid mismatches and
(ii) their related minor histocompatibility antigens, using the NetMHCpan software,
within a pair of annotated human genomic datasets.
Be careful with the terms of use of NetMHCpan.
The AlloPipe tool is divided into two modules: (i) Allo-Count and (ii) Allo-Affinity
(i) Allo-Count imputes the directional amino acid mismatches
Allo-Count reformats the relevant data from the VEP-annotated .VCF file(s), performs stringent data cleaning and computes the directional comparison of the sample amino acid sequences. Allo-Count returns:
Direction of the mismatch
The sample comparison is directional and accounts for either the amino acids that are present in the donor but absent in the recipient (donor-to-recipient) or those that are present in the recipient but absent in the donor (recipient-to-donor).\
The donor-to-recipient count is designed to study polymorphisms that the recipient’s immune system recognises as ‘non-self’, as in solid organ transplantation.\
The recipient-to-donor count is designed to detect polymorphisms that the donor’s immune system recognises as ‘non-self’ once engrafted in the recipient, as in allogeneic haematopoietic stem cell transplantation.
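The directional comparison can be pictured as a set difference. The sketch below is only an illustration, not AlloPipe's implementation: the `Genotype` representation, the position labels and the function name are assumptions made for the example.

```python
from typing import Dict, Set

# Hypothetical representation: position label -> set of amino acids carried
# by the individual at that position (one or two residues for a diploid genome).
Genotype = Dict[str, Set[str]]


def directional_mismatches(donor: Genotype, recipient: Genotype, direction: str) -> Dict[str, Set[str]]:
    """Return, per shared position, the amino acids present in the source
    individual but absent from the target individual.

    direction: 'dr' (donor-to-recipient) or 'rd' (recipient-to-donor).
    """
    if direction == "dr":
        source, target = donor, recipient
    elif direction == "rd":
        source, target = recipient, donor
    else:
        raise ValueError("direction must be 'dr' or 'rd'")

    mismatches = {}
    # Only positions described in both individuals are compared.
    for position in source.keys() & target.keys():
        only_in_source = source[position] - target[position]
        if only_in_source:
            mismatches[position] = only_in_source
    return mismatches


# Toy data, for illustration only.
donor = {"GENE1:42": {"A", "V"}, "GENE2:7": {"L"}}
recipient = {"GENE1:42": {"A"}, "GENE2:7": {"L", "P"}}

dr = directional_mismatches(donor, recipient, "dr")
rd = directional_mismatches(donor, recipient, "rd")
print(dr)  # {'GENE1:42': {'V'}}
print(rd)  # {'GENE2:7': {'P'}}
```

In this toy example each direction yields one mismatched position, which shows why the two counts are generally not interchangeable.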
(ii) Allo-Affinity imputes the candidate minor histocompatibility antigens
Allo-Affinity reconstructs peptides of the requested length around the amino acid changes, then returns their affinity towards HLA molecules using the NetMHCpan software. Allo-Affinity returns:
Four-digit HLA typing must be provided by the user for the HLA molecules of interest.
There are two modes of operation for each module: (i) single pair or (ii) multiple pairs
Single pair\
Run as 'single pair mode' if you aim to compute AMS and/or af-AMS for one pair at a time.\
You need to provide one VEP-annotated .VCF file per individual.
Multiple pairs\
Run as 'multiple pairs mode' if you aim to compute AMS and/or af-AMS for more than one pair at a time.\
You need to provide a single VEP-annotated .VCF file containing the genotypes of all the individuals you want to analyse - i.e. a merged .VCF file - and the .csv list of the pairs you want to process.
AlloPipe specifically requires
Python >=3.6 (developed on 3.9)
Conda, installed in the version suited to your operating system and Python version. We recommend installing the dependencies needed to run AlloPipe in a dedicated conda environment.
NetMHCpan and NetMHCIIpan downloaded as command line tools.\
Make sure you use NetMHCpan in accordance with its user licence.
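A quick way to check these prerequisites before installing is sketched below. The executable names (`conda`, `netMHCpan`, `netMHCIIpan`) are assumptions about how the tools are exposed on your PATH; adjust them to your installation.

```python
import shutil
import sys

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["conda", "netMHCpan", "netMHCIIpan"]


def check_python(minimum=(3, 6)):
    """Return True if the running interpreter satisfies the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


python_ok = check_python()
missing = missing_tools()
print("Python >= 3.6:", python_ok)
print("Missing tools:", missing or "none")
```

An empty list of missing tools only means the executables were found; it does not verify their versions or licences.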
To download and install the AlloPipe workflow, first clone the repository from git.\
You may be asked to create a token to log in. See the GitHub tutorial
We then recommend creating a conda environment dedicated to the AlloPipe workflow. The dependencies specified in requirements.txt are needed for AlloPipe to run and should be installed in this AlloPipe environment.
The following command lines will perform the above-mentioned steps:
git clone https://github.com/huguesrichard/Allopipe.git
cd Allopipe
conda create --name Allopipe python=3.9
conda activate Allopipe
python -m pip install -r requirements.txt
AlloPipe input file(s) must be VEP-annotated .VCF files. Other annotation tools could theoretically be used after code adjustments.
You will therefore need to annotate your files with VEP before using AlloPipe. AlloPipe has been developed and tested with .VCF files annotated with VEP v104, v110 and v111. We recommend using the most recent version of VEP unless it leads to major changes in the architecture of the output .VCF files.
VEP annotation: online or command line installation\
VEP annotation can be done using the online tool or by downloading the command line tool.
To use the web interface, follow this link.
To install the command line tool, follow the installation tutorial available here.\
During the installation, you will be asked if you want to download cache files, FASTA files and plugins.
- We recommend downloading the cache files for the assembly of your VCF files to be able to run VEP offline.\
  Download the VEP cache files which correspond to your Ensembl VEP installation and genome reference!
- We recommend downloading the FASTA files for the assembly of your VCF files to be able to run VEP offline.\
  Download the FASTA files which correspond to your Ensembl VEP installation and genome reference!
- We do not recommend downloading any plugins.
We then recommend adding VEP to your PATH by adding the following line to your ~/.profile or ~/.bash_profile:
export PATH=%%PATH/TO/VEP%%:${PATH}
If you are on Windows, you can follow this tutorial to add VEP to your PATH.
For full details on VEP, see the VEP documentation.
Run the following command to annotate your VCF file(s) with VEP.\
All specified options are mandatory, with the exception of the assembly if you only downloaded one cache file.
vep --fork 4 --cache --assembly <GRChXX> --offline --af_gnomade -i <PATH-TO-FILE-TO-ANNOTATE/FILE-TO-ANNOTATE>.vcf -o <PATH-TO-ANNOTATED-FILE/ANNOTATED-FILE>.vcf --coding_only --pick_allele --use_given_ref --vcf
Where:\
<GRChXX> is the version of the genome used to align the sequences.\
<PATH-TO-FILE-TO-ANNOTATE/FILE-TO-ANNOTATE>.vcf is the path to your file to annotate.\
<PATH-TO-ANNOTATED-FILE/ANNOTATED-FILE>.vcf is the path to the directory and the name of the output annotated file.
This command line works for individual .VCF files or multi-VCF files, whether compressed (.gvcf) or not (.vcf). Run this command for every file you want to input in AlloPipe.
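Since the command has to be repeated for every input file, it can help to assemble it programmatically. The sketch below only builds the argument list from the options shown above; the function name, the default assembly and the file names are assumptions for illustration.

```python
import shlex


def build_vep_command(input_vcf, output_vcf, assembly="GRCh38", forks=4):
    """Assemble the VEP call shown above as an argument list."""
    return [
        "vep",
        "--fork", str(forks),
        "--cache",
        "--assembly", assembly,
        "--offline",
        "--af_gnomade",
        "-i", input_vcf,
        "-o", output_vcf,
        "--coding_only",
        "--pick_allele",
        "--use_given_ref",
        "--vcf",
    ]


# Hypothetical file names, for illustration only.
samples = ["donor", "recipient"]
commands = [build_vep_command(f"{s}.vcf", f"{s}_annotated.vcf") for s in samples]
for command in commands:
    # Print a shell-safe version of each command for inspection.
    print(" ".join(shlex.quote(argument) for argument in command))
```

Each list can then be passed to `subprocess.run(command, check=True)` once VEP is on your PATH.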
Once the VEP annotation of your file(s) is complete, you are ready to launch your first AlloPipe run!
What does Allo-Count perform?
From variant-annotated .VCF file(s), variants are first reformatted, then filtered according to a set of quality metrics (default values):
The curated .VCF file(s) is(are) then queried for the amino acid information to assess the directional amino acid mismatches between samples.
Direction of the mismatch
Allo-Count counts either the amino acids that are present in the donor but absent in the recipient (donor-to-recipient, 'dr') or the other way around (recipient-to-donor, 'rd': present in the recipient but absent in the donor).
Once the VEP annotation is complete, go to the root of the AlloPipe directory to run the following commands in the terminal (don't forget to activate your conda environment!) :
cd src/
python ams_pipeline.py -f -n <NAME-RUN> -p <NAME-OF-THE-PAIR> <PATH-TO-DONOR-ANNOTATED-FILE/ANNOTATED-FILE>.vcf <PATH-TO-RECIPIENT-ANNOTATED-FILE/ANNOTATED-FILE>.vcf <DIRECTION OF THE MISMATCH>
Where:\
<NAME-RUN> is the name of the run\
<NAME-OF-THE-PAIR> is the name of the pair\
<PATH-TO-DONOR-ANNOTATED-FILE/ANNOTATED-FILE>.vcf is the path to the donor's annotated VCF\
<PATH-TO-RECIPIENT-ANNOTATED-FILE/ANNOTATED-FILE>.vcf is the path to the recipient's annotated VCF\
<DIRECTION OF THE MISMATCH> is 'rd' or 'dr', depending on the direction of the mismatch
Full usage help is available:
python ams_pipeline.py --help
It is possible to launch Allo-Count for each pair of a list, using a merged annotated .VCF file and a .csv list of pairs:
cd src/
python multiprocess_ams.py -n <NAME-RUN> <PATH-TO-THE-MERGED-ANNOTATED-FILE>.vcf <PATH-TO-THE-PAIR-LIST>.csv <DIRECTION OF THE MISMATCH>
Where:\
<NAME-RUN> is the name of the run\
<PATH-TO-THE-MERGED-ANNOTATED-FILE>.vcf is the path to the annotated merged VCF file\
<PATH-TO-THE-PAIR-LIST>.csv is the path to the list pairing the samples (template provided in the tutorial)\
<DIRECTION OF THE MISMATCH> is the direction of the mismatch as previously described
It is not possible to run different mismatch directions within the same command line.
Full usage help is available:
python multiprocess_ams.py --help
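Because each command handles a single direction, computing both directions for a cohort means launching the script twice. The sketch below builds the two argument lists; giving each direction its own run name is an assumption of this example, as are the file paths.

```python
def build_multi_pair_commands(run_name, merged_vcf, pair_list_csv, directions=("dr", "rd")):
    """Return one multiprocess_ams.py argument list per mismatch direction."""
    commands = []
    for direction in directions:
        if direction not in ("dr", "rd"):
            raise ValueError("direction must be 'dr' or 'rd'")
        commands.append([
            "python", "multiprocess_ams.py",
            "-n", f"{run_name}_{direction}",  # assumption: one run name per direction
            merged_vcf,
            pair_list_csv,
            direction,
        ])
    return commands


# Hypothetical paths, for illustration only.
commands = build_multi_pair_commands("cohort", "merged_annotated.vcf", "pairs.csv")
for command in commands:
    print(" ".join(command))
```

The lists are meant to be run from the src/ directory, for instance with `subprocess.run(command, check=True)`.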
Normalisation
To avoid artefacts related to sequencing quality that might lead to an AMS lower or higher than expected, we provide the user with the ref/commun ratio.
After the run is complete, have a look at the output/runs/ directory.
The directory is structured as follows:
In the run_tables/ directory, you can find the mismatches table that will give you direct information on the mismatched positions.
In this table, you can find the following information :
VCF information
Sample information
VEP information
AlloPipe information
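To see which columns your own mismatches table contains, it can be loaded with the standard library. This is a generic sketch for tab-separated files; the stand-in file and its column names (`col_a`, `col_b`) are made up so that the example runs, and do not reflect AlloPipe's actual columns.

```python
import csv
import os
import tempfile


def read_table(path):
    """Read a tab-separated table; return its column names and its rows."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return reader.fieldnames, rows


# Tiny stand-in file with made-up columns, only to make the sketch runnable.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, newline="") as tmp:
    tmp.write("col_a\tcol_b\n1\tx\n2\ty\n")
    demo_path = tmp.name

columns, rows = read_table(demo_path)
os.remove(demo_path)
print(columns)    # ['col_a', 'col_b']
print(len(rows))  # 2
```

Replace `demo_path` with the path to a table in your run_tables/ directory to list its real columns.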
Allo-Affinity generates a set of candidate minor histocompatibility antigens around each previously assessed directional amino acid mismatch using a sliding window. The user defines the length of the potentially HLA-embedded peptides, usually 9-mers for HLA class I and 15-mers for HLA class II molecules. The affinity values are computed using NetMHCpan 4.1 and NetMHCIIpan 4.3, respectively.
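The sliding-window idea can be sketched as follows: every window of the chosen length that contains the mismatched residue is a candidate peptide. This is an illustration under simple assumptions (a plain protein string and a 0-based position), not the code used by Allo-Affinity.

```python
def peptides_around(protein, position, length):
    """Return every window of `length` residues that contains `position`.

    protein:  amino acid sequence as a string
    position: 0-based index of the mismatched residue
    length:   peptide length (e.g. 9 for HLA class I, 15 for HLA class II)
    """
    if not 0 <= position < len(protein):
        raise ValueError("position is outside the protein sequence")
    # Earliest and latest window starts that still cover the position
    # without running past either end of the protein.
    first_start = max(0, position - length + 1)
    last_start = min(position, len(protein) - length)
    return [protein[start:start + length] for start in range(first_start, last_start + 1)]


# Toy sequence, for illustration only.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
peptides = peptides_around(protein, position=10, length=9)
print(len(peptides))  # 9
print(peptides[0])    # TAYIAKQRQ
print(peptides[-1])   # QISFVKSHF
```

A mismatch far from both protein ends yields as many peptides as the window length; near an end, or in a protein shorter than the window, fewer (possibly zero) peptides are produced.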
What does Allo-Affinity perform?
From previously generated files that are the TABLE-MISMATCH and the TRANSCRIPT-TABLE, Allo-Affinity reconstructs the set of peptides that are different between the donor and the recipient.
The directionality of the mismatch is kept, meaning that if Allo-Count has been run in the donor-to-recipient direction, only peptides present in the donor but absent from the recipient will be reconstructed.\
In the same way, if Allo-Count has been run in the recipient-to-donor direction, only peptides present in the recipient but absent from the donor will be reconstructed.
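At the peptide level, keeping the direction amounts to retaining the peptides of one individual that do not occur in the other. The sketch below illustrates this on two toy sequences differing by one residue; the helper names and the sequences are assumptions, and the real reconstruction works from the Allo-Count tables rather than from raw strings.

```python
def kmers(sequence, length):
    """Return the set of all peptides of the given length in a sequence."""
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}


def directional_peptides(donor_seq, recipient_seq, length, direction):
    """Return peptides present in the source sequence but absent from the target."""
    if direction == "dr":
        source, target = donor_seq, recipient_seq
    elif direction == "rd":
        source, target = recipient_seq, donor_seq
    else:
        raise ValueError("direction must be 'dr' or 'rd'")
    return sorted(kmers(source, length) - kmers(target, length))


# Toy sequences differing at one position (I in the donor, V in the recipient).
donor_seq = "MKTAYIAKQ"
recipient_seq = "MKTAYVAKQ"

dr_peptides = directional_peptides(donor_seq, recipient_seq, 5, "dr")
rd_peptides = directional_peptides(donor_seq, recipient_seq, 5, "rd")
print(dr_peptides)  # ['AYIAK', 'KTAYI', 'TAYIA', 'YIAKQ']
print(rd_peptides)  # ['AYVAK', 'KTAYV', 'TAYVA', 'YVAKQ']
```

The shared peptide MKTAY appears in neither list, since it does not distinguish the two individuals.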
Allo-Affinity prepares the files required by NetMHCpan 4.1 and NetMHCIIpan 4.3 to finally impute the affinity of the reconstructed peptides towards the HLA peptide grooves.
Please note that the HLA typing has to be known before running the command line, as the AlloPipe tool does not impute the HLA typing from genomic data.
Once the AMS run is complete, go back to the AlloPipe root directory and run this second set of commands:
cd src/
gzip -d <PATH-TO-GENOME-REFERENCE.cdna.all.VEP-VERSION>.fa.gz
gzip -d <PATH-TO-GENOME-REFERENCE.pep.VEP-VERSION>.fa.gz
gzip -d <PATH-TO-GENOME-REFERENCE.VEP-VERSION.refseq>.tsv.gz
python aams_pipeline.py -M <PATH-TO-MISMATCH-TABLE>.tsv \
-T <PATH-TO-TRANSCRIPT-TABLE>.tsv \
-E <PATH-TO-GENOME-REFERENCE.cdna.all.VEP-VERSION>.fa \
-P <PATH-TO-GENOME-REFERENCE.pep.VEP-VERSION>.fa \
-R <PATH-TO-GENOME-REFERENCE.VEP-VERSION.refseq>.tsv \
-n <TEST-RUN> -p <TEST-PAIR> -l <LENGTH-OF-PEPTIDES-TO-BE-RECONSTRUCTED> --el_rank <THRESHOLD-FOR-EL> \
-a <HLA-TYPING>
Where:\
<PATH-TO-GENOME-REFERENCE.cdna.all.VEP-VERSION>.fa is the path to the decompressed cDNA FASTA file\
<PATH-TO-GENOME-REFERENCE.pep.VEP-VERSION>.fa is the path to the decompressed peptide FASTA file\
<PATH-TO-GENOME-REFERENCE.VEP-VERSION.refseq>.tsv is the path to the decompressed RefSeq table\
<PATH-TO-MISMATCH-TABLE>.tsv is the path to the mismatch table generated by Allo-Count\
<PATH-TO-TRANSCRIPT-TABLE>.tsv is the path to the transcript table generated by Allo-Count\
<TEST-RUN> is the name of the run\
<TEST-PAIR> is the name of the pair\
<LENGTH-OF-PEPTIDES-TO-BE-RECONSTRUCTED> is the length of the peptides to be imputed\
<HLA-TYPING> is the HLA typing, e.g. HLA-A*01:01,HLA-A*02:01,HLA-B*08:01,HLA-B*27:05,HLA-C*01:02,HLA-C*07:01
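Since the typing is passed as a single comma-separated string, a small check before launching the run can catch formatting mistakes. The sketch below assumes the notation used in the tutorial command (e.g. HLA-A*01:01) and, as a simplification of this example, only accepts class I loci A, B and C; it is not part of AlloPipe.

```python
import re

# Two-field ("4-digit") class I allele such as HLA-A*01:01. Assumption: only the
# classical class I loci A, B and C are accepted in this sketch.
ALLELE_PATTERN = re.compile(r"^HLA-[ABC]\*\d{2,3}:\d{2,3}$")


def parse_hla_typing(typing):
    """Split a comma-separated typing string and reject malformed alleles."""
    alleles = [allele.strip() for allele in typing.split(",") if allele.strip()]
    invalid = [allele for allele in alleles if not ALLELE_PATTERN.match(allele)]
    if invalid:
        raise ValueError(f"not a two-field class I allele: {invalid}")
    return alleles


alleles = parse_hla_typing(
    "HLA-A*01:01,HLA-A*02:01,HLA-B*08:01,HLA-B*27:05,HLA-C*01:02,HLA-C*07:01"
)
print(len(alleles))  # 6
```

On the command line, quoting the typing string avoids the shell interpreting the `*` characters.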
To be implemented
This second step of AlloPipe uses the AMS information of the first step.
You will find 3 new subdirectories in the test_run/ directory :
The AAMS value obtained with VEP v107 and NetMHCpan 4.1 is 34.
If you want more in-depth information on the mismatches contributing to the AAMS, you will find a mismatches table in the aams_run_tables/ directory.
It contains the mismatch information from the AMS run along with information provided by NetMHCpan:
You can now get started with your own files. Check the documentation if you want more control over the filters that we implemented.
We provide example data in tutorial/, i.e. tutorial/donor_to_annotate.vcf and tutorial/recipient_to_annotate.vcf (these files correspond to human chr6).\
To test your VEP installation, run the following commands at the root of the AlloPipe directory:
vep --fork 4 --cache --assembly GRCh38 --offline --af_gnomade -i tutorial/donor_to_annotate.vcf -o tutorial/donor_annotated_VEP.vcf --vcf
vep --fork 4 --cache --assembly GRCh38 --offline --af_gnomade -i tutorial/recipient_to_annotate.vcf -o tutorial/recipient_annotated_VEP.vcf --vcf
Once the VEP annotation is complete, go to the root of the AlloPipe directory to run the following commands in the terminal :
cd src/
python ams_pipeline.py -f -n test_run -p test_pair ../tutorial/donor_annotated_VEP.vcf ../tutorial/recipient_annotated_VEP.vcf rd
If your AMS returns 44, congrats! You have successfully generated your first Allogenomic Mismatch Score (AMS) and related tables!
Finally, to get your af-AMS and related table, run:
gzip -d ../tutorial/Ensembl/Homo_sapiens.GRCh38.cdna.all.103.fa.gz
gzip -d ../tutorial/Ensembl/Homo_sapiens.GRCh38.pep.all.103.fa.gz
gzip -d ../tutorial/Ensembl/Homo_sapiens.GRCh38.103.refseq.tsv.gz
python aams_pipeline.py \
-M ../output/runs/test_run/run_tables/test_pair_test_run_mismatches_20_400_5_gq_20_0.8_bl_3.tsv \
-T ../output/runs/test_run/run_tables/test_pair_test_run_transcripts_pair_codons_20_400_5_gq_20_0.8_bl_3.tsv \
-E ../tutorial/Ensembl/Homo_sapiens.GRCh38.cdna.all.103.fa \
-P ../tutorial/Ensembl/Homo_sapiens.GRCh38.pep.all.103.fa \
-R ../tutorial/Ensembl/Homo_sapiens.GRCh38.103.refseq.tsv \
-n test_run -p test_pair \
-l 9 --el_rank 2 \
-a 'HLA-A*01:01,HLA-A*02:01,HLA-B*08:01,HLA-B*27:05,HLA-C*01:02,HLA-C*07:01'
If your af-AMS returns 33, you are all set!
You can now enjoy AlloPipe. We would be happy to receive any feedback!