
EpitopeMatcher

A package that can be used to find out how well the epitopes in a patient's viruses will be recognized by the HLAs present in the patient.

There are two ways to install EpitopeMatcher:

Installation using podman/docker

Install podman on your computer: https://podman.io/getting-started/installation

Optionally follow rootless mode instructions if you are a root user and want regular users of your system to be able to run EpitopeMatcher environments securely on their own.

Note: Podman is not required if you already have Docker installed; however, Podman will take precedence if both are installed.

Clone the EpitopeMatcher repo:

git clone https://github.com/philliplab/EpitopeMatcher

Use the EpitopeMatcher (etm) script to build the container and serve the shiny app:

cd EpitopeMatcher
./etm -h
./etm build
./etm serve

Big thanks to Dean Kayton (https://github.com/dnk8n) for contributing the container file and etm script.

Installation Instructions for Ubuntu

Make sure you have a recent version of R. Follow the instructions at the following link to set up the correct repository for apt: http://stackoverflow.com/questions/10476713/how-to-upgrade-r-in-ubuntu.

Make sure that both r-base and r-base-dev are installed:

sudo apt-get install r-base r-base-dev

Next, install devtools' dependencies with apt-get:

sudo apt-get install libssl-dev libxml2-dev libcurl4-gnutls-dev

Then, from within R, install devtools and the Bioconductor dependencies:

install.packages('devtools', repos = 'http://cran.rstudio.com/')
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Biostrings")

Finally, install the latest version of shiny and then EpitopeMatcher:

library(devtools)
install_github('rstudio/shiny')
install_github('philliplab/EpitopeMatcher')

Using EpitopeMatcher

To run the web UI:

library(EpitopeMatcher)
run_EpitopeMatcher_app()

To get some test data:

library(EpitopeMatcher)
get_set_of_test_data()

or download it from Test Data

The test data consists of 3 sample files, one for each of the inputs described in the Design Notes below: a query alignment, a patient HLA file, and a LANL HLA file.

To use EpitopeMatcher in an R session, see the help files of the package's functions; match_epitopes() is the top-level entry point (see the Design Notes below). A rough sketch of such a session follows below.

Note: a Docker-based way of obtaining the test data is not available right now.
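As an illustration only (the file names are placeholders, and the exact classes and arguments expected by match_epitopes() are assumptions; check ?match_epitopes and the package help index), an R session might look like this:

library(EpitopeMatcher)
library(Biostrings)

# Placeholder file names -- substitute your own inputs. The three objects
# follow the naming and order described in the Design Notes below.
# NOTE: whether match_epitopes() accepts these raw objects directly is an
# assumption; consult the package help files.
query_alignment <- readAAStringSet("query_alignment.fasta")
patient_hla     <- read.csv("patient_hla.csv", stringsAsFactors = FALSE)
lanl_hla        <- read.csv("lanl_hla.csv", stringsAsFactors = FALSE)

results <- match_epitopes(query_alignment, patient_hla, lanl_hla)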

Design Notes

Outline showing execution order

match_epitopes()
    list_scores_to_compute()
    score_all_epitopes()
    output_results()

list_scores_to_compute()
    matched_patients = match_patient_hla_to_query_alignment()
    flat_lanl_hla = flatten_lanl_hla()
    build_scoring_jobs(matched_patients, flat_lanl_hla)

build_scoring_jobs(matched_patients, lanl_hla_data)
    jobs = NULL
    for (mp in matched_patients)
        hla_details = get_hla_details(mp$..., lanl_hla_data)
        jobs = c(jobs,
                 .Scoring_Job(hla_genotype,
                              patients,
                              hla_details))

score_all_epitopes()
    for (job in …)
        score_epitope()

score_epitope()
    find_epitope_in_ref()
    if not found()
        log_epitope_not_found()
    if found()
        get_query_sequences()
        align_ref_epitope_to_query_seqs()
        log_epitope_found()
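As a rough illustration of this scoring flow, the sketch below uses plain Biostrings calls with made-up sequences and a made-up epitope; it is not the package's actual implementation, only a picture of the steps named above:

library(Biostrings)

# Illustrative data only: a reference fragment, one query sequence, and an
# epitope that occurs in the reference fragment
ref_seq    <- AAString("MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERF")
query_seqs <- AAStringSet(c(q1 = "MGARASVLSGGELDKWEKIRLRPGGKKQYKLKHIVWASRELERF"))
epitope    <- "KIRLRPGGK"

# find_epitope_in_ref()
hits <- matchPattern(epitope, ref_seq)

if (length(hits) == 0) {
  # log_epitope_not_found()
  message("epitope not found in reference")
} else {
  # align_ref_epitope_to_query_seqs(): align the reference epitope globally
  # against a local region of a query sequence
  data(BLOSUM62)
  aln <- pairwiseAlignment(pattern = AAString(epitope),
                           subject = query_seqs[[1]],
                           substitutionMatrix = BLOSUM62,
                           type = "global-local")
  # log_epitope_found(): the alignment and its score would be recorded here
  print(score(aln))
}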

Design Choices

  1. The input data is named and used in this order:
    • query_alignment
    • patient_hla
    • lanl_hla
  2. The way to refer to a query sequence is by its full FASTA header, not the patient_id extracted from it nor its position (index) in the alignment.
  3. Error Logging. Probably not the best design, but it should be good enough. Let each function that should log errors return as output a list with elements: 'msg', 'result', and 'error_logs' where 'error_logs' is again a list each of whom's elements is a data.frame that logs a specific type of error. This design should allow the users to inspect the error logs in EXCEL quite comfortably. A better design might be to produce traditional logs using a standard logging library and then to process those logs at a later stage in easy to analyze formats, but in the short term this is more work.