matthiaszeller / HBV

Genome-to-Genome Study of Hepatitis B Virus-infected individuals; EPFL, Fellay lab

HBV GWS project

About

This repository is the result of my bachelor project, a three-week internship in the Fellay Lab, EPFL, under the supervision of Sina Rüeger. The project aims to perform a genome-to-genome (G2G) study of HBV-infected individuals from different populations, i.e. to find associations between host single nucleotide polymorphisms (SNPs) and viral amino acid variants. The project eventually focuses on a genetically related Asian subpopulation.
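To make the idea of a host-virus association concrete, here is a minimal sketch of a single G2G test: one host SNP against one viral amino acid variant, using a 2x2 contingency table and a two-sided Fisher exact test. It is an illustration only. The toy data and function names are assumptions, and the notebooks in this repository rely on their own models rather than this test.

```python
from math import comb


def contingency_table(snp_carrier, variant_present):
    """Count samples per cell; rows = SNP carrier (0/1), columns = viral variant (0/1)."""
    table = [[0, 0], [0, 0]]
    for s, v in zip(snp_carrier, variant_present):
        table[int(s)][int(v)] += 1
    return table


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table (hypergeometric sum)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability of observing x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical toy data: 8 individuals, 1 = carries the SNP / has the variant.
snp = [1, 1, 1, 1, 0, 0, 0, 0]
variant = [1, 1, 1, 0, 0, 0, 0, 1]

table = contingency_table(snp, variant)
p_value = fisher_exact_two_sided(table)
print(table, p_value)
```

A real G2G study repeats such a test for every SNP-variant pair, so the resulting p-values need a multiple-testing correction before they can be interpreted.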

Requirements

Project structure

The analyses are performed directly inside notebooks, and some of them store processed data, so the order of the notebooks (see below) matters. The notebooks can be converted to PDF with nbconvert, optionally with --execute to re-run them during the conversion.
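The conversion can be scripted so that the notebooks are processed in a fixed order. The sketch below builds the `jupyter nbconvert` command for each notebook. The notebook file names are hypothetical placeholders, not the actual names in this repository, and PDF export assumes a working LaTeX installation.

```python
import subprocess


def nbconvert_command(notebook, execute=False):
    """Build the nbconvert command that converts one notebook to PDF."""
    cmd = ["jupyter", "nbconvert", "--to", "pdf"]
    if execute:
        # Re-run the notebook before converting it.
        cmd.append("--execute")
    cmd.append(notebook)
    return cmd


def convert_all(notebooks, execute=False):
    """Convert the notebooks one by one, stopping at the first failure."""
    for notebook in notebooks:
        subprocess.run(nbconvert_command(notebook, execute), check=True)


# Hypothetical file names, listed in the order the notebooks must be run.
NOTEBOOKS = ["1_clinical_data.ipynb", "2_viral_data.ipynb"]

for name in NOTEBOOKS:
    print(" ".join(nbconvert_command(name, execute=True)))
```

Calling `convert_all(NOTEBOOKS, execute=True)` would run the printed commands in sequence, which matters because later notebooks depend on the data stored by earlier ones.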

Data description & format

Data processing & analysis

All computations are performed in notebooks, which one has to run in the following order:

  1. Clinical data notebook: processes the clinical data from the CSV file and stores the resulting DataFrame as a binary object.
  2. Viral data notebook: processes the viral data from a CSV file and stores the processed DataFrame in a binary file.
  3. Joint viral and clinical data notebook: combines the two datasets and produces a PCA colored by genotype.
  4. Host genotype data preparation notebook: quality control and application of filters.
  5. Host genotype data analysis notebook: PCA, association analyses and clustering.
  6. G2G of Asian subpopulation: prepares a new dataset, tries univariate models and implements multivariate models.
  7. G2G computer: computation of the multivariate models and analysis of the results.
  8. Interpretation of results: extracts and analyses the significant associations.
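The hand-off between the first three notebooks can be sketched as follows: two processed tables are stored as binary files, then reloaded and combined on a shared sample identifier. This is a minimal sketch under assumptions. The pickle format, the file names and the column names (`sample_id`, `age`, `viral_genotype`) are hypothetical and may differ from what the notebooks actually use.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-ins for the processed tables; column names are hypothetical.
clinical = pd.DataFrame({"sample_id": ["s1", "s2", "s3"], "age": [34, 51, 47]})
viral = pd.DataFrame(
    {"sample_id": ["s2", "s3", "s4"], "viral_genotype": ["B", "C", "C"]}
)

with tempfile.TemporaryDirectory() as tmp:
    clinical_path = Path(tmp) / "clinical.pkl"
    viral_path = Path(tmp) / "viral.pkl"

    # Steps 1 and 2: each notebook stores its processed DataFrame on disk.
    clinical.to_pickle(clinical_path)
    viral.to_pickle(viral_path)

    # Step 3: reload both tables and keep only samples present in both.
    joint = pd.read_pickle(clinical_path).merge(
        pd.read_pickle(viral_path), on="sample_id", how="inner"
    )

print(joint)
```

An inner join drops individuals that lack either clinical or viral data. Whether that is the right choice depends on the analysis, so it is worth comparing the row counts before and after the merge.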

Documentation & tutorials

Useful resources and references: