Add a plot of distribution of SNVs per cluster for quality control

This plot shows the distribution of SNVs within each cluster returned by an SRC tool. Plotting the SNVs per cluster helps assess the validity of the clusters: it shows whether there is any positional bias and how frequently SNVs occur in particular genomic regions.

[Image: STGHKGFH000001_multipanelplot_SNVs_genome_distribution_per_cluster]

The code to generate these plots is here. I used results from the SRC tool DPClust. When I wrote this code, I only had three samples, so I processed each sample separately using functions.

The first input is a tab-delimited `.txt` file containing the following information and column names:

| 1st column | 2nd column | 3rd column |
| --- | --- | --- |
| chromosome (`chr`) | SNV position for that chromosome (`pos`) | cluster (`cluster`) |

The second input is a tab-delimited `.txt` file with the following information and column names:

| 1st column | 2nd column | 3rd column | 4th column |
| --- | --- | --- | --- |
| chromosome (`chr`) | SNV position for that chromosome (`pos`) | cluster (`cluster`) | position along the whole genome (`genomePOS`) |
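To make the input layout concrete, here is a minimal sketch of reading such a file and counting SNVs per cluster. It is written in Python purely for illustration and is not the script referenced in this issue; the example rows, `read_snvs`, and the other variable names are made up.

```python
import csv
import io
from collections import Counter

# Made-up rows in the layout described above (chr, pos, cluster, genomePOS).
example = (
    "chr\tpos\tcluster\tgenomePOS\n"
    "1\t150\t1\t150\n"
    "2\t20\t2\t1020\n"
    "3\t5\t1\t1805\n"
)

def read_snvs(handle):
    """Read tab-delimited SNV rows and check that the required columns exist."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"chr", "pos", "cluster"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        row["pos"] = int(row["pos"])
        # genomePOS is only present in the second input file.
        if row.get("genomePOS"):
            row["genomePOS"] = int(row["genomePOS"])
        rows.append(row)
    return rows

snv_rows = read_snvs(io.StringIO(example))

# Number of SNVs assigned to each cluster (cluster labels stay as strings).
snvs_per_cluster = Counter(row["cluster"] for row in snv_rows)
```

With a real file, `io.StringIO(example)` would be replaced by an open file handle.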

This file was generated from a previous step and saved, but this logic is used to generate the genomic positions:

where `vcf.df` is a data frame of SNVs (it must include chromosome and position information) and `chrStart` holds the starting position of each chromosome along the genome. Code to obtain these start positions is included in the script linked above.

uclahs-cds / package-CancerEvolutionVisualization
