This plot shows the distribution of SNVs within each cluster returned by an SRC tool. Plotting the SNVs per cluster helps to assess the validity of the clusters: it shows whether a cluster's SNVs have any positional bias and how frequent SNVs are in particular genomic regions.
The code to generate these plots is here.
I used results from the SRC tool DPClust. When I wrote this code, I only had three samples, so I processed each sample separately using functions.
The first input is a tab-delimited .txt file containing the following information (column names in parentheses):
1st column: chromosome (chr)
2nd column: SNV position for that chromosome (pos)
3rd column: cluster (cluster)
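To make the expected layout concrete, here is a minimal Python sketch that reads a file of this shape and checks that the three columns are present. It is only an illustration, not the code from the linked script: the function name `read_snv_table` and the example rows are made up, and treating `pos` as an integer is an assumption.

```python
import csv
import io

# Toy stand-in for the first input file; the values are made up for illustration.
EXAMPLE_TSV = "chr\tpos\tcluster\n1\t15820\t1\n1\t982113\t2\nX\t40511\t1\n"


def read_snv_table(handle):
    """Read a tab-delimited SNV table with chr, pos and cluster columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    required = {"chr", "pos", "cluster"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for record in reader:
        rows.append({
            "chr": record["chr"],          # kept as text so that X and Y work
            "pos": int(record["pos"]),     # assumption: positions are integers
            "cluster": record["cluster"],  # cluster label kept as text
        })
    return rows


snv_rows = read_snv_table(io.StringIO(EXAMPLE_TSV))
```

For a real file, the same function can be given an open file handle, for example `open(path, newline="")`, instead of the `io.StringIO` wrapper.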
The second input is a tab-delimited .txt file with the following information (column names in parentheses):
1st column: chromosome (chr)
2nd column: SNV position for that chromosome (pos)
3rd column: cluster (cluster)
4th column: position along the whole genome (genomePOS)
This file was generated from a previous step and saved, but this logic is used to generate the genomic positions:
where vcf.df is a data frame of SNVs (it must contain chromosome and position information) and chrStart holds the starting position of each chromosome along the genome. Code to obtain these start positions is included in the script linked above.
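The idea of a genome-wide position can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the snippet from the script: `snvs` stands in for vcf.df and `chr_start` for chrStart, the chromosome lengths are toy values rather than real reference lengths, and each start is assumed to be the summed length of all preceding chromosomes, so that genomePOS = start + pos.

```python
from itertools import accumulate

# Hypothetical chromosome lengths for a toy genome (not real reference lengths).
CHR_LENGTHS = {"1": 1000, "2": 800, "3": 600}


def chromosome_starts(lengths):
    """Return the genome-wide offset at which each chromosome begins."""
    names = list(lengths)
    # Offset of a chromosome = summed length of all chromosomes before it.
    offsets = [0] + list(accumulate(lengths[name] for name in names))[:-1]
    return dict(zip(names, offsets))


def add_genome_pos(snvs, chr_start):
    """Return copies of the SNV records with a genomePOS field added."""
    return [dict(snv, genomePOS=chr_start[snv["chr"]] + snv["pos"]) for snv in snvs]


chr_start = chromosome_starts(CHR_LENGTHS)

# Toy SNV records with the same three fields as the first input file.
snvs = [
    {"chr": "1", "pos": 10, "cluster": "1"},
    {"chr": "2", "pos": 5, "cluster": "2"},
    {"chr": "3", "pos": 1, "cluster": "1"},
]
with_genome_pos = add_genome_pos(snvs, chr_start)
```

With these toy lengths, chromosome 2 starts at offset 1000, so an SNV at position 5 on chromosome 2 is placed at genome position 1005. Whether the original start positions are offsets of this kind or 1-based coordinates is not stated here, so the exact arithmetic may differ by one.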