uclahs-cds / package-CancerEvolutionVisualization

Publication Quality Phylogenetic Tree Plots
https://cran.r-project.org/web/packages/CancerEvolutionVisualization/
GNU General Public License v2.0

Hwinata add genome distribution plot #97

Closed whelena closed 4 months ago

whelena commented 11 months ago

Description

Added a QC plot made by @WuSelina for visualizing clone distribution across the genome

Closes #88

Pipeline Run Results


whelena commented 11 months ago

For chromosome information, I'm currently just reading from a .tsv file I put in data/chr.info. I don't know if this is the best approach, so any comments are much appreciated.
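
For reference, reading a chromosome table from a tab-separated file can be sketched as below. This is only an illustration: the temporary file stands in for data/chr.info, and the 'length' column name is an assumption, since the actual columns of that file are not shown in this thread.

```r
# Hypothetical stand-in for data/chr.info; the real file's columns may differ
chr.info.file <- tempfile(fileext = '.tsv');
writeLines(
    c(
        'chr\tlength',
        '1\t248956422',
        '2\t242193529',
        'X\t156040895'
        ),
    chr.info.file
    );

# Read 'chr' as character so that numeric and sex chromosome names share one type
chr.info <- read.table(
    file = chr.info.file,
    header = TRUE,
    sep = '\t',
    colClasses = c(chr = 'character', length = 'numeric'),
    stringsAsFactors = FALSE
    );
```

Fixing the column classes up front avoids the 'chr' column being parsed as integer when a file happens to contain autosomes only.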

WuSelina commented 11 months ago

> For chromosome information, I'm currently just reading from a .tsv file I put in data/chr.info. I don't know if this is the best approach, so any comments are much appreciated.

I think the way you have it is good. I am also not sure of what the recommended standard is, though.

I previously used get.chr.length() from the bedr package which gives lengths for GRCh38 if specified, but this function does not return 'GC_count' or 'GC_percent' and returns out-of-order chrs (19 & 20 are swapped), so I have been reordering the resulting dataframe just in case:

# Load bedr for get.chr.length()
library(bedr);

# Get chr lengths info
chr.len <- get.chr.length(build = 'hg38');
# Keep only chrs 1-22 and sex chrs and remove 'chr' prefix
chr.len$chr <- gsub('chr', '', chr.len$chr);
chr.len <- subset(chr.len, subset = chr %in% c(1:22, 'X', 'Y'));

# Reorder the chr info
chrom.order <- c(as.character(1:22), 'X', 'Y');
# Convert the 'chr' column to a factor with custom levels
chr.len$chr <- factor(chr.len$chr, levels = chrom.order);
# Sort the dataframe based on the order of the 'chr' column
chr.len <- chr.len[order(chr.len$chr), ];
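
The same filter-and-reorder steps could also be wrapped in a small helper so they can be checked without bedr installed. The sketch below is a hypothetical function (order.chromosomes is not part of bedr or CancerEvolutionVisualization), and the mock data frame only imitates the out-of-order output described above.

```r
# Hypothetical helper: keep chrs 1-22, X and Y, strip the 'chr' prefix, and sort
order.chromosomes <- function(chr.df, chr.column = 'chr') {
    chrom.order <- c(as.character(1:22), 'X', 'Y');
    chr.names <- gsub('^chr', '', as.character(chr.df[[chr.column]]));

    # Drop anything that is not an autosome or sex chromosome (e.g. chrM)
    keep <- chr.names %in% chrom.order;
    chr.df <- chr.df[keep, , drop = FALSE];

    # A factor with explicit levels makes order() follow karyotype order
    chr.df[[chr.column]] <- factor(chr.names[keep], levels = chrom.order);
    chr.df <- chr.df[order(chr.df[[chr.column]]), , drop = FALSE];
    rownames(chr.df) <- NULL;

    return(chr.df);
    }

# Mock input with chr19 and chr20 swapped and one extra contig
mock.chr.len <- data.frame(
    chr = c('chr1', 'chr20', 'chr19', 'chrX', 'chrM'),
    length = c(248956422, 64444167, 58617616, 156040895, 16569),
    stringsAsFactors = FALSE
    );

ordered.chr.len <- order.chromosomes(mock.chr.len);
```

Anchoring the pattern with '^chr' only removes a leading prefix, which is slightly stricter than the unanchored gsub() call in the snippet above.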

whelena commented 4 months ago

@WuSelina Could you double-check the density calculation used to get the counts? The density.df$scaled.y code was not doing what I think it should be doing and was giving me really small numbers. The function is in create.clone.genome.distribution.densityplot.R. Thanks!
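
As background for the question above, a kernel density estimate over genomic coordinates is a probability density per base pair, so its values are naturally tiny. The sketch below shows one common way to rescale such a density to approximate counts. It is not the package's implementation; the simulated positions and the 'count' column are assumptions made for illustration.

```r
set.seed(1);

# Simulated variant positions (bp) on a hypothetical 1 Mb region
positions <- runif(n = 500, min = 0, max = 1e6);

# stats::density() returns a curve that integrates to roughly 1 over x
dens <- density(positions);
density.df <- data.frame(x = dens$x, y = dens$y);

# Spacing of the evaluation grid in bp
bin.width <- diff(dens$x[1:2]);

# Expected number of positions per grid interval:
# density (per bp) * number of observations * interval width (bp)
density.df$count <- density.df$y * length(positions) * bin.width;
```

Under these assumptions, sum(density.df$count) comes out close to the number of input positions, while density.df$y itself stays on the order of one over the region length.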

whelena commented 4 months ago

R-CMD-check is failing due to missing documentation, which I will be fixing in a separate PR.