Closed whelena closed 4 months ago
For chromosome information, I'm currently just reading from a .tsv file I stuck into data/chr.info. I don't know if this is the best approach, so any comments are much appreciated.
I think the way you have it is good. I am also not sure of what the recommended standard is, though.
I previously used get.chr.length() from the bedr package, which gives lengths for GRCh38 if specified. However, this function does not return 'GC_count' or 'GC_percent' and returns the chromosomes out of order (19 and 20 are swapped), so I have been reordering the resulting data frame just in case:
# Get chr lengths info
chr.len <- get.chr.length(build = 'hg38');
# Keep only chrs 1-22 and sex chrs and remove 'chr' prefix
chr.len$chr <- gsub('chr', '', chr.len$chr);
chr.len <- subset(chr.len, subset = chr %in% c(1:22, 'X', 'Y'));
# Reorder the chr info
chrom.order <- c(as.character(1:22), 'X', 'Y');
# Convert the 'chr' column to a factor with custom levels
chr.len$chr <- factor(chr.len$chr, levels = chrom.order);
# Sort the dataframe based on the order of the 'chr' column
chr.len <- chr.len[order(chr.len$chr), ];
@WuSelina Could you double-check the density calculation used to get the counts? The density.df$scaled.y code was not doing what I think it should be doing and was giving me really small numbers. The function is in create.clone.genome.distribution.densityplot.R. Thanks!
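One possible explanation for the small numbers, offered as a sketch and not as a description of what the package code does: a kernel density estimate integrates to 1 over its x range, so when x is measured in base pairs the y values are on the order of 1e-8. Multiplying the density by the number of points and by a bin width converts it to an approximate count per bin. The positions `pos`, the bin width `bin.width`, and the `count` column below are hypothetical and only illustrate the rescaling.

```r
# Hypothetical data: 1000 SNV positions on a 100 Mb chromosome (not from this PR)
set.seed(1);
pos <- runif(1000, min = 0, max = 1e8);

# stats::density() returns a curve that integrates to roughly 1,
# so y is tiny when x is in base pairs
dens <- density(pos, from = 0, to = 1e8, n = 512);

# Assumed bin width for expressing the density as counts per bin
bin.width <- 1e6;

# Approximate number of SNVs per bin at each x: density * N * bin width
density.df <- data.frame(x = dens$x, y = dens$y);
density.df$count <- density.df$y * length(pos) * bin.width;

# Sanity check: integrating count / bin.width over x should come out close to N
# (slightly less, because some kernel mass falls outside [from, to])
step <- diff(dens$x)[1];
total <- sum(density.df$count / bin.width) * step;
```

If the scaled values still look too small after a rescaling like this, it may be worth checking whether the density is computed per clone or across all clones, since N differs between the two.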
R-CMD-check is failing due to missing documentation, which I will be fixing in a separate PR.
Description
Added a QC plot made by @WuSelina for visualizing clone distribution across the genome
Closes #88
Pipeline Run Results
Case 1
/hot/software/package/public-R-CancerEvolutionVisualization/development/test_input/multi-sample.tsv
/hot/software/package/public-R-CancerEvolutionVisualization/development/hwinata-add-genome-distribution-plot/no-defaults
Checklist
[x] This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
Disclosing PHI is a major problem[^1] - Even a small leak can be costly[^2].
[x] This PR does NOT contain germline genetic data[^3], RNA-Seq, DNA methylation, microbiome or other molecular data[^4].
[^1]: UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records
[^2]: The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records.
[^3]: Genetic information is considered PHI. Forensic assays can identify patients with as few as 21 SNPs
[^4]: RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.
[x] This PR does NOT contain output files such as images (e.g. .png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other output files. To automatically exclude such files using a .gitignore file, see here for example.
[x] I have read the code review guidelines and the code review best practice on GitHub check-list.
[x] I have set up or verified the main branch protection rule following the github standards before opening this pull request.
[x] The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
[x] I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.