uclahs-cds / package-CancerEvolutionVisualization

Publication Quality Phylogenetic Tree Plots
https://cran.r-project.org/web/packages/CancerEvolutionVisualization/
GNU General Public License v2.0

Reimplement angle calculation #87

Closed dan-knight closed 8 months ago

dan-knight commented 9 months ago

Description

Currently, the core tree creation code contains complex, interdependent logic. These refactors will address this problem, making it possible to make progress on more complex requirements and problems (for example, #103).

Before Refactor

image

After Refactor (Ideal)

image

@jarbet, @aholmes, @nwiltsie: Don't worry about reviewing the CEV code itself. I mainly tagged you just to get some feedback on the approach I took to traverse the tree (most specifically, removing recursion). This can be found here and here. My solution uses a FIFO queue to iteratively traverse the nodes in the same order as the previous recursive code.
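To make the approach concrete, here is a minimal sketch of a FIFO-queue traversal in R. It is not the CEV code: the `tree` data frame, its `id` and `parent` columns, and the `traverse.fifo` function are hypothetical names chosen for illustration.

```r
# Hypothetical toy tree: one row per node, 'parent' holds the parent's id
# (NA marks the root). This is NOT the CEV data structure.
tree <- data.frame(
    id = c(1, 2, 3, 4, 5),
    parent = c(NA, 1, 1, 2, 2)
    );

traverse.fifo <- function(tree) {
    root <- tree$id[is.na(tree$parent)];
    queue <- c(root);   # nodes waiting to be visited
    visited <- c();     # visit order, built up iteratively

    while (length(queue) > 0) {
        # Dequeue from the front (first in, first out)
        node <- queue[1];
        queue <- queue[-1];
        visited <- c(visited, node);

        # Enqueue this node's children at the back
        children <- tree$id[!is.na(tree$parent) & tree$parent == node];
        queue <- c(queue, children);
        }

    return(visited);
    }

visit.order <- traverse.fifo(tree);
print(visit.order);
```

A plain FIFO queue like this one visits nodes level by level. Whether that matches the order of a given recursive implementation depends on how the recursion was written, so the ordering in the actual change may be handled differently.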

Some background: R does not seem to be designed to handle recursion cleanly or reliably. Its limitations stem from core design decisions in the R language that are directly at odds with recursion, so they seem to be here to stay.

The idea of pointers isn't really present in R. Unlike in languages like Python, a data structure that is passed to a function is silently copied when the function modifies it (copy-on-modify). This copy is limited to the scope of that function. Further, if that copy is then passed to another function and modified there, another copy is made within the scope of the second function. As a result, recursion quickly becomes very costly in terms of memory.
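The copying behavior can be seen in a small, self-contained sketch. The `add.node` helper and the `nodes` list are hypothetical and only serve to show that a modification made inside a function does not reach the caller's object.

```r
# Hypothetical helper: modifies its argument inside the function
add.node <- function(nodes, name) {
    nodes[[name]] <- TRUE;   # the modification applies to a copy local to this call
    return(length(nodes));
    }

nodes <- list(root = TRUE);

size.inside <- add.node(nodes, 'child');   # length of the modified local copy
size.outside <- length(nodes);             # the caller's list is unchanged

print(c(inside = size.inside, outside = size.outside));
```

To keep the change, the function has to return the modified object and the caller has to reassign it, which is what makes deep recursive updates expensive.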

The <<- superassignment operator, which assigns to a variable in an enclosing scope, can be used to work around this problem, but it still does not behave like a traditional pointer, and in practice it can be opaque and inconsistent. I don't think it's really intended to be a robust solution for recursion.
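As a rough illustration of how `<<-` is typically used with recursion, the sketch below counts nodes by updating a variable in the enclosing function. The `count.nodes` function and the `children` list are hypothetical, not taken from the CEV code.

```r
# Hypothetical adjacency list: each name maps to the names of its children
children <- list(
    root = c('a', 'b'),
    a = c('c'),
    b = character(0),
    c = character(0)
    );

count.nodes <- function(children) {
    total <- 0;

    visit <- function(node) {
        # `<<-` searches the enclosing environments for 'total'
        # and updates it there instead of creating a local copy
        total <<- total + 1;

        for (child in children[[node]]) {
            visit(child);
            }
        }

    visit('root');
    return(total);
    }

node.count <- count.nodes(children);
print(node.count);
```

This works only because `total` already exists in the enclosing function. If no variable of that name is found in any enclosing environment, `<<-` assigns in the global environment instead, which is one way the operator can behave in surprising ways.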


nwiltsie commented 9 months ago

I'm coming in cold to this - can you provide a code snippet that exercises this behavior so that I can see the before/after?