Currently, the core tree creation code contains complex, interdependent logic. These refactors will address this problem, making it possible to make progress on more complex requirements and problems (for example, #103).
Before Refactor
After Refactor (Ideal)
@jarbet, @aholmes, @nwiltsie:
Don't worry about reviewing the CEV code itself. I mainly tagged you just to get some feedback on the approach I took to traverse the tree (most specifically, removing recursion). This can be found here and here. My solution uses a FIFO queue to iteratively traverse the nodes in the same order as the previous recursive code.
Some background: R does not seem to be designed to handle recursion cleanly or reliably. Its limitations are due to core design decisions in the R language which are directly at-odds with recursion. So, it seems that they are here to stay.
The idea of pointers isn't really present in R. Unlike languages like Python, a mutable data structure that is passed to a function will be silently copied when it is modified. This copy is limited to the scope of that function. Further, if that copy is then itself passed to a function, another copy will be made within the scope of the second function. As a result, recursion quickly becomes very costly in terms of memory.
The <<- scoped assignment operator can be used as an alternative to address this problem, but it still does not behave like a traditional pointer, and in practice, it can be opaque and inconsistent. I don't think it's really intended to be a robust solution for recursion.
Checklist
[X] This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded. Disclosing PHI is a major problem[^1] - Even a small leak can be costly[^2].
[X] This PR does NOT contain germline genetic data[^3], RNA-Seq, DNA methylation, microbiome or other molecular data[^4].
[X] This PR does NOT contain other non-plain text files, such as: compressed files, images (e.g..png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other output files.
To automatically exclude such files using a .gitignore file, see here for example.
[X] I have set up or verified the main branch protection rule following the github standards before opening this pull request.
[X] The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
[X] I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
Description
Currently, the core tree creation code contains complex, interdependent logic. These refactors will address this problem, making it possible to make progress on more complex requirements and problems (for example, #103).
Before Refactor
After Refactor (Ideal)
@jarbet, @aholmes, @nwiltsie: Don't worry about reviewing the CEV code itself. I mainly tagged you just to get some feedback on the approach I took to traverse the tree (most specifically, removing recursion). This can be found here and here. My solution uses a FIFO queue to iteratively traverse the nodes in the same order as the previous recursive code.
Some background: R does not seem to be designed to handle recursion cleanly or reliably. Its limitations are due to core design decisions in the R language which are directly at-odds with recursion. So, it seems that they are here to stay.
The idea of pointers isn't really present in R. Unlike languages like Python, a mutable data structure that is passed to a function will be silently copied when it is modified. This copy is limited to the scope of that function. Further, if that copy is then itself passed to a function, another copy will be made within the scope of the second function. As a result, recursion quickly becomes very costly in terms of memory.
The
<<-
scoped assignment operator can be used as an alternative to address this problem, but it still does not behave like a traditional pointer, and in practice, it can be opaque and inconsistent. I don't think it's really intended to be a robust solution for recursion.Checklist
[X] This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
Disclosing PHI is a major problem[^1] - Even a small leak can be costly[^2].
[X] This PR does NOT contain germline genetic data[^3], RNA-Seq, DNA methylation, microbiome or other molecular data[^4].
[^1]: UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records [^2]: The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records. [^3]: Genetic information is considered PHI. Forensic assays can identify patients with as few as 21 SNPs [^4]: RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.
.png
, .jpeg
),.pdf
,.RData
,.xlsx
,.doc
,.ppt
, or other output files.To automatically exclude such files using a .gitignore file, see here for example.
[X] I have read the code review guidelines and the code review best practice on GitHub check-list.
[X] I have set up or verified the
main
branch protection rule following the github standards before opening this pull request.[X] The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
[X] I have added the major changes included in this pull request to the
CHANGELOG.md
under the next release version or unreleased, and updated the date.