Closed mattssca closed 2 years ago
An update to the GAMBLR overhaul and plan of attack.
The latest commit includes changes related to bugs found when testing the revised scripts. For testing, I've run examples.Rmd using the updated scripts on this branch. Other additions include examples and returns for functions that previously were missing this information. NAMESPACE has also been updated to omit internal functions. This was edited by hand (even though you're not supposed to!), I will remove @export from internal functions and re-generate NAMESPACE to prevent these functions to populate that document.
Documentation has been re-generated with @export removed from internal functions.
This latest commit includes updates to examples.Rmd and GAMBLR_presentaion.Rmd. Code has been updated for syntax consistencies and updating examples that previously returned errors. Both Rmd files have been tested (and passing without errors) before pushing.
Recent commit includes added examples for the vignette. This update includes the following functions:
I went over all the scripts to find any outstanding issues or "todo's", this is what I found. I will continue to work with these items and update functions accordingly. If it's a good idea, I could also create issues for these items to improve transparency and documentation?
Function | Script | Paramter | Decription |
---|---|---|---|
annotate_sv | annotate.R | - | Need a better system for cataloguing and using these but this works for our current data (hg19 coordinates). |
annotate_ssm_blacklist | annotate.R | projection | Function currently only suppoorts grch37. |
annotate_igh_breakpoints | annotate.R | projection | Function currently only suppoorts grch37. |
get_cn_segments | database.R | - | Improve this query to allow for partial overlaps. |
get_gambl_metadata | database.R | - | Fix this in the metadata, all_meta[all_meta$pathology == "B-cell unclassified","pathology"] = "HGBL" |
get_ssm_by_samples | database.R | projection | Function currently only suppoorts grch37. |
get_ssm_by_sample | database.R | projection | Function currently only suppoorts grch37. |
get_merged_result | database.R | projection | Function currently only suppoorts grch37. |
get_combined_sv | database.R | projection | Function currently only suppoorts grch37. |
get_manta_sv | database.R | projection | Function currently only suppoorts grch37. |
get_cn_states | database.R | all_cytobands | Function currently only suppoorts grch37. |
gene_to_region | utillities.R | genome_build | Function currently only suppoorts grch37. |
ashm_rainbow_plot | viz.R | - | Migrate viz/plotting functions that don't directly rely on the database to a separate file. |
Do you want to try to tackle the merge conflicts so we can get this into Master soon?
Yes, I can do that after the CLC Journal Club. Question, should I push these new plotting functions to a new branch and merge that, instead of having all my code-review being merged to master?
I have now resolved all merge conflicts and tested all affected functions (added minor updates to examples in function description that previously were broken, in my latest commit). The testing was done by running examples in vignettes as well as running individual functions in the package. Everything seems to work the intended way and I am glad to say that this branch is now in a state ready to be merged into master. I am not sure what's the best way to do this, since there are a lot of lines changed (close to 12K lines), I feel bad for whoever takes on the task of reviewing this extensive PR. If I can help in any way, let me know.
A general idea for this PR: should we tag our current master with a version, like version 0 or 0.99 - something along these lines. Then once this is merged we tag it with version 1. In this way, we can clearly separate when the "clean" version started and have a reference point to it.
Thanks for all your comments. I've resolved the conversations on subjects that have been fixed. I will go over the functions in viz.R to import dependencies where needed and push that together with all edits I've made (based on your comments). In addition, I will add the colour option to focal_cn_plot as suggested in my coming commit. I also think it would be a good idea to tag the current GAMBLR version as suggested by Kostia. Is this something you can do Kostia?
Thanks again,
I can definitely tag it - what do you think is the best way, version 0 or like 0.99? Maybe the current version should be 1, and the clean will be bumped to 2?
I like having the current state as 1, then once this branch is merged, we call it 2.0?
GAMBLR Code Review
This PR addresses code readability, function definitions and suggestions for improvements with the hope to make GAMBLR more user-friendly and improve documentation (see issue #51). The edits in this PR include: