morinlab / GAMBLR

Set of standardized functions to operate with genomic data
MIT License
4 stars 2 forks source link

GAMBLR Code Review #53

Closed mattssca closed 2 years ago

mattssca commented 2 years ago

GAMBLR Code Review

This PR addresses code readability, function definitions and suggestions for improvements with the hope to make GAMBLR more user-friendly and improve documentation (see issue #51). The edits in this PR include:

mattssca commented 2 years ago

An update to the GAMBLR overhaul and plan of attack.

  1. Go over all the R-scripts (currently viz.R left) and make updates as described in PR above.
  2. Implement a testing module (to honour transparency and reproducibility). The testing will be in the form of a Rscript, which performs sanity checks on the output of updated functions, comparing to the output of original functions (on the master branch) and returns TRUE if the outputs are matching to ensure that the modified code is still doing what it is supposed to. This would catch any syntax related errors that could be present (introduced through the updates on this branch).
  3. NAMESPACE will also be updated to ensure that propper functions are exported (i.e unwanted helper functions are not exported).
  4. Outstanding problems, suggestions for improvements, limitations, etc. described inside the code will be addressed in order of priority and moved to "issues" on this Github repo if needed.
  5. Vignettes will also be receiving a facelift where applicable.
mattssca commented 2 years ago

The latest commit includes changes related to bugs found when testing the revised scripts. For testing, I've run examples.Rmd using the updated scripts on this branch. Other additions include examples and returns for functions that previously were missing this information. NAMESPACE has also been updated to omit internal functions. This was edited by hand (even though you're not supposed to!), I will remove @export from internal functions and re-generate NAMESPACE to prevent these functions to populate that document.

mattssca commented 2 years ago

Documentation has been re-generated with @export removed from internal functions.

mattssca commented 2 years ago

This latest commit includes updates to examples.Rmd and GAMBLR_presentaion.Rmd. Code has been updated for syntax consistencies and updating examples that previously returned errors. Both Rmd files have been tested (and passing without errors) before pushing.

mattssca commented 2 years ago

Recent commit includes added examples for the vignette. This update includes the following functions:

mattssca commented 2 years ago

I went over all the scripts to find any outstanding issues or "todo's", this is what I found. I will continue to work with these items and update functions accordingly. If it's a good idea, I could also create issues for these items to improve transparency and documentation?

Function Script Paramter Decription
annotate_sv annotate.R - Need a better system for cataloguing and using these but this works for our current data (hg19 coordinates).
annotate_ssm_blacklist annotate.R projection Function currently only suppoorts grch37.
annotate_igh_breakpoints annotate.R projection Function currently only suppoorts grch37.
get_cn_segments database.R - Improve this query to allow for partial overlaps.
get_gambl_metadata database.R - Fix this in the metadata, all_meta[all_meta$pathology == "B-cell unclassified","pathology"] = "HGBL"
get_ssm_by_samples database.R projection Function currently only suppoorts grch37.
get_ssm_by_sample database.R projection Function currently only suppoorts grch37.
get_merged_result database.R projection Function currently only suppoorts grch37.
get_combined_sv database.R projection Function currently only suppoorts grch37.
get_manta_sv database.R projection Function currently only suppoorts grch37.
get_cn_states database.R all_cytobands Function currently only suppoorts grch37.
gene_to_region utillities.R genome_build Function currently only suppoorts grch37.
ashm_rainbow_plot viz.R - Migrate viz/plotting functions that don't directly rely on the database to a separate file.
rdmorin commented 2 years ago

Do you want to try to tackle the merge conflicts so we can get this into Master soon?

mattssca commented 2 years ago

Yes, I can do that after the CLC Journal Club. Question, should I push these new plotting functions to a new branch and merge that, instead of having all my code-review being merged to master?

mattssca commented 2 years ago

I have now resolved all merge conflicts and tested all affected functions (added minor updates to examples in function description that previously were broken, in my latest commit). The testing was done by running examples in vignettes as well as running individual functions in the package. Everything seems to work the intended way and I am glad to say that this branch is now in a state ready to be merged into master. I am not sure what's the best way to do this, since there are a lot of lines changed (close to 12K lines), I feel bad for whoever takes on the task of reviewing this extensive PR. If I can help in any way, let me know.

Kdreval commented 2 years ago

A general idea for this PR: should we tag our current master with a version, like version 0 or 0.99 - something along these lines. Then once this is merged we tag it with version 1. In this way, we can clearly separate when the "clean" version started and have a reference point to it.

mattssca commented 2 years ago

Thanks for all your comments. I've resolved the conversations on subjects that have been fixed. I will go over the functions in viz.R to import dependencies where needed and push that together with all edits I've made (based on your comments). In addition, I will add the colour option to focal_cn_plot as suggested in my coming commit. I also think it would be a good idea to tag the current GAMBLR version as suggested by Kostia. Is this something you can do Kostia?

Thanks again,

Kdreval commented 2 years ago

I can definitely tag it - what do you think is the best way, version 0 or like 0.99? Maybe the current version should be 1, and the clean will be bumped to 2?

mattssca commented 2 years ago

I like having the current state as 1, then once this branch is merged, we call it 2.0?