MAGMA.Celltyping 2.0.0

MAJOR UPGRADE: MAGMA.Celltyping was revamped to meet CRAN standards, automatically install MAGMA, and take any species as input.

New features

Added a NEWS.md file to track changes to the package.
Automatically install MAGMA with new magma_install function; stores binaries in MAGMA.Celltyping-specific cache dir. Added various support functions to make this possible and ensure correct version is being used.
Added magma_uninstall function to remove one or all MAGMA binaries.
Allow MAGMA.Celltyping to install even if MAGMA is not installed. Instead, check at the beginning of functions that require MAGMA using magma_check.
- magma_links_stored: Include built-in metadata with links to all MAGMA versions with parsed version numbers, OS, and which is the latest version.
Call MAGMA commands with magma_run which finds the requested version of MAGMA and uses it.
Print readable, user-friendly MAGMA commands when being run through magma_cmd function.
Added unit tests.
Create hex sticker
New function: get_sub_SNP_LOC_DATA
Formally deprecated the following functions using the .Deprecated function and removed all of their other internal code:
- get_genomebuild_for_sumstats
- build_snp_location_tables
- format.sumstats.for.magma
- format_sumstats_for_magma_macOnly
- standardise.sumstats.column.headers
- standardise.sumstats.column.headers.crossplatform
Removed sumstatsColHeaders from data, as it was only used in now-deprecated functions.
Renamed all functions with "." to "_" to meet coding standards.
Renamed functions to be more concise and avoid issues with test file names being too long:
- calculate.celltype.enrichment.probabilities.wtLimma --> calculate_celltype_enrichment_limma
- calculate.conditional.celltype.enrichment.probabilities.wtLimma --> calculate_conditional_celltype_enrichment_limma
Moved all large data/ files to GitHub Releases, now accessible with dedicated piggyback-based functions:
- get_ctd: CellTypeDatasets
- get_example_gwas: GWAS summary stats
- get_genomeLocFile: NCBI gene coordinate references.
Create example full GWAS summary stats (both unfiltered and filtered + munged with MungeSumstats). Accessed by get_example_gwas.
- "prospective_memory"
- "fluid_intelligence"
- "educational_attainment"
Updated vignettes:
- Created concise Getting started vignette.
- Updated original vignette and turned it into the full_workflow vignette.
Made certain functions run automatically internally, instead of having the user run them:
- get_genome_ref
- prepare_quantile_groups
Remove unnecessary dependencies:
- reshape
- cowplot
- SNPlocs.Hsapiens.dbSNP144.GRCh37
- SNPlocs.Hsapiens.dbSNP144.GRCh38
Replaced hgnc2entrez with improved hgnc2entrez_orthogene from orthogene::all_genes. Benchmarked to confirm that the latter increases the number of genes that can be converted.
Allow all functions to accept datasets/gene lists from any species. Now automatically converted to output_species (default: "human") using orthogene.
Create MAGMA files repository using various OpenGWAS datasets that have been munged with MungeSumstats: https://github.com/neurogenomics/MAGMA_Files_Public
- magma_files_metadata: Built-in table of all pre-processed MAGMA files currently in the database.
Added API to search and access MAGMA files repository: import_magma_files.
Allow all relevant functions to take only MAGMA files as input (instead of requiring the GWAS summary stats); e.g. calculate_celltype_associations(magma_dir="<folder_containing_magma_files>"). This function is also used for downloading MAGMA files in examples/unit tests.
Add header notation in code comments to improve code navigability.
Fix Roxygen notes:
- Document @title, @description, @param, @return for all exported (and many internal) functions.
- Document @examples for all exported (and many internal) functions.
- Used @importFrom or requireNamespace for all imported functions.
Replace usage of all 1:10 syntax.
Reduce number of functions in NAMESPACE
Set all defaults consistently across all functions:
- upstream_kb = 35
- downstream_kb = 10
Allow the use of non-European populations by downloading population-specific LD panels from 1KG with get_genome_ref(population = "<population_name>")
Handle other CTD matrix input types by ensuring standardisation as dense matrices when computing quantiles/normalization.
Take advantage of new EWCE features in bschilder_dev branch:
- Standardise CTD internally in all relevant functions using new EWCE::standardise_ctd
Create all-in-one function celltype_associations_pipeline, which lets users specify which tests they want to run with arguments, including:
- calculate_celltype_associations (Linear mode)
- calculate_celltype_associations (Top10% mode)
- calculate_conditional_celltype_associations
Parallelise celltype_associations_pipeline across multiple cores.
Removed old functions whose outputs were not being used:
- normalise_mean_exp
- bin_specificityDistance_into_quantiles
- bin_expression_into_quantiles
Add new function (plus tests):
- get_driver_genes
Added unit tests for:
- calculate_celltype_enrichment_limma
- adjust_zstat_in_genesOut
- Deprecated functions
Added R script to produce vignette results inst/extdata/MAGMA_Celltyping_1.0_vignette.R, and uploaded zipped folder via piggyback: MAGMA_Celltyping_1.0_results.zip
Added unit tests confirming that the old (1.0.0) and new (>=2.0) MAGMA.Celltyping versions produce the same results; test-MAGMA_Celltyping_1.0_vs_2.0.R. Full report here.
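
As a rough illustration of the deprecation item above (function bodies replaced by a `.Deprecated` call), here is a minimal base R sketch. The names `old_fn` and `new_fn` are hypothetical placeholders, not functions from the package, and the actual stubs in the PR may look different.

```r
# Hypothetical deprecated stub: the old body is removed and only a
# deprecation warning remains. "new_fn" is a placeholder replacement name.
old_fn <- function(...) {
    .Deprecated(new = "new_fn")
    invisible(NULL)
}

# Calling the stub signals a warning; capture its message to inspect it.
deprecation_msg <- tryCatch(
    old_fn(),
    warning = function(w) conditionMessage(w)
)
print(deprecation_msg)
```

Keeping the stub exported means existing scripts fail with an informative message rather than a "could not find function" error.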

Bug fixes

Removed usethis call from code.
Removed all library calls from code.
Avoid accidentally renaming columns with data.frame
Remove all suppressWarnings calls and resolve the underlying issues instead.
Add utils to Suggests.
Normalize paths to magma executables (to avoid path issues on Windows).
Fixed axes in plot_celltype_associations, first reported here.
Fixed prepare_quantile_groups so that it's consistent with how EWCE computes specificity quantiles. Ensures that all celltypes (columns) have exactly the same number of quantiles, which was not the case before.
Fixed bug in ``
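
For the path normalization fix listed above, one way to do this in base R is `normalizePath`. This is only a sketch under assumptions: the directory and file names are made up for the demo, and the package's own helper may be implemented differently.

```r
# Create a stand-in "executable" in a temporary folder (demo names only).
exec_dir <- file.path(tempdir(), "magma_demo")
dir.create(exec_dir, showWarnings = FALSE)
exec_path <- file.path(exec_dir, "magma")
invisible(file.create(exec_path))

# A messy but equivalent path containing a redundant "." component.
messy_path <- file.path(exec_dir, ".", "magma")

# normalizePath resolves the path to a canonical absolute form;
# winslash = "/" keeps forward slashes on Windows as well.
clean_path <- normalizePath(messy_path, winslash = "/", mustWork = TRUE)
reference_path <- normalizePath(exec_path, winslash = "/", mustWork = TRUE)
print(clean_path)
```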

Some notes on the PR:

DESCRIPTION -
- Change my role to role="ctb", as a contributor. I haven't done much work on MAGMA.Celltyping so it really shouldn't be called out like this.
- I think you should be the maintainer of the package now, this makes the most sense since you will now understand it far better than anyone else (plus this will split EWCE and MAGMA between us which will hopefully mean less work for both of us). I think we should be consistent with our other packages and with this maintainer idea and you should put your role as role="cre" and Nathan's as role=c("cre","aut").
- I agree that we should change the version to 2. See below for Hadley Wickham's comments on this:
  
  Increment major, e.g. 1.0.0, for a major release. This is best reserved for changes that are not backward compatible and that are likely to affect many users. Going from 0.b.c to 1.0.0 typically indicates that your package is feature complete with a stable API.
- Since major changes are made here, including deprecated functions, this makes sense. However, note the new version should be 2.0.0, not 2.0.1. Can you change this?
- For the Remotes: github::NathanSkene/EWCE call, please make a note to remove it once the new EWCE version goes live in Apr/May.
Add function documentation -
- R/bin_expression_into_quantiles.R
- R/bin_specificity.R
- R/bin_specificityDistance_into_quantiles.R
- R/check_access.R
- R/check_celltype_names.R
- R/check_enrichment_mode.R
- R/check_entrez_genes.R
- R/check_quantiles.R
- R/create_genesets.R
- R/decompress.R
- R/find_GenesOut_files.R
- R/fix_ctd.R
- R/fix_path.R
- R/get_actual_path.R
- R/get_celltype_dict.R
- R/get_example_magma_files.R
- R/get_os.R
- R/get_top10percent.R
- R/github_download_files.R
- R/import_magma_files_metadata.R
- R/invert_dict.R - Is this necessary to have as a function? It is just a one-liner.
- R/load_rdata.R
- R/magma_check.R
- R/magma_check_version_match.R
- R/magma_create_symlink.R
- R/magma_download_binary.R
- R/magma_executable_select.R
- R/magma_find_executable.R
- R/magma_installation_info.R
- R/magma_installed.R
- R/magma_installed_version.R
- R/magma_links.R
- R/magma_links_gather.R
- R/magma_links_query.R
- R/magma_os_suffix.R
- R/magma_read_gsa_out.R
- R/magma_read_sets_out.R
- R/message_cmd.R - Is this necessary to have as a function? It is just a one-liner.
- R/messager.R
- R/normalise_mean_exp.R
- R/set_permissions.R
- R/use_distance_to_add_expression_level_info.R
R/magma_geneset_test.r - This script should be deleted
R/utils.r seems to do roughly the same as R/messager.R, delete utils.r and update any calls
README.md -
- I think Nathan's name should probably come last (as the PI for MAGMA.Celltyping)
- Can you update the MungeSumstats reference to the correct link and update the authors?
- When you submit to CRAN don't forget to add info to the README about installing from CRAN rather than Github.
Unit tests
- test-calculate_conditional_geneset_enrichment.R - isn't being used, should it be?
- test-map_snps_to_genes.r - isn't being used, should it be?
- Add multiple tests to show that MAGMA.Celltyping 2.0 gives the same results as 1.0. I think this is a very important step so we know all the changes you made didn't change the results you will get.
- What's the overall code coverage? It seems like further tests may be needed to cover more of the functionality. R CMD check on GHA seems to be running in 8 mins, so we have a lot of time to play with, which we should use to make the unit tests more robust.
Vignettes
- MAGMA.Celltyping.Rmd - In # Run cell-type enrichment analyses, give information on what MAGMA.Celltyping::celltype_associations_pipeline is doing at a high level for new users. Currently only the code is given with no explanation.
- MAGMA.Celltyping.Rmd - Same for # Plot results and the two subheadings. I get this is like the quickstart vignette but I think some more information is needed.
- full_workflow.Rmd - Similar to the other vignette, I think a few lines on why you run each of the MAGMA.Celltyping functions would be really helpful. You explain the functions but then don't give the bigger picture of why you would want to use each. Both vignettes currently feel slightly too bare and unintuitive for a new user.

DESCRIPTION

[x] Change my role to role="ctb", as a contributor. I haven't done much work on MAGMA.Celltyping so it really shouldn't be called out like this.
[x] I think you should be the maintainer of the package now, this makes the most sense since you will now understand it far better than anyone else (plus this will split EWCE and MAGMA between us which will hopefully mean less work for both of us). I think we should be consistent with our other packages and with this maintainer idea and you should put your role as role="cre" and Nathan's as role=c("cre","aut").

There can only be one "cre" (building the package throws an error otherwise). Keeping that as Nathan, since he both created it and will be the one to continue maintaining it after I leave the lab.
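
To make the single-"cre" constraint concrete, here is a small sketch of an Authors@R-style vector built with `utils::person`, plus a check that exactly one entry carries the maintainer role. All names and the email address are placeholders, not the package's real DESCRIPTION entries, and which person ends up as "cre" is exactly what is being discussed in this thread.

```r
# Placeholder authors; names, email and role assignments are illustrative only.
authors <- c(
    utils::person("First", "Maintainer",
                  email = "maintainer@example.com",
                  role = c("cre", "aut")),
    utils::person("Second", "Author", role = "aut"),
    utils::person("Third", "Contributor", role = "ctb")
)

# Count how many entries carry the "cre" (maintainer) role.
n_cre <- sum(vapply(
    unclass(authors),
    function(p) "cre" %in% p$role,
    logical(1)
))
print(n_cre)
```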

[x] I agree that we should change the version to 2. See below for Hadley Wickham's comments on this:

Increment major, e.g. 1.0.0, for a major release. This is best reserved for changes that are not backward compatible and that are likely to affect many users. Going from 0.b.c to 1.0.0 typically indicates that your package is feature complete with a stable API.
[x] Since major changes are made here, including deprecated functions, this makes sense. However, note the new version should be 2.0.0, not 2.0.1. Can you change this?
[x] For the Remotes: github::NathanSkene/EWCE call, please make a note to remove it once the new EWCE version goes live in Apr/May.

Noted here.

Add function documentation

Is this a requirement for CRAN, or a suggestion? Def good practice, but just trying to figure out what to prioritize

[x] R/bin_expression_into_quantiles.R
[x] R/bin_specificity.R
[x] R/bin_specificityDistance_into_quantiles.R
[x] R/check_access.R
[x] R/check_celltype_names.R
[x] R/check_enrichment_mode.R
[x] R/check_entrez_genes.R
[x] R/check_quantiles.R
[x] R/create_genesets.R
[x] R/decompress.R
[x] R/find_GenesOut_files.R
[x] R/fix_ctd.R
[x] R/fix_path.R
[x] R/get_actual_path.R
[x] R/get_celltype_dict.R
[x] R/get_example_magma_files.R
[x] R/get_os.R
[x] R/get_top10percent.R
[x] R/github_download_files.R
[x] R/import_magma_files_metadata.R
[x] R/invert_dict.R - Is this necessary to have as a function? It is just a one-liner.
[x] R/load_rdata.R
[x] R/magma_check.R
[x] R/magma_check_version_match.R
[x] R/magma_create_symlink.R
[x] R/magma_download_binary.R
[x] R/magma_executable_select.R
[x] R/magma_find_executable.R
[x] R/magma_installation_info.R
[x] R/magma_installed.R
[x] R/magma_installed_version.R
[x] R/magma_links.R
[x] R/magma_links_gather.R
[x] R/magma_links_query.R
[x] R/magma_os_suffix.R
[x] R/magma_read_gsa_out.R
[x] R/magma_read_sets_out.R
[x] R/messager.R
[x] R/normalise_mean_exp.R
[x] R/set_permissions.R
[x] R/use_distance_to_add_expression_level_info.R
[x] R/magma_geneset_test.r - This script should be deleted
[x] R/utils.r seems to do roughly the same as R/messager.R, delete utils.r and update any calls
[x] R/message_cmd.R - Is this necessary to have as a function? It is just a one-liner.

As a rule of mine, any bit of code you use more than once should be a function (even small ones). That way it is always consistent across usage.
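
For illustration, a one-line helper of this kind might look like the sketch below. This is a hypothetical version of a dictionary-inverting function that assumes the dictionary is a named character vector. The real R/invert_dict.R may differ, and the example values are made up.

```r
# Hypothetical one-line helper: swap the names and values of a named vector.
invert_dict <- function(dict) {
    stats::setNames(names(dict), unname(dict))
}

# Made-up dictionary mapping short labels to full cell-type names.
celltype_dict <- c(astro = "Astrocytes", micro = "Microglia")
inverted <- invert_dict(celltype_dict)
print(inverted)
```

Wrapping even a one-liner like this means every caller inverts dictionaries the same way, and any later fix only has to be made in one place.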

README.md

[x] I think Nathan's name should probably come last (as the PI for MAGMA.Celltyping)
[x] Can you update the MungeSumstats reference to the correct link and update the authors?
[x] When you submit to CRAN don't forget to add info to the README about installing from CRAN rather than Github.

Unit tests

[x] test-calculate_conditional_geneset_enrichment.R - isn't being used, should it be?

Still trying to figure out this function. Need to have a discussion with @NathanSkene about this.

[x] test-map_snps_to_genes.r - isn't being used, should it be?

Wrote this and then realized it takes too long to run. Keeping in case we decide to use it later, and also just to have some means of checking whether it works (even if manually).

[x] Add multiple tests to show that MAGMA.Celltyping 2.0 gives the same results as 1.0. I think this is a very important step so we know all the changes you made didn't change the results you will get.
[x] What's the overall code coverage? It seems like further tests may be needed to cover more of the functionality. R CMD check on GHA seems to be running in 8 mins, so we have a lot of time to play with, which we should use to make the unit tests more robust.

40% currently. Getting this up is def a longer-term goal of mine, but I can't sink too much time into this atm. Also, some tests take an extremely long time (which is why I hashed out test-map_snps_to_genes). Added Issue here.

Vignettes

[x] MAGMA.Celltyping.Rmd - In # Run cell-type enrichment analyses, give information on what MAGMA.Celltyping::celltype_associations_pipeline is doing at a high level for new users. Currently only the code is given with no explanation.
[x] MAGMA.Celltyping.Rmd - Same for # Plot results and the two subheadings. I get this is like the quickstart vignette but I think some more information is needed.
[x] full_workflow.Rmd - Similar to the other vignette, I think a few lines on why you run each of the MAGMA.Celltyping functions would be really helpful. You explain the functions but then don't give the bigger picture of why you would want to use each. Both vignettes currently feel slightly too bare and unintuitive for a new user.

Just on "There can only be one "cre" (building the package throws an error otherwise). Keeping that as Nathan, since he both created it and will be the one to continue maintaining it after I leave the lab." - The Bioconductor standard is that the maintainer should be the "cre", so that should be you. I would put Nathan as "aut" only, since having two "cre" throws an error. This is consistent with the lab's other packages, so I think we should follow it here too (I'm down as cre for EWCE/MungeSumstats). I know you will leave the lab after your PhD, but that is a while away yet, so I think we can update the cre when the time comes!

neurogenomics / MAGMA_Celltyping

`MAGMA.Celltyping`: bschilder_dev upgrade #93