morinlab / GAMBLR.data

Collection of Curated Data for Genomic Analysis of Mature B-cell Lymphomas in R
https://morinlab.github.io/GAMBLR.data/
MIT License

ashm data frames have inconsistent chr prefixing #11

Open lkhilton opened 1 year ago

lkhilton commented 1 year ago

The data frames stored under GAMBLR.data::somatic_hypermutation_locations_GRCh37_v* use a "chr" prefix in the chromosome column. In most use cases, this means extra work to strip or add the prefix so the regions match other data. The column names could also be streamlined substantially.
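As a rough illustration of the proposed cleanup, the sketch below strips a leading "chr" from the chromosome column and renames columns in one pass. It is written in Python only as a language-neutral sketch. The column names in RENAME (chr_name, hg19_start, hg19_end) and the function names are hypothetical placeholders, not the actual columns of the GAMBLR.data tables.

```python
# Hypothetical rename map: the real column names in the bundled data may differ.
RENAME = {"chr_name": "chrom", "hg19_start": "start", "hg19_end": "end"}


def strip_chr(value: str) -> str:
    """Remove a leading 'chr' from a chromosome name, e.g. 'chr14' -> '14'."""
    return value[3:] if value.startswith("chr") else value


def normalize_regions(rows, chrom_col="chr_name", rename=RENAME):
    """Return new rows with columns renamed and the chromosome prefix stripped.

    rows is a list of dicts, one per region (a stand-in for a data frame).
    """
    out = []
    for row in rows:
        # Rename any column that appears in the map; keep the others as-is.
        new = {rename.get(k, k): v for k, v in row.items()}
        key = rename.get(chrom_col, chrom_col)
        new[key] = strip_chr(str(new[key]))
        out.append(new)
    return out


# Made-up example row, not real aSHM coordinates.
print(normalize_regions([{"chr_name": "chrX", "hg19_start": 100, "hg19_end": 200}]))
```

Because strip_chr leaves unprefixed names untouched, running it on data that is already clean is harmless. In R, the prefix removal itself is a one-liner with base sub("^chr", "", x).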

Kdreval commented 1 year ago

I kept the chr prefixing and column names in this package to reproduce what was in the original GAMBLR bundled data. I can easily strip the prefix and update the column names, but I'm not sure how many issues that would cause in existing functions that may expect the current format. To address this issue, I think we first need a reproducible set of tests that can show whether existing functions break, and this issue will be an excellent use case for that testing approach. I will work on bundling the data with GAMBLR so we can start developing the test suite.
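One way to frame the tests described above is a prefix-independence check. The check runs a function once with the regions as bundled and once with the prefix stripped, then flags the function if the results differ. The Python sketch below assumes a hypothetical input shape: mutations as (chromosome, position) pairs and regions as dicts. legacy_mutations_in_regions is a made-up example of code that silently depends on the prefix, not an actual GAMBLR function. In the R package itself, such a check would more naturally be written with testthat.

```python
def strip_chr(value: str) -> str:
    """Remove a leading 'chr' from a chromosome name, e.g. 'chr14' -> '14'."""
    return value[3:] if value.startswith("chr") else value


def legacy_mutations_in_regions(mutations, regions):
    """Hypothetical legacy function: assumes regions are 'chr'-prefixed and
    mutations are not, so it blindly drops the first three characters."""
    return [
        (chrom, pos)
        for chrom, pos in mutations
        if any(r["chrom"][3:] == chrom and r["start"] <= pos <= r["end"] for r in regions)
    ]


def robust_mutations_in_regions(mutations, regions):
    """Hypothetical robust version: normalizes both sides before comparing."""
    return [
        (chrom, pos)
        for chrom, pos in mutations
        if any(
            strip_chr(r["chrom"]) == strip_chr(chrom) and r["start"] <= pos <= r["end"]
            for r in regions
        )
    ]


def is_prefix_independent(func, mutations, regions):
    """True if func returns the same result for prefixed and stripped regions."""
    stripped = [{**r, "chrom": strip_chr(r["chrom"])} for r in regions]
    return func(mutations, regions) == func(mutations, stripped)


# Made-up toy inputs, not real data.
mutations = [("14", 150), ("3", 50)]
regions = [{"chrom": "chr14", "start": 100, "end": 200}]
print(is_prefix_independent(legacy_mutations_in_regions, mutations, regions))
print(is_prefix_independent(robust_mutations_in_regions, mutations, regions))
```

Run against each existing function before the bundled data changes, a check like this would point to the functions that need updating when the prefix is removed.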