Closed vladimirsouza closed 1 year ago
Can you clarify whether a sample without results will return CN 2 or NA? It looks like the imaginary example has 2, which implies we know it's copy number.
The function was returning 2 for samples without results. But, answering your question, it should return NA. However, I was a bit reluctant to make this change because the code was deliberately changing NA to 2 -- see line #fill in any sample/region combinations with missing data as diploid
. So I thought there might be an application for this, which I can't see at the moment (perhaps to make filtering easier). Then I created this new missing_data_as_diploid
argument to let the user decides if missing data return NA or 2. But it would be fine for me to make the function always return NA for this case.
My understanding is that the filling of NA to 2 was intended to deal with samples that had a copy number profile but for some segments the value was NA. Perhaps the best place to start is to confirm whether the NA -> 2 approach was right to begin with or if we should do away with it completely. @Kdreval may have more background on this.
I think what Vladimir did here is great. Initially this was implemented because we had a mismatch between what samples were completed the CNV or not, depending on the seq type and pairing status - and this was affecting how MySQL was set up and how we pulled from there, so it was easier to implement this way. Now that we have all in place, I like that Vladimir made it user-configurable so we respect the legacy behaviour but also the default behaviour is now providing the desired output. I think this PR can go ahead and be merged - I'll approve it now.
This PR addresses the issue #207 -
get_cn_states()
output is misleading for missing samples.Pull Request Checklists
Checklist for all PRs
Required
Kostia's examples
All examples from get_cn_states documentation
devtools::document()
) and addedNAMESPACE
and all other modified files in the root directory and underman
.Checklist for changes to existing code
[x] I added/removed arguments to a function and updated documentation for all changed/new arguments
[x] I tested the new code for compatibility with existing functionality in the Master branch (please provide a reprex of how you tested the original functionality)
get_gene_cn_and_expression
is the only function that usesget_cn_states
internally.