rnabioco / djvdj

An R package to analyze single-cell V(D)J data
https://rnabioco.github.io/djvdj
Other
23 stars 4 forks source link

fixing mutation frequencies #133

Closed kwells4 closed 1 year ago

kwells4 commented 1 year ago

Addresses #132

Adds one argument, include_constant to import_vdj. This can be removed. At the moment it allows the user to also include mutations in the constant region.

If include_constant = TRUE:

all_mutations = n_c + n_d + n_j + n_v
mutation_freq = all_mutations /(length_v + length_c + length_j + length_d)

If include_constant = FALSE (the default):

all_mutations = n_d + n_j + n_v
mutation_freq = all_mutations /(length_v + length_j + length_d)
jayhesselberth commented 1 year ago

could you please add a test or two for the new include_constant argument?

kwells4 commented 1 year ago

Added! It just simply calculates the number of "all" mutations with and without including the constant region and makes sure it matches the output. There was also a warning about many of the values in mut_coords mapping to the values in vdj_coords. This seemed like the expected behavior to me (as many mutations will be in one region and if the mutation is in the junction, one mutation will be in many regions), so I also added relationship = "many-to-many to line 967 of import_vdj. Let me know if you don't agree with this logic and I can remove it.

jayhesselberth commented 1 year ago

The "many-to-many" warnings are from changes in how dplyr deals with joins: https://dplyr.tidyverse.org/news/index.html#dplyr-111

sheridar commented 1 year ago

Thanks @kwells4! Based on your comments I think everything sounds good, I'll review the code and merge next week