raikens1 / mutation_rate

Code and raw data files for my work on private polymorphism differences between populations.

Polymorphism model underestimates #12

Closed. raikens1 closed this issue 6 years ago.

raikens1 commented 7 years ago

Ran polymorphism_predictor on all chromosomes and found what looks like a systematic underestimate: the model expects fewer polymorphisms than we actually observe.
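One way to make "systematic" concrete is to compare expected and observed counts chromosome by chromosome and check whether the shortfall has the same sign everywhere. A minimal sketch, assuming per-chromosome expected and observed counts are available as dictionaries; `relative_shortfall` and `is_systematic_underestimate` are hypothetical helpers, not part of polymorphism_predictor:

```python
from typing import Dict


def relative_shortfall(expected: Dict[str, float],
                       observed: Dict[str, int]) -> Dict[str, float]:
    """(observed - expected) / observed per chromosome; positive means the model underestimates."""
    return {
        chrom: (observed[chrom] - expected[chrom]) / observed[chrom]
        for chrom in expected
        if observed.get(chrom, 0) > 0  # skip chromosomes with no observed polymorphisms
    }


def is_systematic_underestimate(shortfall: Dict[str, float]) -> bool:
    """True only if every chromosome has fewer expected than observed polymorphisms."""
    return bool(shortfall) and all(v > 0 for v in shortfall.values())


# Toy numbers for illustration only, not real counts.
expected = {"chr1": 90.0, "chr2": 80.0}
observed = {"chr1": 100, "chr2": 100}
shortfall = relative_shortfall(expected, observed)
print(shortfall)                               # {'chr1': 0.1, 'chr2': 0.2}
print(is_systematic_underestimate(shortfall))  # True
```

A consistent positive shortfall across chromosomes points toward something shared by all of them (inputs or counts) rather than noise on a few.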

raikens1 commented 7 years ago

I think it's an issue with my gw_counts files. When I compute the counts myself with my new script, site_counter.py, I get counts about 10% lower than the ones Varun gave me. Maybe he used a different region set than nc_regions when generating the files he passed along.
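Site counts depend on exactly which bases the region set covers, so two different region sets (or one region set with unmerged overlaps) can give noticeably different totals. A minimal sketch of counting the sites covered by a region set, assuming nc_regions is a BED-style file of 0-based, half-open intervals; `read_bed`, `merge_intervals`, and `count_sites` are hypothetical and not taken from site_counter.py:

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) with 0-based, half-open coordinates, as in BED files
Interval = Tuple[str, int, int]


def read_bed(path: str) -> List[Interval]:
    """Read the first three columns of a BED-style file, skipping headers and blank lines."""
    regions: List[Interval] = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            regions.append((chrom, int(start), int(end)))
    return regions


def merge_intervals(regions: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or abutting ones on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def count_sites(regions: Iterable[Interval]) -> int:
    """Count distinct bases covered by a region set (overlaps counted once)."""
    return sum(end - start for _, start, end in merge_intervals(regions))


regions = [("chr1", 0, 10), ("chr1", 5, 15), ("chr2", 0, 5)]
print(count_sites(regions))               # 20
print(sum(e - s for _, s, e in regions))  # 25 if overlaps are double-counted
```

Merging before summing is the design choice that matters here: without it, overlapping intervals inflate the count.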

In any case, I'm recalculating those now and will open a pull request soon to rerun my analyses with the new counts. This is on branch recalculate_gw_counts #16

raikens1 commented 7 years ago

Yup, it's that. Opened a pull request and recalculating eeeeverything

raikens1 commented 7 years ago

Or... not. Blargh. The model still underestimates after all the recalculation; the new counts changed the results by only a fraction of a percent.

raikens1 commented 7 years ago

Got it. There are some truncated files upstream in my analysis, right after the filter_private step. Fixing this will require some serious retooling, but I'm pretty sure that's it.

The truncated files are:

- AFR: chr 2, 3, 4, 6
- EAS: chr 1, 2, 3, 4, 5, 6
- SAS: chr 2, 3, 4, 5, 6
- EUR: chr 1, 2, 3, 4, 5, 6

And let's not forget that anything COSMO, AF, AMR, or subpopulation-level probably also has to be rerun.

Working on this on branch fix_truncation. #18
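Since truncated intermediate files were the culprit, a cheap sanity check after filter_private could catch them before they propagate downstream. A minimal sketch, assuming the outputs are either plain-text files that should end with a newline or gzip-compressed files; `looks_truncated` and `find_truncated` are hypothetical helpers, and the plain-text test is only a heuristic (a file cut exactly at a line boundary would pass it):

```python
import gzip
import os
import sys
import zlib
from typing import Iterable, List


def looks_truncated(path: str) -> bool:
    """Heuristic check for a cut-off output file (plain text or gzip)."""
    if os.path.getsize(path) == 0:
        return True  # treat an empty output as suspicious
    if path.endswith(".gz"):
        try:
            # Decompress to the end; a truncated stream raises EOFError
            with gzip.open(path, "rb") as fh:
                while fh.read(1 << 20):
                    pass
        except (EOFError, OSError, zlib.error):
            return True
        return False
    # Plain text: a complete file should end with a newline
    with open(path, "rb") as fh:
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) != b"\n"


def find_truncated(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that look truncated."""
    return [p for p in paths if looks_truncated(p)]


if __name__ == "__main__":
    # Usage: python check_truncation.py <output files...>
    for p in find_truncated(sys.argv[1:]):
        print(p)
```

For gzip output the check is stronger, because a stream cut off anywhere before its end-of-stream marker fails to decompress.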

raikens1 commented 7 years ago

Need to check if this works now.