Open hemstrow opened 5 years ago
for association testing, we've got armitage, chi-squared, and odds ratio done. No continuous yet.
The plot_clusters object should always return the raw PCA and tSNE data as well as the plots. If you save the output to a variable, the data should be there, I think as $pca$data, if I recall correctly.
I'll work on integrating the duplicate detection! I've got a function for it, I can just throw it in.
The new version of Colony (2.0.6.5) has an implementation to better account for genotyping error rates (https://doi.org/10.1038/s41437-018-0178-7). Maybe it would be good to make an input for the write_colony function for SNP specific error rates.
Done: HWE parentage via colony (needs documentation) genomic prediction manhattan plots SFS estimation (still requires anc states, even for folded) pop assignments plotted on map (update to be more flexible needed) pop assignment via sNMF/snapclust association testing (GMMAT, armitage, odds ratio, chisq)
Needed: folded SFS estimation without ancestral states (just adjust default behavior) plotting on map with other stats Ne abba/baba
Maybe an LD filter for filter_snps?
We tested a bit with rgap_snps, but it might be nice to incorporate and LD filter of some kind.
Some new stuff got added: Ne (facets still need to be fixed) genetic distance matrix creation geographic distance matrix creation and isolation by distance
I think putting a LD filter in would be great--do you have a specific approach that you think might work?
Not sure just yet about how to implement a LD filter. I think the r_gaps function worked pretty well to drop some data when we tried it, but for an LD filter we'll probably need to do a bit more thinking to actually do that.
On Fri, Aug 21, 2020 at 2:55 PM William Hemstrom notifications@github.com wrote:
Some new stuff got added: Ne (facets still need to be fixed) genetic distance matrix creation geographic distance matrix creation and isolation by distance
I think putting a LD filter in would be great--do you have a specific approach that you think might work?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hemstrow/snpR/issues/6#issuecomment-678532810, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHX6UNMIHEEQIVQKBE2L4LTSB3UMBANCNFSM4HFBAYSA .
Would like line pointers on more errors (such as the import functions)
Done: HWE parentage via colony (needs documentation) genomic prediction manhattan plots SFS estimation (still requires anc states, even for folded) pop assignments plotted on map (update to be more flexible needed) pop assignment via sNMF/snapclust association testing (GMMAT, armitage, odds ratio, chisq) folded SFS estimation without ancestral states (just adjust default behavior) Ne abba/baba Filter on MAC (instead of just MAF)
Needed: Plotting on map with other stats Would like line pointers on more errors (such as the import functions) Allelic Richness Poly-allelic marker support (in progress on dev) pool-seq support (in progress on dev)
Would their be any possibility to add a merge function that takes two snpR datasets and merges into a singular (I understand this can be done externally however I do think internal functions would be of utility). In addition would their be potential for a save function for snpR objects.
Would their be any possibility to add a merge function that takes two snpR datasets and merges into a singular (I understand this can be done externally however I do think internal functions would be of utility). In addition would their be potential for a save function for snpR objects.
Yah, that could be something we could work on. It shouldn't take too long to implement, although it'll be a little tougher since I'd try to conserve the allele/genotype tabulation that's already been done when possible which'll take a bit of work. Consider it added to the list!
snpRdata
object saving and loading is already accomplish-able via saveRDS
and loadRDS
:
saveRDS(stickSNPs, "saved_snpRdata.RDS")
d <- loadRDS("saved_snpRdata.RDS")
identical(d, stickSNPS)
Should return TRUE
! If it would be helpful, it'd be pretty straightforward to implement wrappers or add a description of this to the vignette!
EDIT: @EamonnCooper : merge_snpRdata()
is now available on the dev branch, installable with the remotes package:
remotes::install_github("hemstrow/snpR", ref = "dev")
Is it possible to extend the calc_directionality() function to include spatially explicit info and estimate the origin of range expansion (TDOA equations) from Peter and Slatkin, 2013.
Is it possible to extend the calc_directionality() function to include spatially explicit info and estimate the origin of range expansion (TDOA equations) from Peter and Slatkin, 2013.
That is probably something I can add, yes. I'm currently working on a major memory efficiency update and can add that feature to the dev branch once it is stable--I will respond when done!
Hey @macurqcron,
Is it possible to extend the calc_directionality() function to include spatially explicit info and estimate the origin of range expansion (TDOA equations) from Peter and Slatkin, 2013.
calc_origin_of_expansion()
is now in the dev version, which you can install with:
remotes::install_github("hemstrow/snpR", ref = "dev")
Note that a few of the hotfixes (see the news page) on the master branch are not yet present in the dev branch, but most everything should work OK. Note that I used stats::optim()
to find the argmax for the coordinates instead of nls()
due to nls not seeming to like complex inputs. optim
should still do a decent job, but I don't have great test data for this, so please let me know if things don't seem to work well.
Thanks so much @hemstrow! I have tried running the code am having an issue with running the function with a complex facet (e.g., ancestry cluster AND population identity) -- I would like to estimate the origin of range expansion among populations that are within a given ancestry cluster (i.e., facet = "cluster.pop"). When I try to run the code I get this output and error:
Beginning setup...
Error in `[.data.frame`(sample.meta(x), , c(facet, "x", "y")) :
undefined columns selected
I am running my code the same way as your example code (successfully generated maf data, added geographic coordinate data to sample metadata, created the projection object), the only difference is my facet is "cluster.pop". When I run the code with a simple facet (i.e., pop), the function runs and produces the output (have done this with the example data; function runs with a simple facet with my data, but I have not finished running the function with my data since it takes a lot of time).
Any suggestions to get around this issue with a complex function is appreciated. Not sure if a workaround would be to subset my snpRdata object to only contain populations that belong to a specific cluster (so three subsetted datasets since I have three ancestry clusters), or if this would be a bad idea.
@macurqcron
Thanks for the heads-up! The bug should be fixed now (it works fine with the pop.fam
facet in the example data now). Please let me know if you hit more bugs--I tried to test for the usual suspects but since it's a new and fairly complex function I'm bound to have missed some. One thing I'd strongly recommend is to use an outgroup to determine the ancestral and derived allele states for your loci if at all possible, since that'll make direcitonality indices much more meaningful than just using the major allele as the ancestral state as in the example.
Thanks @hemstrow! For the outgroup, I am learning as I go and had a question - I aligned my ddRADseq data to the reference genome of a closely related species. Can I use the REF column from my vcf file as the ancestral allele state and the ALT column as my derived allele state? Or, do I need to bring in another independent genome that my data has not be aligned to yet (and if so, any suggestions on tutorials about how to add this new genome to my VCF would be really helpful)
@macurqcron
Thanks @hemstrow! For the outgroup, I am learning as I go and had a question - I aligned my ddRADseq data to the reference genome of a closely related species. Can I use the REF column from my vcf file as the ancestral allele state and the ALT column as my derived allele state? Or, do I need to bring in another independent genome that my data has not be aligned to yet (and if so, any suggestions on tutorials about how to add this new genome to my VCF would be really helpful)
If it is pretty closely related, that might be a reasonable approach, yah. Some will be off but it should be pretty close! Something like a sister taxa is ideal. I'm not an expert in this, however, so I'd look around a bit and make sure that seems justified. You might also want to take a glance at the unfolded site frequency spectra and make sure they look reasonable before continuing.
HWE (x) association testing Ne parentage genomic prediction manhattan plots SFS abc plots with circos abba/baba plotting on maps (pop assignments, ect). assignment (fastStructure calls and interp)