hemstrow / snpR

GNU General Public License v3.0
21 stars 7 forks source link

things to add #6

Open hemstrow opened 5 years ago

hemstrow commented 5 years ago

HWE (x) association testing Ne parentage genomic prediction manhattan plots SFS abc plots with circos abba/baba plotting on maps (pop assignments, ect). assignment (fastStructure calls and interp)

hemstrow commented 5 years ago

for association testing, we've got armitage, chi-squared, and odds ratio done. No continuous yet.

mlyjones commented 5 years ago
hemstrow commented 5 years ago

The plot_clusters object should always return the raw PCA and tSNE data as well as the plots. If you save the output to a variable, the data should be there, I think as $pca$data, if I recall correctly.

I'll work on integrating the duplicate detection! I've got a function for it, I can just throw it in.

mlyjones commented 4 years ago

The new version of Colony (2.0.6.5) has an implementation to better account for genotyping error rates (https://doi.org/10.1038/s41437-018-0178-7). Maybe it would be good to make an input for the write_colony function for SNP specific error rates.

hemstrow commented 4 years ago

Done: HWE parentage via colony (needs documentation) genomic prediction manhattan plots SFS estimation (still requires anc states, even for folded) pop assignments plotted on map (update to be more flexible needed) pop assignment via sNMF/snapclust association testing (GMMAT, armitage, odds ratio, chisq)

Needed: folded SFS estimation without ancestral states (just adjust default behavior) plotting on map with other stats Ne abba/baba

mlyjones commented 4 years ago

Maybe an LD filter for filter_snps?

We tested a bit with rgap_snps, but it might be nice to incorporate and LD filter of some kind.

hemstrow commented 3 years ago

Some new stuff got added: Ne (facets still need to be fixed) genetic distance matrix creation geographic distance matrix creation and isolation by distance

I think putting a LD filter in would be great--do you have a specific approach that you think might work?

mlyjones commented 3 years ago

Not sure just yet about how to implement a LD filter. I think the r_gaps function worked pretty well to drop some data when we tried it, but for an LD filter we'll probably need to do a bit more thinking to actually do that.

On Fri, Aug 21, 2020 at 2:55 PM William Hemstrom notifications@github.com wrote:

Some new stuff got added: Ne (facets still need to be fixed) genetic distance matrix creation geographic distance matrix creation and isolation by distance

I think putting a LD filter in would be great--do you have a specific approach that you think might work?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hemstrow/snpR/issues/6#issuecomment-678532810, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHX6UNMIHEEQIVQKBE2L4LTSB3UMBANCNFSM4HFBAYSA .

hemstrow commented 1 year ago

Would like line pointers on more errors (such as the import functions)

hemstrow commented 1 year ago

Done: HWE parentage via colony (needs documentation) genomic prediction manhattan plots SFS estimation (still requires anc states, even for folded) pop assignments plotted on map (update to be more flexible needed) pop assignment via sNMF/snapclust association testing (GMMAT, armitage, odds ratio, chisq) folded SFS estimation without ancestral states (just adjust default behavior) Ne abba/baba Filter on MAC (instead of just MAF)

Needed: Plotting on map with other stats Would like line pointers on more errors (such as the import functions) Allelic Richness Poly-allelic marker support (in progress on dev) pool-seq support (in progress on dev)

EamonnCooper commented 1 year ago

Would their be any possibility to add a merge function that takes two snpR datasets and merges into a singular (I understand this can be done externally however I do think internal functions would be of utility). In addition would their be potential for a save function for snpR objects.

hemstrow commented 1 year ago

Would their be any possibility to add a merge function that takes two snpR datasets and merges into a singular (I understand this can be done externally however I do think internal functions would be of utility). In addition would their be potential for a save function for snpR objects.

Yah, that could be something we could work on. It shouldn't take too long to implement, although it'll be a little tougher since I'd try to conserve the allele/genotype tabulation that's already been done when possible which'll take a bit of work. Consider it added to the list!

snpRdata object saving and loading is already accomplish-able via saveRDS and loadRDS:

saveRDS(stickSNPs, "saved_snpRdata.RDS")

d <- loadRDS("saved_snpRdata.RDS")

identical(d, stickSNPS)

Should return TRUE! If it would be helpful, it'd be pretty straightforward to implement wrappers or add a description of this to the vignette!

EDIT: @EamonnCooper : merge_snpRdata() is now available on the dev branch, installable with the remotes package:

remotes::install_github("hemstrow/snpR", ref = "dev")

macurqcron commented 4 months ago

Is it possible to extend the calc_directionality() function to include spatially explicit info and estimate the origin of range expansion (TDOA equations) from Peter and Slatkin, 2013.

hemstrow commented 4 months ago

Is it possible to extend the calc_directionality() function to include spatially explicit info and estimate the origin of range expansion (TDOA equations) from Peter and Slatkin, 2013.

That is probably something I can add, yes. I'm currently working on a major memory efficiency update and can add that feature to the dev branch once it is stable--I will respond when done!

hemstrow commented 3 months ago

Hey @macurqcron,

Is it possible to extend the calc_directionality() function to include spatially explicit info and estimate the origin of range expansion (TDOA equations) from Peter and Slatkin, 2013.

calc_origin_of_expansion() is now in the dev version, which you can install with:

remotes::install_github("hemstrow/snpR", ref = "dev")

Note that a few of the hotfixes (see the news page) on the master branch are not yet present in the dev branch, but most everything should work OK. Note that I used stats::optim() to find the argmax for the coordinates instead of nls() due to nls not seeming to like complex inputs. optim should still do a decent job, but I don't have great test data for this, so please let me know if things don't seem to work well.

macurqcron commented 3 months ago

Thanks so much @hemstrow! I have tried running the code am having an issue with running the function with a complex facet (e.g., ancestry cluster AND population identity) -- I would like to estimate the origin of range expansion among populations that are within a given ancestry cluster (i.e., facet = "cluster.pop"). When I try to run the code I get this output and error:

Beginning setup...
Error in `[.data.frame`(sample.meta(x), , c(facet, "x", "y")) : 
  undefined columns selected

I am running my code the same way as your example code (successfully generated maf data, added geographic coordinate data to sample metadata, created the projection object), the only difference is my facet is "cluster.pop". When I run the code with a simple facet (i.e., pop), the function runs and produces the output (have done this with the example data; function runs with a simple facet with my data, but I have not finished running the function with my data since it takes a lot of time).

Any suggestions to get around this issue with a complex function is appreciated. Not sure if a workaround would be to subset my snpRdata object to only contain populations that belong to a specific cluster (so three subsetted datasets since I have three ancestry clusters), or if this would be a bad idea.

hemstrow commented 3 months ago

@macurqcron

Thanks for the heads-up! The bug should be fixed now (it works fine with the pop.fam facet in the example data now). Please let me know if you hit more bugs--I tried to test for the usual suspects but since it's a new and fairly complex function I'm bound to have missed some. One thing I'd strongly recommend is to use an outgroup to determine the ancestral and derived allele states for your loci if at all possible, since that'll make direcitonality indices much more meaningful than just using the major allele as the ancestral state as in the example.

macurqcron commented 3 months ago

Thanks @hemstrow! For the outgroup, I am learning as I go and had a question - I aligned my ddRADseq data to the reference genome of a closely related species. Can I use the REF column from my vcf file as the ancestral allele state and the ALT column as my derived allele state? Or, do I need to bring in another independent genome that my data has not be aligned to yet (and if so, any suggestions on tutorials about how to add this new genome to my VCF would be really helpful)

hemstrow commented 3 months ago

@macurqcron

Thanks @hemstrow! For the outgroup, I am learning as I go and had a question - I aligned my ddRADseq data to the reference genome of a closely related species. Can I use the REF column from my vcf file as the ancestral allele state and the ALT column as my derived allele state? Or, do I need to bring in another independent genome that my data has not be aligned to yet (and if so, any suggestions on tutorials about how to add this new genome to my VCF would be really helpful)

If it is pretty closely related, that might be a reasonable approach, yah. Some will be off but it should be pretty close! Something like a sister taxa is ideal. I'm not an expert in this, however, so I'd look around a bit and make sure that seems justified. You might also want to take a glance at the unfolded site frequency spectra and make sure they look reasonable before continuing.