mennodejong1986 / SambaR

SambaR: Snp datA Management and Basic Analyses in R
MIT License

optim.a.score n.sim arg #22

Open StephanieTodd opened 2 years ago

StephanieTodd commented 2 years ago

Hello, On line 12050 of v1.06, within the 'adegenet_dapc' function, 'optim.a.score' needs a higher n.sim than the default (n.sim = 10), e.g. my_ascore <- optim.a.score(dapc.out, n.sim = 100). Otherwise the optimum number of PCs retained for the a-score jumps around with each run, and consequently there is a different a-score plot in each of the dapc folders. I found that n.sim = 50 was OK, with the a-score varying by only one or two DCs, and that n.sim = 100 was better. This does slow down processing a bit. Cheers, Steph
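To illustrate the suggestion above, here is a minimal sketch that repeats the a-score optimisation a few times with the default n.sim and with n.sim = 100, so the run-to-run spread can be compared. The toy data set and the names dat, grp, best_default and best_more are assumptions made for illustration and are not taken from SambaR; only dapc() and optim.a.score() come from adegenet. The n.sim = 100 runs take noticeably longer.

```r
library(adegenet)  # provides dapc() and optim.a.score()

set.seed(123)

# Hypothetical toy data: 90 individuals, 40 numeric variables, 3 groups
# whose means differ slightly (not real SNP data).
grp <- factor(rep(c("pop1", "pop2", "pop3"), each = 30))
dat <- matrix(rnorm(90 * 40), ncol = 40) + rep(c(0, 0.5, 1), each = 30)
dat <- as.data.frame(dat)

# DAPC with deliberately many retained PCs, as a starting point
dapc.out <- dapc(dat, grp, n.pca = 20, n.da = 2)

# Repeat the optimisation and record the suggested number of PCs each time
best_default <- numeric(3)
best_more <- numeric(3)
for (i in 1:3) {
  best_default[i] <- optim.a.score(dapc.out, plot = FALSE)$best               # default n.sim = 10
  best_more[i] <- optim.a.score(dapc.out, n.sim = 100, plot = FALSE)$best  # more permutations
}

print(best_default)
print(best_more)
```

If the values in best_more agree more closely with each other than those in best_default, that is the stabilising effect described above; the exact numbers depend on the data and the random seed.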

mennodejong1986 commented 2 years ago

Hi Steph, Thanks for making me aware of this! I will include this correction in SambaR version 1.08, including some other small changes/corrections related to DAPC. Best, Menno

StephanieTodd commented 2 years ago

Hi Menno, Great, thanks. I also noticed something else that may or may not be something you want to change. Currently the DAPC assignment test heatmaps reflect assignment probabilities to de novo clusters (without prior pop info) vs captured pops, not assignment probabilities to a priori pops. I think this was intentional because you comment: # 18-01-2020: note: if you exclude mygrp$grp (i.e. use priori populations) then you get very nice structured plots, but that doesn't mean it is meaningful. I was a little confused at first because you use the flag '!showsimulated' when producing these plots... I would have thought 'simulated' meant the de novo clusters(?). Anyway, Jombart & Collins (2015), in their adegenet tutorial under the 'Interpreting group memberships' heading, state: 'Note that this is most useful for groups defined by an external criteria, i.e. defined biologically, as opposed to identified by k-means.' Maybe you could export heatmaps for probabilities both with and without prior pop info, like you do for the scatterplots? I.e. move the heatmap bit inside the 'showsimulated' for loop.

Another unrelated, optional suggestion is to make the colours for the LEA structureplot match the correct pop colours when k=npops. I solved this for my data but haven't generalised it... let me know if this would be useful.

Thanks, Steph

mennodejong1986 commented 2 years ago

Hi Steph,

The wording 'showsimulated' was indeed confusing, or in fact simply wrong. For that reason, I recently (in version 1.07) changed the name 'showsimulated' into 'showinferred'. I also changed the DAPC plotnames from "WITH_prior_popinfo" and "WITHOUT_prior_popinfo" to "PRIOR_POPINFO" and "INFERRED_CLUSTERS" respectively. I hope this will remove the confusion.

'Inferred clusters' refers to clusters inferred by the find.clusters function (i.e., identified with K-means clustering), whereas 'prior popinfo' would refer to a priori defined population clustering (inds$pop column). The heatmaps ('dapc.inferred.vs.expectedclusters.pdf' file) are meant to show the congruence between the a priori defined clustering and the K-means inferred clustering. So the heatmap cannot be generated for either of them alone. Or do I misunderstand your question?
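As a rough illustration of the congruence idea described above, the following sketch cross-tabulates a priori populations against K-means clusters and draws the row proportions as a heatmap. It uses base R only, with plain kmeans() on simulated coordinates standing in for adegenet's find.clusters; the data and the names prior_pop, pcs and inferred are hypothetical and do not reproduce SambaR's own code.

```r
set.seed(7)

# Hypothetical a priori populations: 3 populations of 20 individuals each
prior_pop <- factor(rep(c("popA", "popB", "popC"), each = 20))

# Hypothetical coordinates on two axes; populations are shifted along axis 1
pcs <- matrix(rnorm(60 * 2), ncol = 2) + cbind(rep(c(0, 4, 8), each = 20), 0)

# Infer clusters without using the population labels
# (plain K-means here, as a stand-in for find.clusters)
inferred <- factor(kmeans(pcs, centers = 3, nstart = 10)$cluster)

# Congruence table: rows are a priori populations, columns are inferred clusters
tab <- table(prior = prior_pop, inferred = inferred)
prop <- prop.table(tab, margin = 1)  # proportion of each population per cluster
print(round(prop, 2))

# Heatmap of the proportions, without reordering or rescaling
heatmap(unclass(prop), Rowv = NA, Colv = NA, scale = "none",
        xlab = "inferred cluster", ylab = "a priori population")
```

Because the table needs both labelings, it cannot be built from either clustering alone, which is the point made above.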

Regarding the usefulness of running the dapc function with a priori defined clusters (rather than K-means inferred clusters): I do not understand DAPC well enough to comment on that usefully. My comment from 18-01-2020 is meant as a warning that the input (e.g., a priori defined clustering) will guide the output.

In the new version 1.08, to be uploaded soon, SambaR will also create geographic maps visualizing the output of the find.structure function for various K.

Yes, I would be interested to know which commands you use to make the colours of the LEA structure plots correspond to the population colours for K=npops. Thanks for helping to improve SambaR.

Best,

Menno



From: StephanieTodd. Sent: Friday 1 July 2022 1:24. To: mennodejong1986/SambaR. CC: mennodejong1986; Comment. Subject: Re: [mennodejong1986/SambaR] optim.a.score n.sim arg (Issue #22)


StephanieTodd commented 2 years ago

Hi Menno, Thanks for releasing the v1.08 update, and sorry for the slow reply. In short, I went down a rabbit hole with the DAPC assignment stuff, but in the end I decided to go with the 'without prior pop info' version as you originally had it (albeit with a different description in my methods).

In more detail, my understanding of DAPC is that it tries to separate groups along simplified 'component' axes, and the components it generates to describe differences depend on which groups you give it to separate. If you give it your a priori populations as groups, it will come up with components that best describe the differences between them, and if you give it inferred groups, the components it comes up with will be different. You can then use the components to re-assign individuals to populations. This makes the most sense when you have new individuals that were not used to develop the component axes, but even with the same individuals, reassignment to the prior pops is not 100%. E.g. with my data DAPC misassigns 8 individuals using prior pops and 15 using inferred clusters... so it's possible to generate a heatmap showing the capture vs assigned pop for those 8.
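The reassignment idea can be sketched in a few lines. This is only an approximation of DAPC under stated assumptions: it runs a PCA with prcomp() and then a linear discriminant analysis with MASS::lda() on simulated data, rather than calling adegenet's dapc(). The data and the names geno, prior_pop and assigned are hypothetical.

```r
library(MASS)  # lda(); MASS ships with standard R installations

set.seed(42)
n_per_pop <- 30

# Hypothetical a priori ("capture") populations
prior_pop <- factor(rep(c("popA", "popB", "popC"), each = n_per_pop))

# Hypothetical data: 20 numeric variables, population means differ slightly
shift <- rep(c(0, 0.6, 1.2), each = n_per_pop)
geno <- matrix(rnorm(3 * n_per_pop * 20), ncol = 20) + shift

# Step 1: PCA, keeping a few components
pcs <- prcomp(geno)$x[, 1:5]

# Step 2: discriminant analysis on the retained PCs, using the prior groups
fit <- lda(pcs, grouping = prior_pop)

# Re-assign the same individuals with the fitted discriminant functions
assigned <- predict(fit, pcs)$class

# Capture population vs assigned population; off-diagonal cells are misassignments
confusion <- table(captured = prior_pop, assigned = assigned)
n_misassigned <- sum(assigned != prior_pop)
print(confusion)
print(n_misassigned)
```

Passing a different grouping factor (for example inferred clusters) to lda() would yield different discriminant axes and, in general, a different number of misassigned individuals.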

Attached is a modified version of your LEAstructureplot function that makes the colours match the correct pops when K=npops.
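The attached file is not reproduced in this thread. As one possible way to approach the colour matching, and not necessarily the method used in the attachment, the sketch below gives each ancestry cluster the colour of the population with the highest mean ancestry coefficient for that cluster. The ancestry matrix qmat, the population labels and the colours are all hypothetical.

```r
# Hypothetical populations (4 individuals each) and their fixed plot colours
pop <- factor(rep(c("north", "south", "east"), each = 4))
pop_cols <- c(north = "blue", south = "red", east = "green")

# Hypothetical ancestry matrix for K = 3: rows are individuals, columns are
# clusters in the arbitrary order returned by the clustering software
qmat <- rbind(
  matrix(rep(c(0.05, 0.90, 0.05), 4), ncol = 3, byrow = TRUE),  # north: mostly cluster 2
  matrix(rep(c(0.10, 0.10, 0.80), 4), ncol = 3, byrow = TRUE),  # south: mostly cluster 3
  matrix(rep(c(0.85, 0.05, 0.10), 4), ncol = 3, byrow = TRUE)   # east: mostly cluster 1
)

# Mean ancestry per population (rows) and cluster (columns)
mean_q <- apply(qmat, 2, function(q) tapply(q, pop, mean))

# For each cluster, the population in which it is most common
best_pop <- rownames(mean_q)[apply(mean_q, 2, which.max)]

# Only meaningful if every cluster maps to a different population
stopifnot(!anyDuplicated(best_pop))

# Colours reordered so that cluster i is drawn in the colour of its population
cluster_cols <- pop_cols[best_pop]
print(cluster_cols)

barplot(t(qmat), col = cluster_cols, border = NA, space = 0,
        xlab = "individuals", ylab = "ancestry")
```

The uniqueness check matters: with strongly admixed data, two clusters can peak in the same population, in which case a one-to-one colour mapping does not exist and a fallback would be needed.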

Cheers, Steph

Attachment: LEAstructureplot_colourfix.txt