jgx65 / hierfstat

the hierfstat package
24 stars 14 forks source link

boot.ppfst #71

Closed Erikacj closed 7 months ago

Erikacj commented 7 months ago

I can run pairwise.WCfst on my dataset without issue but when I try to run:

boot <- boot.ppfst(data, nboot=100, quant=c(0.025,0.975))

I get the following error:

Error in apply(x, 2, fun2 <- function(x) sum(x > 0)) : dim(X) must have a positive length In addition: Warning messages: 1: In max(logical(0), na.rm = TRUE) : no non-missing arguments to max; returning -Inf 2: In max(logical(0), na.rm = TRUE) : no non-missing arguments to max; returning -Inf 3: In max(logical(0), na.rm = TRUE) : no non-missing arguments to max; returning -Inf 4: In min(logical(0), na.rm = TRUE) : no non-missing arguments to min; returning Inf 5: In max(logical(0), na.rm = TRUE) : no non-missing arguments to max; returning -Inf 6: In max(logical(0), na.rm = TRUE) : no non-missing arguments to max; returning -Inf 7: In names(data.al)[-c(1:2)] <- names(data)[-1] : number of items to replace is not a multiple of replacement length

This is what my dataset looks like:

'data.frame': 181 obs. of 344 variables: $ Strata: int 51 51 51 51 51 51 51 51 51 51 ... $ 1 : int 2 1 1 1 NA 1 1 2 1 1 ... $ 2 : int 1 2 1 1 NA 1 1 1 1 1 ... $ 3 : int 2 2 2 2 NA 2 2 2 2 2 ... $ 4 : int 1 2 1 1 NA 1 1 1 1 1 ... $ 5 : int 2 2 NA 2 2 1 1 2 1 2 ... $ 6 : int 1 1 NA 1 2 1 1 2 2 2 ... $ 7 : int 1 1 1 1 NA 1 2 1 1 1 ... $ 8 : int 1 1 1 1 NA 1 2 1 1 1 ... $ 9 : int 2 2 2 2 NA 2 2 2 2 2 ... $ 10 : int 2 1 2 2 NA 2 2 2 2 2 ... $ 11 : int 2 1 2 2 NA 2 2 2 2 2 ... $ 12 : int 2 1 2 2 NA 2 2 2 2 2 ... $ 13 : int 1 1 1 1 NA 2 2 1 2 1 ...

What am I doing wrong to not be able to get boot.ppfst to work?

jgx65 commented 7 months ago

Thanks @Erikacj . Could you provide an example data set reproducing the error? I am suspicious that you are using dosage data with a function that works with fstat format data. You could also check the closed issues about boot.ppfst.

Erikacj commented 7 months ago

Yes, @jgx65, you're correct. I was using dosage data - SNPs. What is the best way to find the population pairwise significance using dosage data? The boot.ppbetas function?

jgx65 commented 7 months ago

If you can assume your markers are independent, you could convert the dosage format to fstat format (by replacing 0 with 11, 1 with 12 and 2 with 22) and use boot.ppfst on this new dataset. If you can't assume independence, hierfstat does not have yet a specific function to carry out the test. I suggest you chop the genomes in blocks you assume are independent, calculate Fst for each of these blocks, and bootstrap them, as we do in Goudet & Weir 2023, Fig S4.

Erikacj commented 7 months ago

Hi Jerome. Thanks for the help. I was importing the vcf incorrectly before. It was reducing the data and made it look like dosage data. Now that I have multilocus genotypes coded correctly I can retrieve population pairwise Fst using pairwise.WCfst and test for population pairwise significance using boot.ppbetas. I just want to double check that I understand correctly, for some populations the output of boot.ppbetas is all negatives above and below diagonal, does this mean that these comparisons are insignificant?

On Mon, Apr 1, 2024 at 6:33 AM Jerome Goudet @.***> wrote:

If you can assume your markers are independent, you could convert the dosage format to fstat format (by replacing 0 with 11, 1 with 12 and 2 with 22) and use boot.ppfst on this new dataset. If you can't assume independence, hierfstat does not have yet a specific function to carry out the test. I suggest you chop the genomes in blocks you assume are independent, calculate Fst for each of these blocks, and bootstrap them, as we do in Goudet & Weir 2023, Fig S4 https://urldefense.com/v3/__https://journals.plos.org/plosgenetics/article?id=10.1371*journal.pgen.1010871__;Lw!!PvDODwlR4mBZyAb0!RqQ2rnzEOBZyU5tcb636zQ-PbVc7fykRZu4hX7AYWBoig2wfGQMv_ei8sON-um1D8IEkgzvMJNbOke2zseMTip1inw$ .

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/jgx65/hierfstat/issues/71*issuecomment-2030113288__;Iw!!PvDODwlR4mBZyAb0!RqQ2rnzEOBZyU5tcb636zQ-PbVc7fykRZu4hX7AYWBoig2wfGQMv_ei8sON-um1D8IEkgzvMJNbOke2zseMSiLPqaA$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AC64DDKOSLWZQZG4DIWDJ4LY3GD3XAVCNFSM6AAAAABFNZCRNSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZQGEYTGMRYHA__;!!PvDODwlR4mBZyAb0!RqQ2rnzEOBZyU5tcb636zQ-PbVc7fykRZu4hX7AYWBoig2wfGQMv_ei8sON-um1D8IEkgzvMJNbOke2zseMrrjuftA$ . You are receiving this because you were mentioned.Message ID: @.***>

jgx65 commented 7 months ago

If you are willing to assume random mating within populations and the loci are independent. You can also check if the results are consistent with hierfstat::boot.ppfst (where random mating within populations is not assumed)

Erikacj commented 7 months ago

Gotcha. Thank you!

On Wed, Apr 3, 2024 at 6:14 AM Jerome Goudet @.***> wrote:

If you are willing to assume random mating within populations and the loci are independent. You can also check if the results are consistent with hierfstat::boot.ppfst (where random mating within populations is not assumed)

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/jgx65/hierfstat/issues/71*issuecomment-2035028927__;Iw!!PvDODwlR4mBZyAb0!RsuN5KuoGIviavdYqAdy2BqFqJi3I8r0RDxNuuyBx-l7l2DU49nQS_pk9Z-bWzPhAwRqAGC3_I9tD_2gGDkETwp8Vw$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AC64DDNVKJKMPSLDAOVVYA3Y3QTETAVCNFSM6AAAAABFNZCRNSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZVGAZDQOJSG4__;!!PvDODwlR4mBZyAb0!RsuN5KuoGIviavdYqAdy2BqFqJi3I8r0RDxNuuyBx-l7l2DU49nQS_pk9Z-bWzPhAwRqAGC3_I9tD_2gGDk2rhWWmA$ . You are receiving this because you were mentioned.Message ID: @.***>