Closed fjhheras closed 3 years ago
Hello @fjhheras! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Sorry for getting back to it so late. I had a weirdly configured email client and haven't been getting Github notifications last month.
Anyways, thanks for the PR, I really appreciate it! I think there were some issues with the methodology though. Coincidentally, I was teaching a stats course last semester (unfortunately, as per syllabus, I had to use R :P) where I implemented a paired permutation (randomization) test. As far as I know, the standard variant is just flipping the pairs. Below is my R reference implementation that matches the textbook results for a small toy dataset:
pairedSamples<-function(diff,nSim=1000000){
d <- diff
ld <- length(d)
obsmeand <- mean(d)
meandvec <- rep(NA,nSim)
for (i in 1:nSim){
rsigns <- 2 * rbinom(ld, 1, .5) - 1
newd <- rsigns * d
newmeand <- mean(newd)
meandvec[i] <- newmeand
}
pl <- sum(meandvec >= obsmeand)
pg <- sum(meandvec <= obsmeand)
pval <- ifelse(pl < pg, 2 * pl/nSim, 2 * pg/nSim)
options(scipen=1)
if (pval==0){
pval <- paste("p <",1/nSim)
} else {
pval <- paste("p =",pval)
}
options(scipen=0)
hhh <- hist(meandvec, plot=FALSE)
hist(meandvec, xlab="Mean of differences",
main=paste("Paired Sample Randomization Test,", pval),
probability=TRUE, xlim=range(c(hhh$breaks, obsmeand)),
breaks=100, col="tan")
abline(v=obsmeand, col="red")
mtext(paste(round(obsmeand, digits=4)), side=1, at=obsmeand)
}
nineteen80 = c(3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01)
nineteen90 = c(2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01)
For some reason, using the PR's code I couldn't reproduce these results. I think the problem were the arrays of different lengths. Tried to debug that but couldn't really figure out to fix it so I rewrote it using a more "naive" implementation (probably too many for-loops). However, it seems to work and reproduce the expected results now (added some unit tests).
Please let me know your feedback!
Well, that was embarrassing. My local version works fine, but somehow I forgot to add a np.where when doing the PR. My original implementation in the PR seems to output the right values after this change
indices_1 = np.where(rng.randn(n) > 0)[0]
instead of
indices_1 = rng.randn(n) > 0
In any case, the amended implementation you propose is much easier to follow. I checked times, and it is also a bit faster!
No worries, and thanks for double-checking! Wow, I am positively surprised that this is actually not that slow. So, maybe we should keep the current version then ;). I add the docs.
can you share full python code example pls how to use this new option
I think this is example 3 here: http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/#example-3-paired-two-sample-randomization-test
Code of Conduct
Description
I would like to add the option to perform permutation tests when the two populations are paired. http://axon.cs.byu.edu/Dan/478/assignments/permutation_test.php
The current pull request is just a first proposal. If you think this feature is useful enough to be implemented in mlxtend, I can improve documentation/style and introduce non-trivial tests (currently it is not well tested, so it can still be buggy).
Related issues or pull requests
Fixes #767
Pull Request Checklist
./docs/sources/CHANGELOG.md
file (if applicable)./mlxtend/*/tests
directories (if applicable)mlxtend/docs/sources/
(if applicable)PYTHONPATH='.' pytest ./mlxtend -sv
and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g.,PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv
)flake8 ./mlxtend