Closed NikitaKhromov-Borisov closed 4 months ago
@NikitaKhromov-Borisov Interesting. I can replicate this with your big data (n=4500). There is even a note below the table:
"Note. Some confidence intervals could not be computed. Possibly too few bootstrap replicates."
This is with the default of 1000 bootstraps. Increasing to 5000 solves the problem. I cannot replicate this for smaller data sets like from the data library in combination with the 1000 default.
@JohnnyDoorn: Should we increase the default of 1000 bootstraps depending on n?
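To make the question concrete, here is a minimal sketch of a default that scales with n. The helper `default_replicates` is hypothetical and not part of JASP or the boot package. The rule itself (at least as many replicates as rows, rounded up to the next thousand) is an assumption chosen to match the observation above that 5000 replicates sufficed for n=4500; it is not a documented requirement.

```r
# Hypothetical helper: pick a default number of bootstrap replicates from n.
# Assumption: use at least `minimum` replicates and at least n replicates,
# rounded up to the next multiple of 1000.
default_replicates <- function(n, minimum = 1000) {
  stopifnot(is.numeric(n), length(n) == 1, n > 0)
  ceiling(max(minimum, n) / 1000) * 1000
}

default_replicates(150)   # small data set: stays at the current default
default_replicates(4500)  # the large data set from this issue
```

The trade-off raised in this thread still applies: a larger default means a slower analysis as soon as the box is ticked.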
I think 1000 is fine, since we also provide an informative message when there are not enough replicates, with the suggestion to increase the number. Increasing the number means much slower analyses, so I would rather have the default at a lower value, and then have the user increase that number when they are sure about all the settings. Please reopen if you feel further discussion is needed.
Kind regards Johnny
No, no and NO! I need at least 10,000. I cannot understand why you cannot (or don't want to) adopt fast algorithms?!
Best regards,
Dr. Nikita Khromov-Borisov
@NikitaKhromov-Borisov AFAIK we depend on the algorithms R provides.
@NikitaKhromov-Borisov you are very free to specify 10,000 yourself, which takes 1 second to do. By default I prefer a lower number, so as to not immediately suck a user into waiting for 10 minutes when they tick the box. Secondly, and this has been explained before in a different issue, we use R which is fairly suboptimal for bootstrapping, but it is what it is at the moment, and there is no easy fix to magically speed up our bootstrap as long as we use R.
Please, see
https://rpubs.com/beniamino98/fastRob
https://stackoverflow.com/questions/15978361/using-r-parallel-to-speed-up-bootstrap
https://dereksonderegger.github.io/570L/15-speeding-up-r.html
https://www.r-bloggers.com/2021/12/1000x-faster-wild-cluster-bootstrap-inference-in-r-with-fwildclusterboot-%F0%9F%9A%80/
https://stats.stackexchange.com/questions/437477/calculate-accelerated-bootstrap-interval-in-r
https://www.biostars.org/p/313539/
https://rdrr.io/cran/FRB/man/MMboot_multireg.html
https://rdrr.io/cran/fbroc/man/fbroc.html
Cheers Nikita
R bootstrap code from https://data-flair.training/blogs/bootstrapping-in-r/

    library(boot)

    # Statistic: R-squared of a linear model fitted to the resampled rows
    r_squared <- function(formula, data, indices) {
      val <- data[indices, ]  # rows selected by boot for this replicate
      fit <- lm(formula, data = val)
      return(summary(fit)$r.squared)
    }

    output <- boot(data = mtcars, statistic = r_squared, R = 150000, formula = mpg ~ wt + disp)
    output
    plot(output)
    boot.ci(output, type = "bca")
This runs in less than 2 minutes, and in 13 minutes with 1,500,000 replications.
Regards, Nikita
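Several of the links earlier in this thread concern parallel bootstrapping. As a rough sketch of what that could look like for the same statistic, `boot()` accepts `parallel` and `ncpus` arguments. The replicate count of 2000 and the two cores are arbitrary choices for illustration, and this is not how JASP currently calls the package. With `parallel = "multicore"` the work is forked on Unix-alikes; on Windows that option runs serially and `parallel = "snow"` would be the alternative.

```r
library(boot)

# Same statistic as above: R-squared of a linear model on the resampled rows
r_squared <- function(formula, data, indices) {
  fit <- lm(formula, data = data[indices, ])
  summary(fit)$r.squared
}

set.seed(1)
par_output <- boot(
  data = mtcars, statistic = r_squared, R = 2000,
  formula = mpg ~ wt + disp,
  parallel = "multicore", ncpus = 2  # illustrative values, adjust to the machine
)
boot.ci(par_output, type = "bca")
```

Whether this helps in practice depends on the cost of each replicate relative to the overhead of distributing the work, so it would need to be timed on the actual analysis.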
@JohnnyDoorn Is this something we could implement easily?
We already use the boot package. Again, just because a user wants 100,000 bootstraps does not mean we have to make the default 100,000... this is why we have options, so you can tweak it yourself if you are not satisfied with the default. The speed of the bootstrap is something we can definitely look into, but for now there's not much we can change about it (and the current issue is about the default number of bootstraps).
I guess the suggestion is that our implementation of the call to the package is somehow not efficient?!
Our call is highly similar to the one above, although that code uses a single lm function call and our analysis is a bit more complex:
Dear colleagues,
I found that the bootstrap (BT) is slow in the ANOVA module. However, in the correlation module the BT takes several seconds with 10,000 bootstraps (bts), less than 1 minute with 100,000 bts, and about 3 minutes with 1 million bts. So 10,000 bootstraps could be used as the default.
Cheers, Nikita
Dear @NikitaKhromov-Borisov,
Thanks for the suggestion - in the correlation analysis we could indeed increase the default number of bootstrap samples, since it won't affect the speed as much. I do think that it's not possible to compare a correlation bootstrap to an ANOVA post hoc bootstrap, since those are simply quite different in terms of complexity and therefore speed.
Cheers, Johnny
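The difference in per-replicate cost can be checked with a small timing sketch. This is not JASP's code: the Tukey comparison via `aov()` and `TukeyHSD()` is only a stand-in for a post hoc analysis, and the `mtcars` variables and the 500 replicates are arbitrary choices. Resampling is stratified by group so that every factor level is present in each replicate.

```r
library(boot)

mt <- mtcars
mt$cyl_f <- factor(mt$cyl)

# Cheap statistic: a single Pearson correlation
cor_stat <- function(data, indices) {
  cor(data$mpg[indices], data$wt[indices])
}

# More expensive statistic: refit a one-way ANOVA and extract the
# pairwise Tukey mean differences (stand-in for a post hoc bootstrap)
tukey_stat <- function(data, indices) {
  fit <- aov(mpg ~ cyl_f, data = data[indices, ])
  TukeyHSD(fit)$cyl_f[, "diff"]
}

n_boot <- 500
time_cor <- system.time(
  boot_cor <- boot(mt, cor_stat, R = n_boot)
)[["elapsed"]]
time_tukey <- system.time(
  boot_tukey <- boot(mt, tukey_stat, R = n_boot, strata = mt$cyl_f)
)[["elapsed"]]

# Elapsed seconds for each bootstrap; values depend on the machine
c(correlation = time_cor, tukey = time_tukey)
```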
Steps to reproduce: