jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

bootstrapped post hoc CI #809

Closed: NikitaKhromov-Borisov closed this issue 4 months ago

NikitaKhromov-Borisov commented 4 years ago
tomtomme commented 8 months ago

@NikitaKhromov-Borisov Interesting. I can replicate this with your large data set (n = 4500). There is even a note below the table:

"Note.  Some confidence intervals could not be computed. Possibly too few bootstrap replicates."

This is with the default of 1000 bootstrap replicates. Increasing it to 5000 solves the problem. I cannot replicate this with smaller data sets, such as those in the Data Library, at the default of 1000.

@JohnnyDoorn: Should we make the default of 1000 bootstraps depend on n?
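A minimal sketch of what an n-dependent default could look like. It rests on an assumption this thread does not confirm: that the failing intervals are BCa intervals from `boot::boot.ci()`. There, the acceleration constant is computed from empirical influence values that by default appear to be estimated by regression on the bootstrap resampling frequencies, which seems to require more replicates than observations (R > n). That would fit the observation above: n = 4500 fails with 1000 replicates but works with 5000. The helper `choose_n_boot()` is hypothetical and not part of JASP.

```r
# Hypothetical helper, not part of JASP or the boot package.
# Assumption: BCa intervals need more bootstrap replicates than observations
# (R > n), so the count is raised to the next multiple of 1000 above n.
choose_n_boot <- function(n, default = 1000L) {
  stopifnot(is.numeric(n), n >= 1)
  needed <- (floor(n / 1000) + 1) * 1000  # smallest multiple of 1000 greater than n
  as.integer(max(default, needed))
}

choose_n_boot(30)    # small data set keeps the default: 1000
choose_n_boot(4500)  # n = 4500 as in this issue: 5000
```

Because the count only grows with n, analyses on small data sets would keep the faster default of 1000.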

JohnnyDoorn commented 4 months ago

I think 1000 is fine, since we also provide an informative message when there are not enough replicates, with the suggestion to increase the number. Increasing the number means much slower analyses, so I would rather have the default at a lower value, and then have the user increase that number when they are sure about all the settings. Please reopen if you feel further discussion is needed.

Kind regards Johnny

NikitaKhromov-Borisov commented 4 months ago

No, no and NO! I need at least 10,000. I cannot understand why you cannot (or do not want to) adopt fast algorithms?!

Best regards,

Dr. Nikita Khromov-Borisov

(Email reply to Johnny van Doorn's message of Tue, 30 Apr 2024, 00:45.)

tomtomme commented 4 months ago

@NikitaKhromov-Borisov AFAIK we depend on the algorithms R provides.

JohnnyDoorn commented 4 months ago

@NikitaKhromov-Borisov you are free to specify 10,000 yourself, which takes a second to do. By default I prefer a lower number, so that users are not immediately stuck waiting 10 minutes when they tick the box. Second, as has been explained before in a different issue, we use R, which is fairly suboptimal for bootstrapping. It is what it is at the moment, and there is no easy fix to magically speed up our bootstrap as long as we use R.

NikitaKhromov-Borisov commented 4 months ago

Please see:

https://rpubs.com/beniamino98/fastRob
https://stackoverflow.com/questions/15978361/using-r-parallel-to-speed-up-bootstrap
https://dereksonderegger.github.io/570L/15-speeding-up-r.html
https://www.r-bloggers.com/2021/12/1000x-faster-wild-cluster-bootstrap-inference-in-r-with-fwildclusterboot-%F0%9F%9A%80/
https://stats.stackexchange.com/questions/437477/calculate-accelerated-bootstrap-interval-in-r
https://www.biostars.org/p/313539/
https://rdrr.io/cran/FRB/man/MMboot_multireg.html
https://rdrr.io/cran/fbroc/man/fbroc.html

Cheers Nikita
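One of the links above is about running bootstrap replicates in parallel. For reference, `boot::boot()` itself accepts `parallel` and `ncpus` arguments. The sketch below reuses the R-squared model from the DataFlair example later in this thread; the number of cores is an arbitrary choice, and whether JASP could switch this on in its own analyses is not something the thread establishes.

```r
library(boot)

# R-squared statistic for the same model as the DataFlair example (mpg ~ wt + disp)
r_squared <- function(data, indices, formula) {
  fit <- lm(formula, data = data[indices, ])
  summary(fit)$r.squared
}

# Leave one core free; detectCores() may return NA, hence na.rm = TRUE
n_cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

# "multicore" forks R processes and is not available on Windows;
# parallel = "snow" is the cross-platform alternative.
output_par <- boot(data = mtcars, statistic = r_squared, R = 2000,
                   formula = mpg ~ wt + disp,
                   parallel = "multicore", ncpus = n_cores)

boot.ci(output_par, type = "bca")
```

Any speed-up is bounded by the number of cores, so parallelism makes large replicate counts cheaper but does not remove the per-replicate cost of the statistic.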

(Email reply to Johnny van Doorn's message of Tue, 30 Apr 2024, 12:46.)

NikitaKhromov-Borisov commented 4 months ago

R bootstrap code from https://data-flair.training/blogs/bootstrapping-in-r/ (author: DataFlair):

library(boot)

# Creating a function to obtain R-squared from the data
r_squared <- function(formula, data, indices) {
  val <- data[indices, ]  # selecting the sample drawn by boot
  fit <- lm(formula, data = val)
  return(summary(fit)$r.squared)
}

# Performing 150000 replications with boot
output <- boot(data = mtcars, statistic = r_squared, R = 150000, formula = mpg ~ wt + disp)

# Plotting the output
output
plot(output)

# Obtaining a 95% confidence interval
boot.ci(output, type = "bca")

This completes in less than 2 minutes, and in 13 minutes with 1,500,000 replications.

Regards, Nikita
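Part of the per-replicate cost in the code above comes from `lm()` re-parsing the formula and `summary()` computing statistics that are then discarded. A minimal sketch, assuming the model formula stays fixed, builds the design matrix once and fits each replicate with `.lm.fit()` from the stats package. `r_squared_fast` is an illustrative name, and no timings are claimed here.

```r
library(boot)

# Build the design matrix and response once, outside the bootstrap loop
# (assumption: the model mpg ~ wt + disp is fixed for all replicates)
X <- model.matrix(mpg ~ wt + disp, data = mtcars)
y <- mtcars$mpg

# Lean statistic: .lm.fit() skips formula handling and summary().
# With an intercept in the model, 1 - RSS/TSS equals summary(lm(...))$r.squared.
r_squared_fast <- function(data, indices) {
  yi <- y[indices]
  fit <- .lm.fit(X[indices, , drop = FALSE], yi)
  1 - sum(fit$residuals^2) / sum((yi - mean(yi))^2)
}

output_fast <- boot(data = mtcars, statistic = r_squared_fast, R = 10000)
boot.ci(output_fast, type = "bca")
```

This only helps when the statistic is a simple model fit; a full analysis with several model terms and post hoc tests cannot be reduced this easily.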

(Email follow-up to Nikita Khromov-Borisov's own message of Tue, 30 Apr 2024, 18:05.)

tomtomme commented 4 months ago

@JohnnyDoorn Is this something we could implement easily?

JohnnyDoorn commented 4 months ago

We already use the boot package. Again, just because a user wants 100,000 bootstraps does not mean we have to make the default 100,000. This is why we have options: you can tweak the number yourself if you are not satisfied with the default. The speed of the bootstrap is something we can definitely look into, but at the moment there's not much we can change about it (and the current issue is about the default number of bootstraps).

EJWagenmakers commented 4 months ago

I guess the suggestion is that our implementation of the call to the package is somehow not efficient?!

JohnnyDoorn commented 4 months ago

Our call is highly similar to the one above, although that code uses a single lm function call and our analysis is a bit more complex:

NikitaKhromov-Borisov commented 4 months ago

Dear colleagues,

I found that bootstrapping is slow in the ANOVA module. In the Correlation module, however, it takes several seconds with 10,000 bootstrap replicates, less than 1 minute with 100,000, and about 3 minutes with 1 million. So 10,000 replicates could be used as the default.

Cheers, Nikita

(Email reply to EJ's message of Wed, 1 May 2024, 12:40.)

JohnnyDoorn commented 4 months ago

Dear @NikitaKhromov-Borisov,

Thanks for the suggestion. In the Correlation analysis we could indeed increase the default number of bootstrap samples, since there it won't affect the speed as much. I do think, however, that a correlation bootstrap cannot be compared to an ANOVA post hoc bootstrap, since the two differ considerably in complexity and therefore in speed.

Cheers, Johnny
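To make the complexity difference concrete, here is a rough sketch comparing the per-replicate work on the built-in `mtcars` data. A correlation bootstrap needs one `cor()` call per replicate, whereas a post hoc bootstrap refits an ANOVA and runs Tukey's HSD on every replicate. This is an assumption about what such a bootstrap involves, not a description of JASP's actual post hoc code. Resampling within `cyl` groups via `strata` is also an illustrative choice, made so that no group disappears from a resample.

```r
library(boot)

d <- transform(mtcars, cyl = factor(cyl))

# Correlation: one cheap cor() call per replicate
cor_stat <- function(data, indices) cor(data$mpg[indices], data$wt[indices])

# Post hoc: refit a one-way ANOVA and run Tukey's HSD on every replicate
posthoc_stat <- function(data, indices) {
  fit <- aov(mpg ~ cyl, data = data[indices, ])
  TukeyHSD(fit)$cyl[, "diff"]  # mean differences 6-4, 8-4, 8-6
}

set.seed(1)
time_cor <- system.time(out_cor <- boot(d, cor_stat, R = 1000))
time_posthoc <- system.time(out_posthoc <- boot(d, posthoc_stat, R = 1000, strata = d$cyl))

# Elapsed seconds for each; exact values depend on the machine
c(correlation = time_cor[["elapsed"]], posthoc = time_posthoc[["elapsed"]])
```

The post hoc statistic also returns one value per pairwise comparison, so the number of intervals to compute grows with the number of groups as well.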