jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

Filtering Data Changes Bayesian Outputs #338

Closed lindsayas22 closed 5 years ago

lindsayas22 commented 5 years ago

Is anyone going to respond to this? I need to know if my results are correct...

EJWagenmakers commented 5 years ago

Hi Santalin,

Sorry for the delay in responding. Now, there will be some change in the results when you rerun the model (because the computation involves numerical methods), but that variability is indicated by the % error, and you should not see an order-of-magnitude difference. Some suggestions:

  1. Have you confirmed that the different methods of filtering result in the same data (e.g., using Descriptives)?
  2. Could you give us a .jasp file with fake data? That would make it easier to get to the bottom of this.

@JohnnyDoorn @vandenman

Cheers, E.J.
lindsayas22 commented 5 years ago

Hello,

They are always vastly different, sometimes by 100+. At first, I thought this was just happening with the checking/X-ing method, but now I'm noticing some differences using the filtering method as well. Some of my conditions have always had a Bayes factor of around 45, but now they're suddenly around 30 and I'm not sure how. I'm not sure what my actual results are anymore.

I have attached the JASP file with made up data. I am interested in looking within each Experiment and finding the Bayes factor with the null model including main effects of Number and Letter (already done in my Bayesian statistics). Let me know if you can figure anything out! Made up Data for JASP.zip

EJWagenmakers commented 5 years ago

@vandenman @JohnnyDoorn @Kucharssim Could you take a look and help out? Perhaps there is something amiss with the filtering functionality, in which case it should be fixed before the new version.

JohnnyDoorn commented 5 years ago

It seems that when using the column filter, the results are correct - I just verified this with 0.9.3. See the attached screenshot for the analysis of group "C" in the experiment condition. The filter constructor seemed to have a parsing problem - if you reload the analysis with the filtered data and increase the number of samples, it should also work with the filter constructor (at least it does for me, on 0.9.2). filterBuilderRMBANOVA

Kind regards Johnny

Kucharssim commented 5 years ago

Hello @santalin22,

If you run the same analysis while loading a .csv that contains only that one condition (Experiment = C), you can verify that the results @JohnnyDoorn posted are correct. However, I cannot exactly reproduce the behavior you describe (i.e., completely different results each time you rerun the analysis - the results fluctuate somewhat, but that is to be expected given the % error).

However, I can reproduce some strange behavior - both on 0.9.2 and on the upcoming 0.9.3 - when you apply the drag-and-drop filter while the analysis is already running. In that case, it seems to output results for the filter that was active when the analysis started. If you then rerun everything (i.e., click Results -> Refresh All), the output is correct as well. You can try to verify whether this is what caused your troubles.

I am tagging @JorisGoosen @vandenman, who know more about filters and ANOVAs. Is it possible that the filter constructor does not start a new analysis if one is already running?
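For readers hitting the same symptom: what Simon describes is a classic stale-snapshot race. The sketch below (illustrative Python, not JASP's actual C++/R internals; `Dataset`, `set_filter`, `filter_revision`, and `run_analysis` are all hypothetical names) shows how an analysis that snapshots the filter once at start can silently compute on an outdated filter, and how a revision counter can detect that the result is stale:

```python
import threading
import time

class Dataset:
    """Toy stand-in for a filtered dataset; not JASP's actual internals."""
    def __init__(self, flt):
        self.active_filter = flt
        self.filter_revision = 0  # bumped on every filter change
        self.lock = threading.Lock()

    def set_filter(self, flt):
        with self.lock:
            self.active_filter = flt
            self.filter_revision += 1

data = Dataset("Experiment in ('B', 'C')")
results = []
snapshot_taken = threading.Event()

def run_analysis():
    # The analysis snapshots the filter once, when it starts.
    with data.lock:
        used_filter = data.active_filter
        start_rev = data.filter_revision
    snapshot_taken.set()
    time.sleep(0.05)  # stand-in for a slow Bayesian sampler
    # Without this revision check, a mid-run filter change goes unnoticed.
    with data.lock:
        stale = data.filter_revision != start_rev
    results.append((used_filter, stale))

worker = threading.Thread(target=run_analysis)
worker.start()
snapshot_taken.wait()
data.set_filter("Experiment == 'C'")  # user narrows the filter mid-run
worker.join()

used_filter, stale = results[0]
print(used_filter)  # the analysis still used the old filter
print(stale)        # True: the result should be discarded and rerun
```

This also illustrates why "Results -> Refresh All" fixes the output: the rerun snapshots the now-current filter.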

Best, Simon

JorisGoosen commented 5 years ago

@Kucharssim your analysis sounds correct, I will take a look at it.

lindsayas22 commented 5 years ago

Wow, yes, that was totally it! It wasn't happening with the drag-and-drop filtering method, but with checking/X-ing the boxes for the different conditions. You can't uncheck a box without checking another one first, and you don't click an "apply filter" button before the analyses begin like with the drag-and-drop method. So it must have been performing the analysis on both checked conditions before I X-ed the other one out. And since I don't always check/uncheck the boxes in the same order, that would explain why the results were vastly different each time. I refreshed the results (didn't know I could do that before) and now the values are correct. Thank you for the help!

AlexanderLyNL commented 5 years ago

Awesome, I consider this clarified. Please reopen if something's still unclear.

EJWagenmakers commented 5 years ago

It's clarified but it isn't fixed, right?

JorisGoosen commented 5 years ago

Exactly, we should fix this for the 0.9.3 release.

lindsayas22 commented 5 years ago

I understand what was happening now and how to make it work, but perhaps a future version could require you to "apply filter" before running the analyses, or continuously update as you check/uncheck conditions? This only happens with the Bayesian statistics, not with the other stats I've been running (regular ANOVAs, t-tests, descriptives, etc.), as far as I've seen. When I ran Bayesian stats with the "Descriptives" box checked, the descriptives would still say N = 40, as if it were correctly analyzing the one condition I had checked, but the Bayesian analyses were based on the two conditions that were checked before I unchecked the other.

JorisGoosen commented 5 years ago

Yeah I will :)

I'm pretty sure I accounted for rerunning an analysis that was active when a filter is changed, but maybe there is something specific in the Bayesian one that breaks it. It might just be that it is the only analysis that takes long enough to actually get into the situation of having an analysis running and a filter changed at the same time.

JohnnyDoorn commented 5 years ago

Yes, I think it's definitely the computation time that makes it a problem for some Bayesian analyses and not others. We could make a filter change have the same impact as changing analysis options (where, for instance, changing the number of samples restarts the analysis/sampling).
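The restart-on-change behavior Johnny suggests could be sketched like this (illustrative Python, not JASP code; `run_until_current`, `Dataset`, and `compute` are hypothetical names): keep rerunning the computation until no filter change happened while it was running, so only a result computed against the current filter is ever shown.

```python
class Dataset:
    """Toy dataset holder; names are illustrative, not JASP internals."""
    def __init__(self, flt):
        self.active_filter = flt
        self.filter_revision = 0  # bumped on every filter change

    def set_filter(self, flt):
        self.active_filter = flt
        self.filter_revision += 1

def run_until_current(data, compute):
    """Re-run `compute` until no filter change happened mid-run,
    mirroring how changing an analysis option restarts the sampling."""
    while True:
        rev = data.filter_revision
        result = compute(data.active_filter)
        if data.filter_revision == rev:  # filter unchanged: result is valid
            return result
        # Filter changed mid-run: discard the stale result and restart.

data = Dataset("Experiment in ('B', 'C')")
calls = []

def compute(flt):
    calls.append(flt)
    if len(calls) == 1:                       # simulate the user changing
        data.set_filter("Experiment == 'C'")  # the filter during run one
    return f"BF computed on [{flt}]"

final = run_until_current(data, compute)
print(final)  # only the result for the current filter is kept
print(calls)  # both filters were attempted; the stale run was discarded
```

The trade-off is wasted computation on the discarded run, which is why the stale result is thrown away rather than shown alongside the fresh one.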
