Preprocessing should rely on final/complete files only

thartbm commented 6 years ago

The final/complete participant file is only created once the participant has done all the trials. The list of participants in the pre-processing window should only be populated with participants who have complete datasets, and the simplest check for that is whether their final/complete file exists.
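A minimal sketch of that existence check, in Python since PyVMEC is a Python project. It assumes each participant has their own folder inside the experiment's data folder and that the final file is called `complete.csv`. The folder layout, the filename and the function `complete_participants` are hypothetical placeholders, not PyVMEC's actual conventions.

```python
from pathlib import Path

# Hypothetical name of the final/complete file; PyVMEC's real filename may differ.
COMPLETE_FILENAME = "complete.csv"


def complete_participants(data_dir):
    """Return the sorted names of participant folders that contain the final file."""
    data_path = Path(data_dir)
    return sorted(
        entry.name
        for entry in data_path.iterdir()
        # Only folders that hold the final file count as complete participants.
        if entry.is_dir() and (entry / COMPLETE_FILENAME).is_file()
    )
```

Because only folders that actually contain the final file are returned, a stray folder inside the data folder would not be listed as a participant either.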

There are several issues surrounding this:

- In the end we don't want to keep the trial files around at all, but we should only remove them once we're sure, every time, that we have all the data.
- The experiment definition may have been changed at some point, so that even when a final/complete file exists, that participant's data still doesn't work with the rest of the participants.
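One way to make the removal of trial files safe, sketched under assumptions: the final file is a CSV with a header and one row per trial (`complete.csv`), the per-trial files match `trial_*.csv`, and the expected number of trials comes from the current experiment definition. The function `remove_trial_files_if_safe` and both filenames are hypothetical.

```python
import csv
from pathlib import Path

COMPLETE_FILENAME = "complete.csv"  # hypothetical final-file name
TRIAL_GLOB = "trial_*.csv"          # hypothetical per-trial file pattern


def remove_trial_files_if_safe(participant_dir, expected_trials):
    """Delete per-trial files only if the final file exists and has one row per expected trial.

    Returns True when the trial files were removed, False when they were kept.
    """
    folder = Path(participant_dir)
    final_file = folder / COMPLETE_FILENAME
    if not final_file.is_file():
        return False  # no final file yet: keep everything
    with final_file.open(newline="") as fh:
        n_rows = sum(1 for _ in csv.DictReader(fh))  # data rows, header excluded
    if n_rows != expected_trials:
        return False  # final file is incomplete or from another definition
    for trial_file in folder.glob(TRIAL_GLOB):
        trial_file.unlink()
    return True
```

Keeping the trial files whenever either check fails means a half-written or outdated final file never costs any data.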

thartbm commented 5 years ago

I've just taken our test dataset for pre-processing, and fudged it to see what would happen.

First, I created an extra folder inside the data folder and moved about half of the collected participants into it.

One: After I opened the experiment in PyVMEC, the extra folder showed up as a participant in the list. I can deselect it in the pre-processing window, so I'm not sure whether it is causing the later problem.

Two: When I then run pre-processing with the "folder" participant deselected, it still doesn't run.

I'm not sure what causes these issues, but what we need is for people to be able to run the experiment on many different machines, copy the data over to one machine and then run the pre-processing. They should also be able to remove a participant completely and re-run the pre-processing, since a single participant could bias the outlier removal procedure considerably, or because, say, someone ran off before completing the experiment.

I'd say we want incomplete participants to be detected when the list of participants is generated in the main GUI. We might highlight these in some way.

Furthermore, the list of participants stored in the JSON should play no role whatsoever. The best thing would be to remove it and generate these lists on the fly.
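A sketch of generating the participant list on the fly from the data folder rather than from the JSON, with a status the main GUI could use to highlight incomplete participants. The filenames and the function `scan_participants` are hypothetical.

```python
from pathlib import Path

COMPLETE_FILENAME = "complete.csv"  # hypothetical final-file name
TRIAL_GLOB = "trial_*.csv"          # hypothetical per-trial file pattern


def scan_participants(data_dir):
    """Build the participant list from the data folder itself, ignoring any list in the JSON.

    Returns a dict mapping folder name to "complete" or "incomplete".
    Folders that directly contain neither a final file nor any trial files are
    skipped, so an extra folder holding other participants is not listed.
    """
    status = {}
    for entry in sorted(Path(data_dir).iterdir()):
        if not entry.is_dir():
            continue  # stray files in the data folder are not participants
        if (entry / COMPLETE_FILENAME).is_file():
            status[entry.name] = "complete"
        elif any(entry.glob(TRIAL_GLOB)):
            status[entry.name] = "incomplete"  # started but never finished
    return status
```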

Then, when doing pre-processing, the list of participants available for pre-processing should consist only of participants for whom we have the complete experiment. And not just the final file: it should actually be checked that they did the experiment that is in the current experiment definition. Do all the task names, trial numbers, targets, rotations, feedback types etc. in their data match what is in the definition? This will prevent errors later on.

How to do this is open for discussion: you could take the current JSON and then check if the datafile matches it. For now, that might be the best solution because it works with the data we've already collected.
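A rough sketch of checking a participant's data against the current JSON. The definition layout (a `tasks` list whose entries have `task_name`, `num_trials`, `targets`, `rotation` and `feedback_type`) and the data columns (`task_name`, `target`, `rotation`, `feedback_type`) are assumptions for illustration; PyVMEC's real keys and columns may differ. Each task is reduced to a summary of name, trial count, targets, rotation and feedback type, and the summaries from the definition and from the data must be identical and in the same order.

```python
from itertools import groupby


def definition_summary(definition):
    """Per-task summary from a hypothetical definition layout."""
    return [
        (
            task["task_name"],
            int(task["num_trials"]),
            tuple(sorted({float(t) for t in task["targets"]})),
            (float(task["rotation"]),),
            (str(task["feedback_type"]),),
        )
        for task in definition["tasks"]
    ]


def data_summary(rows):
    """Same summary built from trial rows (dicts, e.g. from csv.DictReader).

    Consecutive rows with the same task name are grouped into one task.
    """
    summary = []
    for name, group in groupby(rows, key=lambda row: row["task_name"]):
        trials = list(group)
        summary.append(
            (
                name,
                len(trials),
                tuple(sorted({float(r["target"]) for r in trials})),
                tuple(sorted({float(r["rotation"]) for r in trials})),
                tuple(sorted({str(r["feedback_type"]) for r in trials})),
            )
        )
    return summary


def matches_definition(rows, definition):
    """True only if every task in the data matches the definition, in order."""
    return data_summary(rows) == definition_summary(definition)
```

Because consecutive trials with the same task name are merged into one task, a definition that repeats a task back-to-back would need an extra task index in the data for this check to tell the repeats apart.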

You could also store the JSON in the participant's data directory when you run them and compare that with the current JSON (the relevant part, that is). That would be more robust in the future, and we could reverse engineer it for the data that's already collected by simply copy-pasting the JSON into the directory.
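A sketch of that snapshot approach: write the definition into the participant's folder at run time, then compare only the relevant part with the current JSON. The filename `experiment_definition.json`, the functions, and the choice of `tasks` as the only relevant key are assumptions; keys such as a stored participant list are deliberately left out of the comparison.

```python
import json
from pathlib import Path

SNAPSHOT_FILENAME = "experiment_definition.json"  # hypothetical snapshot name
RELEVANT_KEYS = ("tasks",)  # hypothetical: keys describing what the participant did


def save_definition_snapshot(definition, participant_dir):
    """Write a copy of the definition into the participant's folder when they are run."""
    path = Path(participant_dir) / SNAPSHOT_FILENAME
    with path.open("w") as fh:
        json.dump(definition, fh, indent=2, sort_keys=True)
    return path


def relevant_part(definition):
    """Keep only the keys that must match; missing keys compare as None."""
    return {key: definition.get(key) for key in RELEVANT_KEYS}


def snapshot_matches(participant_dir, current_definition):
    """True if the stored snapshot's relevant part equals the current definition's."""
    path = Path(participant_dir) / SNAPSHOT_FILENAME
    if not path.is_file():
        return False  # no snapshot stored (e.g. not yet copied in for older data)
    with path.open() as fh:
        saved = json.load(fh)
    return relevant_part(saved) == relevant_part(current_definition)
```

For data that is already collected, copying the current JSON into each participant's folder with `save_definition_snapshot` would play the role of the copy-paste step.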

thartbm commented 5 years ago

Pre-processing should also call the function that generates a participant list and only include participants from list 0: those with the same JSON as the current experiment definition, AND with a complete dataset.

If there are participants in list 1 (with matching JSON, but incomplete data), there could be a pop-up with a warning?
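A sketch of sorting participants into these lists, assuming the two checks (matching JSON, complete data) have already been computed per participant. The function names are hypothetical, and list 2 is an added catch-all for participants whose JSON does not match, which the comments above do not define.

```python
def categorize_participants(status):
    """Sort participants into numbered lists.

    status maps participant name to (json_matches, data_complete), both booleans.
    List 0: matching JSON and complete data (the only list pre-processing uses).
    List 1: matching JSON but incomplete data.
    List 2: hypothetical catch-all for everything else.
    """
    lists = {0: [], 1: [], 2: []}
    for name in sorted(status):
        json_matches, complete = status[name]
        if json_matches and complete:
            lists[0].append(name)
        elif json_matches:
            lists[1].append(name)
        else:
            lists[2].append(name)
    return lists


def preprocessing_warning(lists):
    """Message for a possible pop-up when list 1 is not empty; None otherwise."""
    if not lists[1]:
        return None
    return (
        "These participants match the current experiment definition but have "
        "incomplete data and will be skipped: " + ", ".join(lists[1])
    )
```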

juliusjgm12 commented 5 years ago

Pre-processing now only calls participants from list 0.

thartbm/PyVMEC issue #64