pupil-labs / pupil

Open source eye tracking
https://pupil-labs.com
GNU Lesser General Public License v3.0

Recommended settings for highest accuracy (Accuracy Study) #1147

Closed · behinger closed this issue 1 year ago

behinger commented 6 years ago

We are currently planning an eye-tracking comparison study (concurrent EyeLink 1000 & Pupil Labs recording). We will compare many different measures and paradigms. The current plan includes: saccade & fixation parameters (amplitude, velocity, accuracy, ...), pupil dilation, smooth-pursuit accuracy, microsaccade detection (for completeness; obviously, at 120 Hz this will be difficult), robustness to head movements, and blink detection.

For this, we would like to know whether our setup represents a typical stationary research use case with an emphasis on data accuracy.

The limited bandwidth of a single USB cable (the typical use case, I assume) allows us to record with the following specs:

In order to relate eye-movement parameters to the screen, we will use 16 markers displayed in the corners of a 24" screen at a viewing distance of 60 cm.
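For reference, relating pixel distances on the screen to degrees of visual angle at this geometry is a one-liner. A minimal sketch in Python (the 1920×1080 resolution and 53.1 cm visible width are assumptions for a typical 24" 16:9 panel; substitute your own measurements):

```python
import math

# Assumed geometry: 24" 16:9 panel, 53.1 cm visible width, 1920 px wide,
# viewed from 60 cm. Replace with your measured values.
SCREEN_WIDTH_CM = 53.1
SCREEN_WIDTH_PX = 1920
VIEWING_DISTANCE_CM = 60.0

def px_to_deg(px: float) -> float:
    """Convert an on-screen distance in pixels to degrees of visual angle."""
    size_cm = px * SCREEN_WIDTH_CM / SCREEN_WIDTH_PX
    return math.degrees(2 * math.atan(size_cm / (2 * VIEWING_DISTANCE_CM)))

print(px_to_deg(38))  # ~1.0 deg at this geometry
```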

We want to make all videos and data available to the community after we publish our first preprint on the study. Capturing the videos will also allow post-hoc use of the 2D or 3D calibration algorithms of Pupil Labs.

Our questions:

- How much would accuracy improve if we restricted ourselves to monocular recordings, but with a higher resolution?
- Regarding the number of markers: we are planning to use 16 markers (choosing fewer in the end is always possible). Should we make use of the screen-tracker plugin? https://github.com/cpicanco/player_plugins#how-to-use-the-screen_tracker_offline-plugin
- What other things would you recommend to ensure the highest data quality (besides the typical research settings with eye movements)?

cpicanco commented 6 years ago

> How much would accuracy improve if we restricted ourselves to monocular recordings, but with a higher resolution?

I don't know.

> Regarding the number of markers: we are planning to use 16 markers (choosing fewer in the end is always possible). Should we make use of the screen-tracker plugin? https://github.com/cpicanco/player_plugins#how-to-use-the-screen_tracker_offline-plugin

The plugin you mentioned is a modified version of the "offline surface tracker". It was intended to allow screen tracking without fiducial markers. I was studying attentional factors, and it was important for me to have as clean a field of view as possible, especially without any black-and-white shapes (as is the case with fiducial markers). Should accuracy improve with a cleaner field of view, especially during calibration? I guess it should. However, I must say that I did not explicitly run any formal comparison.
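For what it's worth, once gaze has been mapped to a tracked surface (with or without fiducial markers), relating it to screen pixels takes only a few lines. A minimal sketch, assuming a surface-tracker CSV export with columns gaze_timestamp, x_norm, y_norm, and on_srf (verify these names against your Pupil Player version) and a 1920×1080 screen:

```python
import csv

SCREEN_W_PX, SCREEN_H_PX = 1920, 1080  # assumed screen resolution

def gaze_on_screen(csv_path):
    """Map surface-normalized gaze samples to top-left screen pixel coordinates."""
    points = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["on_srf"] != "True":  # keep only samples that hit the surface
                continue
            x = float(row["x_norm"]) * SCREEN_W_PX
            # surface y is normalized bottom-up; flip it for screen coordinates
            y = (1.0 - float(row["y_norm"])) * SCREEN_H_PX
            points.append((float(row["gaze_timestamp"]), x, y))
    return points
```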

Note that "tracking" is not explicitly related to "calibration methods". For example, to avoid exclude participants (maybe improving overall accuracy and increasing overall calibration time), I was using 15 calibration points and a participant-driven calibration method. May I ask if you have already decided what calibration method you will use?

> What other things would you recommend to ensure the highest data quality (besides the typical research settings with eye movements)?

This question really depends on your use case. For stationary research, you may consider using a chin rest. Also, I have found 15 calibration points a lot more reliable than 9 or 5 points, in both 3d and 2d. In the end, if I had had enough time, I would have tested Single Marker Calibration.
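To illustrate what such a layout could look like, here is a minimal sketch that generates a 15-point grid in normalized screen coordinates (the 5 × 3 arrangement and the 5 % margin are my assumptions, not a Pupil Labs specification):

```python
import itertools

def calibration_grid(cols=5, rows=3, margin=0.05):
    """Return cols x rows target positions in [0, 1] x [0, 1] screen coordinates."""
    xs = [margin + i * (1 - 2 * margin) / (cols - 1) for i in range(cols)]
    ys = [margin + j * (1 - 2 * margin) / (rows - 1) for j in range(rows)]
    return list(itertools.product(xs, ys))

points = calibration_grid()  # 15 (x, y) targets
```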

cpicanco commented 6 years ago

P.S.: Nice blog @behinger :+1:

behinger commented 6 years ago

Hi! Thanks a lot for your answer.

I did not fully follow your argument about the black-and-white shapes. Do you think they will bias fixations towards the markers? I guess if the markers are on the screen at all times (also during calibration), this effect should be constant across all experimental conditions. I can see, though, that attention experiments might be different.

Is there any official documentation on the accuracy of the eye tracker, especially at different resolutions/speeds? The 2D vs. 3D detection algorithms and 2D vs. 3D calibration algorithms can be tested offline (thanks to awesome Pupil Labs :)).

We will make use of a 13-point accuracy test, as it allows us to concurrently calibrate the EyeLink & Pupil Labs, and I guess it is pretty much the standard in the field.
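For concreteness, the accuracy metric we have in mind is the mean angular error between each validation target and the measured gaze direction. A minimal sketch (how the (azimuth, elevation) arrays are extracted from either tracker's export is left open):

```python
import numpy as np

def mean_angular_error(targets_deg, gaze_deg):
    """Mean 3d angular error; inputs are (N, 2) arrays of (azimuth, elevation) in degrees."""
    t = np.radians(np.asarray(targets_deg))
    g = np.radians(np.asarray(gaze_deg))

    def to_unit(a):  # (azimuth, elevation) -> 3d unit gaze vectors
        az, el = a[:, 0], a[:, 1]
        return np.stack([np.cos(el) * np.sin(az),
                         np.sin(el),
                         np.cos(el) * np.cos(az)], axis=1)

    cos_err = np.clip(np.sum(to_unit(t) * to_unit(g), axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_err)).mean()
```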

The single-marker calibration test is a bit harder to formalize - do you have experience there?

cpicanco commented 6 years ago

(Disclaimer: please note that I am an independent researcher, a "Pupil enthusiast"; I am not from Pupil Labs.)

> I did not fully follow your argument about the black-and-white shapes. Do you think they will bias fixations towards the markers? I guess if the markers are on the screen at all times (also during calibration), this effect should be constant across all experimental conditions. I can see, though, that attention experiments might be different.

Although equally spaced fiducial markers around the screen center do not generate any bias, they introduce noise and may interfere qualitatively with the experimental task. The plugin was modified simply to avoid such problems. Some human participants will spend significant looking time on fiducial markers during the experimental session. Some of them will even try to "discover" the hidden logic behind them, something that has nothing to do with the experiment. Some will even move their heads and make strange sounds towards them. Some will even wrongly associate experimental stimuli with them, even if you explicitly instruct them not to do so.

> Is there any official documentation on the accuracy of the eye tracker, especially at different resolutions/speeds? The 2D vs. 3D detection algorithms and 2D vs. 3D calibration algorithms can be tested offline (thanks to awesome Pupil Labs :)).

The official documentation reports one simple trial right after a "Screen Marker Calibration". It does not measure accuracy over extended periods of time:

Technical note: https://arxiv.org/abs/1405.0006
Official site: https://pupil-labs.com/pupil/

There is an overall consensus in the community that the 2d approach is better than the 3d one for accuracy in the long run (they say that you should expect a higher mean). However, it is not clear at all. In my experience, there were some specific cases using a 3d setup that were far better than 2d. So I conclude that the outcome is quite dependent on operator expertise.

> We will make use of a 13-point accuracy test, as it allows us to concurrently calibrate the EyeLink & Pupil Labs, and I guess it is pretty much the standard in the field.

Well, you asked for "highest accuracy", not standards. :smile:

> The single-marker calibration test is a bit harder to formalize - do you have experience there?

A little bit of experience. It really improved accuracy in the long run, with myself wearing the headset. However, I did not test it with participants, so I have not developed any hands-on intuition.

cpicanco commented 6 years ago

@behinger

> We will make use of a 13-point accuracy test, as it allows us to concurrently calibrate the EyeLink & Pupil Labs, and I guess it is pretty much the standard in the field.

Also, pulling Pupil down to the current "standard" would not be a fair comparison (qualitatively speaking). Pupil is very flexible and has a lot more to offer than a "13-point 2d screen-based calibration". I think that readers of your future report would benefit from an "internal" comparison measuring all Pupil calibration methods. Since each calibration method was designed to fit a specific use case, your comparison strategy should be adaptable to all of them. That way, people would have an additional resource for deciding when and which calibration method to use. What do you think?

behinger commented 6 years ago

@cpicanco I hope I understood you correctly. For sure, other calibration methods could be tested, but my intent is not to benchmark all of Pupil's calibration methods (I think Pupil Labs should do that); my intent is to find out how well the Pupil Labs eye tracker works in research settings. Of course, one could add a single-marker calibration test in addition (to have the data, so to say); I have to think about it.

cpicanco commented 6 years ago

@behinger I think you may be interested in some work I have done. I checked whether a Pupil DEV monocular is sufficient for a common research setting in my field. In my case I didn't compare anything; I was just trying to check if it was feasible (world camera 30 Hz, eye camera 30 Hz). You can read the manuscript here: proof.pdf. In short, I concluded that the Pupil DEV monocular is pretty much feasible as long as you run some accuracy corrections. The correction strategy we used required a constrained setup with equally spaced stimuli. We used Natural Features Calibration, a chin rest, and a computer screen projected on the wall.
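In case it helps, here is a minimal sketch of one such grid-based correction (my own illustration; the exact method in the manuscript may differ): fit a least-squares affine transform from measured fixation centers to the known, equally spaced stimulus positions, then apply it to all gaze samples.

```python
import numpy as np

def fit_affine(measured, true):
    """measured, true: (N, 2) arrays of matching fixation / stimulus centers."""
    m = np.asarray(measured, dtype=float)
    A = np.hstack([m, np.ones((len(m), 1))])  # augment with [x, y, 1]
    coef, *_ = np.linalg.lstsq(A, np.asarray(true, dtype=float), rcond=None)
    return coef  # (3, 2) affine coefficients

def apply_affine(coef, gaze):
    """Apply the fitted correction to (M, 2) gaze samples."""
    g = np.asarray(gaze, dtype=float)
    return np.hstack([g, np.ones((len(g), 1))]) @ coef
```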

behinger commented 6 years ago

Hey! Thanks for your input. This is already very helpful.

I still wonder if someone from Pupil Labs can give an "official" recommendation?