ishita-dg / SimulationVSAction

Code base for the physics simulation experiment + Marco et al. modeling framework

Experimental settings - basic experiment #1

Open kasmith opened 7 years ago

kasmith commented 7 years ago

This issue is for discussion of the first experiment, in which we demonstrate that simulation and action change based on changes in the costs of thinking (time) or of experiments. We have a basic framework in the "experiment_development" branch with a number of moving parts that we need to decide on. These decisions come in two parts: structural questions and parameter settings.

Structural questions:

  1. Should people continue to lose points while the experiment is running? They can still potentially run simulations during this time, but are more likely paying attention to the ball's motion
  2. What types of trials should we use? Ones where the goals and walls can be all over the place (like the sample trials) or more constrained trials (e.g., simple blockers, red/green areas on one side)? Our physics model will probably be more precise with the constrained trials, but we will need to determine the right constraints
  3. How do we want to construct the experimental trial distribution? There are two relevant probability distributions we will need to pay attention to: the proportion of actions that will be a success, and the average probability of success according to a noisy physics model. Do we have some sense of how these affect the existing model predictions / actions?
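To make those two quantities concrete, here is a minimal sketch of how they could be estimated with Monte Carlo rollouts. `simulate` is a hypothetical stand-in for the physics model, not a function in the repo:

```python
def estimate_trial_distribution(simulate, angles, n_samples=100):
    """Estimate, for one trial, the two distribution quantities above:
    (1) the proportion of candidate actions (shot angles) that succeed
        under ground-truth physics, and
    (2) the average probability of success under a noisy physics model.

    `simulate(angle, noisy)` is a hypothetical stand-in that returns
    True/False for a single (possibly noisy) rollout.
    """
    # (1) Fraction of candidate actions that succeed deterministically.
    frac_success = sum(simulate(a, noisy=False) for a in angles) / len(angles)

    # (2) Per-action success probability under noise, then averaged.
    noisy_probs = [
        sum(simulate(a, noisy=True) for _ in range(n_samples)) / n_samples
        for a in angles
    ]
    mean_noisy_prob = sum(noisy_probs) / len(noisy_probs)
    return frac_success, mean_noisy_prob
```

Running this over a proposed trial set would show how the two distributions co-vary before committing to stimuli.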

Parameter settings:

  1. Cost of time (points / s) and experiments for each experiment condition
  2. Speed of the ball during simulation (and how long simulation can last)
  3. Time between pushing the shot button and taking an action (currently 3s)
  4. Loss of score when people fail the trial
  5. Number of trials / range of simulation outcomes
  6. Incentive pay ($X / point -- could be 0)
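As a starting point for discussion, the settings above could be collected into a single per-condition config. Every value below is a placeholder except the 3 s shot delay (and the possibly-zero incentive pay), which come from this thread:

```python
# Hypothetical per-condition parameter settings; only shot_delay_s (3 s)
# and the possibly-zero pay rate are taken from the thread above.
CONDITION_PARAMS = {
    "time_cost_points_per_s": 10,    # cost of thinking time (placeholder)
    "experiment_cost_points": 5,     # cost per experiment (placeholder)
    "sim_ball_speed_multiplier": 2.0,  # ball speed during simulation (placeholder)
    "sim_max_duration_s": 5.0,       # how long a simulation can last (placeholder)
    "shot_delay_s": 3.0,             # time between shot button press and action
    "failure_penalty_points": 25,    # score loss on trial failure (placeholder)
    "n_trials": 40,                  # number of trials (placeholder)
    "pay_per_point_usd": 0.0,        # incentive pay; could be 0 per the thread
}
```

Keeping everything in one dict makes it easy to vary a single cost across conditions while holding the rest fixed.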
ishita-dg commented 7 years ago
  1. Should people continue to lose points while the experiment is running? They can still potentially run simulations during this time, but are more likely paying attention to the ball's motion

Do you think it makes sense not to let people watch the ball move when they run an experiment, but instead give them only immediate feedback on whether or not it went into the goal? That would (a) prevent people from simulating as they watch the ball move, (b) ensure they don't get any additional information from the experiment about gravity/friction, etc. (although I guess we assume those learning curves have already asymptoted), and (c) keep the closest similarity to the "only experiments" control condition, where they just "function learn" which positions work.

ishita-dg commented 7 years ago

I had a few other questions regarding the set-ups:

  1. Is the reward structure negative if the ball hits red, zero if it hits nothing, and positive for green? Very high-risk situations (e.g., a large negative reward for red) would also incentivize using more experiments, not just difficulty, right? Should we manipulate that to see if it increases experiments?
  2. Does it ever happen that the probability of success over where to shoot the ball is bimodal? That is, two distinct ways to get to the goal? How would that affect the ES algorithm? Should we just avoid this complication?
kasmith commented 7 years ago
  1. We could try giving just binary feedback about their experiments, but (a) there will still be time during which they will need to receive feedback, and (b) I could see people getting very frustrated by not being able to see the outcome. But I do see the benefit of making the evidence cleaner. We should discuss further in person.
  2. The reward structure is positive for green and negative (with a small, fixed value) for everything else. The idea behind this is that we don't want people taking mostly random shots if time is about to run out. Of course, this suggests that if points run down entirely the trial should be worth zero points (it's currently counted as a loss). I don't think we want to change this value in the first pass, but it could encourage having a higher confidence threshold.
  3. Yes -- in fact one of the three examples is like that. Eric -- would this be a problem for the algorithm? If not, I suggest we try to keep trials like this in, but if so we can design stimuli around the bimodality
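The reward rule described in point 2 can be sketched as a small scoring function. Only the sign structure (positive for green, small fixed negative otherwise) comes from the thread; the specific numbers and the `outcome` labels are placeholders:

```python
def trial_score(outcome, green_reward=100, miss_penalty=10):
    """Score a single shot under the rule discussed above: a positive
    reward for landing in green, and a small fixed negative value for
    everything else (hitting red, or hitting nothing).

    The magnitudes here are placeholders; the thread only fixes the
    sign structure. A small fixed penalty (rather than a large one)
    discourages purely random shots when time is about to run out.
    """
    return green_reward if outcome == "green" else -miss_penalty
```

Under the proposed change, a timed-out trial would score 0 rather than going through this penalty branch.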
ericschulz commented 6 years ago

I think we could just increase the cost per second for experiments; this way we also account for the fact that some experiments take longer than others. I agree that they can potentially always simulate and act at the same time, but I don't know how to avoid that, so I'd rather make it a feature of the task from the start. Basically, what I'm saying is that I vote against binary feedback...;-)

I think it shouldn't influence the model too much if the distribution is bimodal, but I will have to check, of course. I guess a good approach right now could be to come up with interesting levels (maybe more than we would actually test) and then run the algorithm over those.

kasmith commented 6 years ago

Meeting notes from 10/11/17 discussion

Experiment settings / changes:

Stimulus creation:

Model considerations:

Model API consists of three functions:

ground_truth_simulation(angle, trial): Outputs binary success value, travel distance (in px)

noisy_simulation(angle, trial, noise_parameters): Outputs binary success value, simulation travel distance (px)

multi_noisy_simulation(angle, trial, noise_parameters, n): Outputs probability of success over n trials, average simulation travel distance (should we split by success/failure?)
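The three-function API above could be wired together as follows. The signatures match the meeting notes; the internals of `noisy_simulation` here are a toy placeholder (the real physics rollout lives in the repo), shown only to illustrate how `multi_noisy_simulation` aggregates single rollouts:

```python
import random

def noisy_simulation(angle, trial, noise_parameters):
    """Toy stand-in for one noisy physics rollout; returns
    (success: bool, travel_distance_px: float). The real model would
    perturb the full trajectory, not just the angle."""
    perturbed = angle + random.gauss(0.0, noise_parameters.get("angle_sd", 0.0))
    success = abs(perturbed - trial["target_angle"]) < trial["tolerance"]
    # Placeholder distance: full table width on success, half otherwise.
    return success, trial["table_width_px"] * (1.0 if success else 0.5)

def multi_noisy_simulation(angle, trial, noise_parameters, n):
    """Aggregate n noisy rollouts into (probability of success,
    average simulation travel distance in px), per the API notes.
    Splitting the distance average by success/failure, as the notes
    ask, would just mean averaging the two groups separately."""
    results = [noisy_simulation(angle, trial, noise_parameters)
               for _ in range(n)]
    p_success = sum(success for success, _ in results) / n
    mean_dist = sum(dist for _, dist in results) / n
    return p_success, mean_dist
```

`ground_truth_simulation` would have the same shape as `noisy_simulation` with the noise parameters fixed to zero.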

Next steps:

Whiteboard picture: (image attached to the original issue; not reproduced here)

kasmith commented 6 years ago

I've updated the experiment to account for the changes we discussed last week, including:

  1. The score now decreases at 6/s instead of 10/s -- this seems much more reasonable.
  2. You cannot launch the ball (for either the experiment or the action) unless your mouse is within 250px of the ball center -- this is to avoid hovering over a goal.
  3. Once the points run down, it is no longer an automatic loss; instead you get one chance to shoot the ball to avoid losing points. To make this clear to participants, there is now a gold outline around the table whenever the actual shot is active.
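The 250px launch-gating rule in point 2 amounts to a simple distance check; a minimal sketch (function and parameter names are mine, only the 250px radius comes from the thread):

```python
import math

BALL_PROXIMITY_PX = 250  # launch radius from the experiment update above

def can_launch(mouse_xy, ball_xy, radius=BALL_PROXIMITY_PX):
    """Return True only when the cursor is within `radius` px of the
    ball center, mirroring the rule that stops participants from
    hovering over a goal while launching."""
    dx = mouse_xy[0] - ball_xy[0]
    dy = mouse_xy[1] - ball_xy[1]
    return math.hypot(dx, dy) <= radius
```

The same check would run for both experiment launches and the real shot, since the rule applies to both.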