ishita-dg / SimulationVSAction

Code base for the physics simulation experiment + Marco et al. modeling framework

Experimental settings - basic experiment #1

Open kasmith opened 7 years ago

kasmith commented 7 years ago

This issue is for discussion of the first experiment, in which we demonstrate that simulation and action change based on changes in the costs of thinking (time) or of experiments. We have a basic framework in the "experiment_development" branch with a number of moving parts that we need to decide on. These decisions come in two parts: structural questions and parameter settings.

Structural questions:

  1. Should people continue to lose points while the experiment is running? They can still potentially run simulations during this time, but are more likely paying attention to the ball's motion
  2. What types of trials should we use? Ones where the goals and walls can be all over the place (like the sample trials) or more constrained trials (e.g., simple blockers, red/green areas on one side)? Our physics model will probably be more precise with the constrained trials, but we will need to determine the right constraints
  3. How do we want to construct the experimental trial distribution? There are two relevant probability distributions we will need to pay attention to: the proportion of actions that will be a success, and the average probability of success according to a noisy physics model. Do we have some sense of how these affect the existing model predictions / actions?
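To make those two quantities concrete, here is a minimal sketch of how they could be estimated with Monte Carlo rollouts. `simulate` is a hypothetical stand-in for the physics model, not a function in the repo:

```python
def estimate_trial_distribution(simulate, angles, n_samples=100):
    """Estimate, for one trial, the two distribution quantities above:
    (1) the proportion of candidate actions (shot angles) that succeed
        under ground-truth physics, and
    (2) the average probability of success under a noisy physics model.

    `simulate(angle, noisy)` is a hypothetical stand-in that returns
    True/False for a single (possibly noisy) rollout.
    """
    # (1) Fraction of candidate actions that succeed deterministically.
    frac_success = sum(simulate(a, noisy=False) for a in angles) / len(angles)

    # (2) Per-action success probability under noise, then averaged.
    noisy_probs = [
        sum(simulate(a, noisy=True) for _ in range(n_samples)) / n_samples
        for a in angles
    ]
    mean_noisy_prob = sum(noisy_probs) / len(noisy_probs)
    return frac_success, mean_noisy_prob
```

Running this over a proposed trial set would show how the two distributions co-vary before committing to stimuli.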

Parameter settings:

  1. Cost of time (points / s) and experiments for each experiment condition
  2. Speed of the ball during simulation (and how long simulation can last)
  3. Time between pushing the shot button and taking an action (currently 3s)
  4. Loss of score when people fail the trial
  5. Number of trials / range of simulation outcomes
  6. Incentive pay ($X / point -- could be 0)
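As a starting point for discussion, the settings above could be collected into a single per-condition config. Every value below is a placeholder except the 3 s shot delay (and the possibly-zero incentive pay), which come from this thread:

```python
# Hypothetical per-condition parameter settings; only shot_delay_s (3 s)
# and the possibly-zero pay rate are taken from the thread above.
CONDITION_PARAMS = {
    "time_cost_points_per_s": 10,    # cost of thinking time (placeholder)
    "experiment_cost_points": 5,     # cost per experiment (placeholder)
    "sim_ball_speed_multiplier": 2.0,  # ball speed during simulation (placeholder)
    "sim_max_duration_s": 5.0,       # how long a simulation can last (placeholder)
    "shot_delay_s": 3.0,             # time between shot button press and action
    "failure_penalty_points": 25,    # score loss on trial failure (placeholder)
    "n_trials": 40,                  # number of trials (placeholder)
    "pay_per_point_usd": 0.0,        # incentive pay; could be 0 per the thread
}
```

Keeping everything in one dict makes it easy to vary a single cost across conditions while holding the rest fixed.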
ishita-dg commented 7 years ago
  1. Should people continue to lose points while the experiment is running? They can still potentially run simulations during this time, but are more likely paying attention to the ball's motion

Do you think it makes sense not to let people watch the ball move when they run an experiment, but instead give them only immediate feedback on whether or not it went into the goal? That would (a) prevent people from simulating as they watch the ball move, (b) ensure they don't get any additional information from the experiment about gravity/friction, etc. (although I guess we assume those learning curves have already asymptoted), and (c) keep the closest similarity to the "only experiments" control condition, where they just "function learn" which positions work.

ishita-dg commented 7 years ago

I had a few other questions regarding the set-ups:

  1. Is the reward structure negative if the ball hits red, zero if it hits nothing, and positive for green? Very high-risk situations (e.g., a large negative reward for red) would also incentivize using more experiments, not just difficulty, right? Should we manipulate that to see if it increases experiments?
  2. Does it ever happen that the probability of success over where to shoot the ball is bimodal? That is, two distinct ways to get to the goal? How would that affect the ES algorithm? Should we just avoid this complication?
kasmith commented 7 years ago
  1. We could try giving just binary feedback about their experiments, but (a) there will still be time during which they will need to receive feedback, and (b) I could see people getting very frustrated by not being able to see the outcome. But I do see the benefit of making the evidence cleaner. We should discuss further in person.
  2. The reward structure is positive for green and negative (with a small, fixed value) for everything else. The idea behind this is that we don't want people taking mostly random shots if time is about to run out. Of course, this suggests that if points run down entirely the trial should be worth zero points (it's currently counted as a loss). I don't think we want to change this value in the first pass, but it could encourage having a higher confidence threshold.
  3. Yes -- in fact one of the three examples is like that. Eric -- would this be a problem for the algorithm? If not, I suggest we try to keep trials like this in, but if so we can design stimuli around the bimodality
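The reward rule described in point 2 can be sketched as a small scoring function. Only the sign structure (positive for green, small fixed negative otherwise) comes from the thread; the specific numbers and the `outcome` labels are placeholders:

```python
def trial_score(outcome, green_reward=100, miss_penalty=10):
    """Score a single shot under the rule discussed above: a positive
    reward for landing in green, and a small fixed negative value for
    everything else (hitting red, or hitting nothing).

    The magnitudes here are placeholders; the thread only fixes the
    sign structure. A small fixed penalty (rather than a large one)
    discourages purely random shots when time is about to run out.
    """
    return green_reward if outcome == "green" else -miss_penalty
```

Under the proposed change, a timed-out trial would score 0 rather than going through this penalty branch.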
ericschulz commented 6 years ago

I think we could just increase the cost per second for experiments; this way we also account for the fact that some experiments take longer than others. I agree that they can potentially always simulate and act at the same time, but I don't know how to avoid that, so I'd rather make it a feature of the task from the start. Basically, what I'm saying is that I vote against binary feedback...;-)

I think it shouldn't influence the model too much if the distribution is bimodal, but I will have to check, of course. I guess a good approach right now could be to come up with interesting levels (maybe more than we would actually test) and then run the algorithm over those.

kasmith commented 6 years ago

Meeting notes from 10/11/17 discussion

Experiment settings / changes:

Stimulus creation:

Model considerations:

Model API consists of three functions:

ground_truth_simulation(angle, trial): Outputs binary success value, travel distance (in px)

noisy_simulation(angle, trial, noise_parameters): Outputs binary success value, simulation travel distance (px)

multi_noisy_simulation(angle, trial, noise_parameters, n): Outputs probability of success over n trials, average simulation travel distance (should we split by success/failure?)
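The three-function API above could be wired together as follows. The signatures match the meeting notes; the internals of `noisy_simulation` here are a toy placeholder (the real physics rollout lives in the repo), shown only to illustrate how `multi_noisy_simulation` aggregates single rollouts:

```python
import random

def noisy_simulation(angle, trial, noise_parameters):
    """Toy stand-in for one noisy physics rollout; returns
    (success: bool, travel_distance_px: float). The real model would
    perturb the full trajectory, not just the angle."""
    perturbed = angle + random.gauss(0.0, noise_parameters.get("angle_sd", 0.0))
    success = abs(perturbed - trial["target_angle"]) < trial["tolerance"]
    # Placeholder distance: full table width on success, half otherwise.
    return success, trial["table_width_px"] * (1.0 if success else 0.5)

def multi_noisy_simulation(angle, trial, noise_parameters, n):
    """Aggregate n noisy rollouts into (probability of success,
    average simulation travel distance in px), per the API notes.
    Splitting the distance average by success/failure, as the notes
    ask, would just mean averaging the two groups separately."""
    results = [noisy_simulation(angle, trial, noise_parameters)
               for _ in range(n)]
    p_success = sum(success for success, _ in results) / n
    mean_dist = sum(dist for _, dist in results) / n
    return p_success, mean_dist
```

`ground_truth_simulation` would have the same shape as `noisy_simulation` with the noise parameters fixed to zero.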

Next steps:

Whiteboard picture: (image attached to the original issue; not reproduced here)

kasmith commented 6 years ago

I've updated the experiment to account for the changes we discussed last week, including:

  1. The score now decreases at 6/s instead of 10/s -- this seems much more reasonable.
  2. You cannot launch the ball (for either the experiment or the action) unless your mouse is within 250px of the ball center -- this is to avoid hovering over a goal.
  3. Once the points run down, it is no longer an automatic loss; instead you get one chance to shoot the ball to avoid losing points. To make this clear to participants, there is now a gold outline around the table whenever the actual shot is active.
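The 250px launch-gating rule in point 2 amounts to a simple distance check; a minimal sketch (function and parameter names are mine, only the 250px radius comes from the thread):

```python
import math

BALL_PROXIMITY_PX = 250  # launch radius from the experiment update above

def can_launch(mouse_xy, ball_xy, radius=BALL_PROXIMITY_PX):
    """Return True only when the cursor is within `radius` px of the
    ball center, mirroring the rule that stops participants from
    hovering over a goal while launching."""
    dx = mouse_xy[0] - ball_xy[0]
    dy = mouse_xy[1] - ball_xy[1]
    return math.hypot(dx, dy) <= radius
```

The same check would run for both experiment launches and the real shot, since the rule applies to both.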