rddy / mimi

Code for the paper, "First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization"
MIT License

Questions #4

Open · GreenWizard2015 opened this issue 2 years ago

GreenWizard2015 commented 2 years ago

Hello.

First of all, thank you for the great articles and research. I'm currently working on software for people with disabilities and trying to apply your research to it. Unfortunately, I'm not very familiar with the area of mutual information estimation, so I have a bunch of questions. I hope you can provide some brief answers.

Best regards.

rddy commented 2 years ago

Happy to discuss further! Also happy to provide more hands-on help with coding or setting up experiments.

GreenWizard2015 commented 2 years ago
  • The code isn't super clear, but in this line we are actually reusing the same statistics network Tϕ as in this earlier line, so it's just 1 network rather than 1+n_mine_samp networks. n_mine_samp just refers to the number of samples we use to compute a Monte Carlo estimate of the expectation in Equation 2.

My bad, I thought that each call to build_model would create a new network, so we would end up with 32 + 1 + 1 networks. It would be great if you added a note to build_model saying that it creates a unique MLP per scope.
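
Just to check that I now understand it correctly: below is how I would sketch the estimate in NumPy, with one shared statistics network and n_mine_samp shuffled batches going into the Monte Carlo estimate of the second expectation in Equation 2. All names here are my own placeholders, not the actual code in this repo.

```python
import numpy as np

# Toy stand-in for the single statistics network T_phi; in the repo this
# would be the one MLP built by build_model (one set of weights, reused
# everywhere). The quadratic form below is just a placeholder.
def T_phi(x, y):
    return -np.sum((x - y) ** 2, axis=-1)

def mine_lower_bound(xs, ys, n_mine_samp=32, rng=np.random):
    """Monte Carlo estimate of the MINE lower bound on I(X; Y).

    Joint term: average of T_phi over the paired (x, y) samples.
    Marginal term: average of exp(T_phi) over n_mine_samp shuffled
    pairings, approximating the expectation under p(x)p(y). The same
    T_phi is evaluated everywhere -- 1 network, not 1 + n_mine_samp.
    """
    joint_term = np.mean(T_phi(xs, ys))
    marginal_vals = [
        np.exp(T_phi(xs, ys[rng.permutation(len(ys))]))
        for _ in range(n_mine_samp)
    ]
    return joint_term - np.log(np.mean(np.concatenate(marginal_vals)))

# usage with dummy data
xs = np.random.randn(256, 2)
ys = xs + 0.1 * np.random.randn(256, 2)   # strongly correlated -> high MI
print(mine_lower_bound(xs, ys))
```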

  • Learning from samples collected with an old interface (i.e., off-policy RL) would be a bit difficult in this setting. The problem is that the state of the MDP actually includes the user's internal model of the interface, so when you go back and sample old transitions from a previous interface, you will only get partial observations that do not include this aspect of the state. I think you can address this partial observability by using a recurrent neural network architecture for the policy and value functions that takes a history of observations and commands as input (instead of only the most recent observation and command). You would also need to use importance sampling to correct for the non-stationary state distribution in the replay buffer (see Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning and DADS).
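
(Before replying, noting how I read the recurrent-policy part of that suggestion: roughly a policy that consumes the whole history of observations and commands, as in the sketch below. The framework, names, and shapes are my own placeholders, not code from this repo.)

```python
import torch
import torch.nn as nn

# Hypothetical recurrent policy: instead of mapping only the latest
# (observation, command) pair to an action, it summarizes the whole
# history with a GRU, which can carry information about the user's
# internal model of the interface. All dimensions are placeholders.
class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, cmd_dim, act_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + cmd_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_hist, cmd_hist):
        # obs_hist: (batch, time, obs_dim), cmd_hist: (batch, time, cmd_dim)
        x = torch.cat([obs_hist, cmd_hist], dim=-1)
        summary, _ = self.rnn(x)
        return self.head(summary[:, -1])          # act on the full history

# usage with a dummy history of length 10
policy = RecurrentPolicy(obs_dim=4, cmd_dim=2, act_dim=3)
actions = policy(torch.randn(1, 10, 4), torch.randn(1, 10, 2))
```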

That said, I meant reusing samples only for MI training. As you wrote in the article, we must collect data, train the MI estimator, and then adapt the interface (it doesn't matter whether we use RL or another approach). The adaptation stage can be performed without collecting completely new data (offline algorithms, relabeling old data with new rewards, synthetic data, etc.). The main bottleneck is MI training: it requires the active participation of the user in order to gather data about the new interface. Thus, the problem of using data efficiently for MI training arises. In theory, only the "intuitiveness" of the transition (s, a) -> s' matters to us, so any transitions could be used, even ones from old interfaces. Am I correct, or are there restrictions on the data used for MI training?

rddy commented 2 years ago

The problem is that the intuitiveness of the transition (s, a, s') cannot be evaluated in isolation. For example, an intuitive interface for scrolling on a mobile phone could either involve swiping up to scroll up or swiping down to scroll up, but some kind of mixture of the two interfaces would be unintuitive. That being said, it might be possible to speed up MI estimation through warm starts or meta-learning.
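
For instance, a warm start could look roughly like the sketch below: initialize the statistics network for the new interface from the one trained under the previous interface and fine-tune it on the small batch of new data. This is just an illustration of the idea; the names, shapes, and use of PyTorch are placeholders, not code from this repo.

```python
import copy
import math
import torch
import torch.nn as nn

def neg_mine_bound(T, obs, cmd):
    # negative Donsker-Varadhan lower bound on I(obs; cmd); the marginal
    # expectation is approximated by shuffling the commands in the batch
    joint = T(torch.cat([obs, cmd], dim=-1)).squeeze(-1).mean()
    shuffled = cmd[torch.randperm(len(cmd))]
    marg = torch.logsumexp(T(torch.cat([obs, shuffled], dim=-1)).squeeze(-1), dim=0) - math.log(len(cmd))
    return -(joint - marg)

obs_dim, cmd_dim = 4, 2
old_T = nn.Sequential(nn.Linear(obs_dim + cmd_dim, 64), nn.ReLU(), nn.Linear(64, 1))
# ...pretend old_T was already trained on data from the previous interface...

new_T = copy.deepcopy(old_T)                   # warm start from old parameters
opt = torch.optim.Adam(new_T.parameters(), lr=1e-4)

# small batch collected with the new interface (dummy data here)
obs, cmd = torch.randn(256, obs_dim), torch.randn(256, cmd_dim)
for _ in range(100):                           # a few fine-tuning steps
    opt.zero_grad()
    neg_mine_bound(new_T, obs, cmd).backward()
    opt.step()
```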

GreenWizard2015 commented 2 years ago

I completely agree, and I also thought about the problem of "mirror" interfaces. However, I believe that people tend to do what they are used to. For example, if it is more convenient for a person to swipe up but the interface requires a swipe down, the person will swipe down with a smaller amplitude or with other measurable differences. A person cannot give completely independent feedback for each interface; they will remember the previous one and try to control the new one in the same way. As long as the variables are not fully discrete, this difference should be observable. Moreover, your article is based precisely on this assumption, so I think it is possible to identify the more convenient actions and not just the better interface. However, this task is harder, so it may not be solvable in practice (it requires more resources).

Thank you for your responses and for making it clear that there is no fundamental reason not to try to reuse the data.

If possible, I would be grateful if you could suggest articles, resources, etc. on the use of AI to improve accessibility for people with disabilities.

rddy commented 2 years ago

Here are some projects that I think are cool and have potential in this space: