coursera/week2 reward design quiz

dniku commented 5 years ago

I'd like to invite some discussion on this quiz and on how to make it better. By "better" I mean that the quiz must not rely on keywords mentioned in the video lecture.

What is the function of reward?

There are actually two questions combined into one, which is somewhat confusing. Perhaps they could be decoupled?

Relevant discussion: https://www.coursera.org/learn/practical-rl/discussions/all/threads/kt2YjFf1EeiBFgqYZp-HyA

It defines the value of each ...

I think the answers should mention the distribution of rewards, not just a single value. Otherwise (in the standard general MDP setting) none of the answers are technically correct. Perhaps "It defines the distribution of values for each {state, (state, action) pair, (state, action, next_state) triple}"?
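To illustrate the point about distributions, here is a minimal sketch of a reward that is a random variable conditioned on the (state, action, next_state) triple rather than a single number. The function names, the "goal" state, and the Gaussian noise model are all hypothetical and chosen only for illustration; they come from neither the quiz nor the course.

```python
import random


def sample_reward(state, action, next_state, rng):
    """Draw one reward for a transition (hypothetical noise model).

    The reward is Gaussian with a mean that depends on the transition,
    so the same triple can yield different rewards on different visits.
    """
    mean = 1.0 if next_state == "goal" else 0.0
    return rng.gauss(mean, 0.5)


def expected_reward(state, action, next_state, rng, n=10_000):
    """Monte Carlo estimate of the mean of the reward distribution."""
    total = sum(sample_reward(state, action, next_state, rng) for _ in range(n))
    return total / n
```

With `rng = random.Random(0)`, repeated calls to `sample_reward("s", "a", "goal", rng)` return different values, while `expected_reward` settles near the mean. An answer of the form "it defines the value of each ..." describes only that mean.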

What are the typical problems with optimization of return?

Sounds too much like a reference to the video. Perhaps "Which problems can you run into if you try to optimize return?"

Positive/negative feedback loop

Definitely a keyword from the video. For the positive feedback loop, we could phrase it as "you may find a way to collect infinite reward even if that's not what you want to do". For the negative feedback loop, I don't know (does it even make sense?).
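A toy sketch of the "collect infinite reward" phrasing, under assumed numbers that are not from the course: one policy circles a loop that pays a small reward every step, the other walks to a goal and stops. Without discounting, the looping policy wins as soon as the horizon is long enough, even if reaching the goal is what the designer wanted.

```python
def undiscounted_return(rewards):
    """Plain sum of rewards along a trajectory."""
    return sum(rewards)


def loop_policy_rewards(horizon, loop_reward=1.0):
    """Rewards for a hypothetical policy that circles a rewarding loop forever."""
    return [loop_reward] * horizon


def goal_policy_rewards(steps_to_goal=3, goal_reward=10.0):
    """Rewards for a hypothetical policy that reaches the goal and terminates."""
    return [0.0] * (steps_to_goal - 1) + [goal_reward]
```

Here `undiscounted_return(loop_policy_rewards(100))` exceeds `undiscounted_return(goal_policy_rewards())`, and the gap grows without bound as the horizon increases.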

state/action-potential

I think it's better to replace those with a definition. Not sure how to phrase it succinctly though.
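One possible definition, assuming the quiz term refers to potential-based reward shaping (an assumption on my part, since the quiz text is not quoted in full here): the extra reward for a transition is gamma * phi(next_state) - phi(state) for some potential function phi over states. The sketch below uses hypothetical names and shows why this form is convenient: summed along a trajectory with discounting, the shaping terms telescope, so they depend only on the first and last states.

```python
def shaping_term(phi, state, next_state, gamma):
    """Potential-based shaping bonus for a single transition."""
    return gamma * phi(next_state) - phi(state)


def discounted_shaping_sum(phi, states, gamma):
    """Discounted sum of shaping bonuses along a trajectory of states.

    Telescopes to gamma**T * phi(states[-1]) - phi(states[0]),
    where T = len(states) - 1 is the number of transitions.
    """
    return sum(
        gamma**t * shaping_term(phi, states[t], states[t + 1], gamma)
        for t in range(len(states) - 1)
    )
```

For any `phi`, any list of `states`, and any `gamma`, the result matches `gamma**T * phi(states[-1]) - phi(states[0])` up to floating-point error. A definition along these lines might be short enough to replace the keyword in the answer text.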

reduces the variance of the return estimator by decreasing the contribution of distant rewards

Perhaps the post by Pavel on the forum should be made part of the course (as reading material, for example).
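For reference, the variance claim in that answer can be checked numerically. This is a rough sketch under an assumed setup (independent unit-variance Gaussian rewards at every step, hypothetical function names), not the setting used in the lecture or in Pavel's post.

```python
import random
import statistics


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a single episode."""
    return sum(gamma**t * r for t, r in enumerate(rewards))


def return_variance(gamma, horizon=50, episodes=2000, seed=0):
    """Sample variance of the discounted return over simulated episodes.

    Assumes i.i.d. N(0, 1) rewards per step, so the true variance is
    the sum of gamma**(2 * t) for t in range(horizon).
    """
    rng = random.Random(seed)
    returns = [
        discounted_return([rng.gauss(0.0, 1.0) for _ in range(horizon)], gamma)
        for _ in range(episodes)
    ]
    return statistics.variance(returns)
```

Comparing `return_variance(0.9)` with `return_variance(1.0)` shows the discounted estimator has much lower variance, because distant rewards are weighted by gamma**t and contribute little. The price is that the discounted return is a different objective from the undiscounted one.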

pshvechikov commented 5 years ago

@dniku is the link you have attached the only one concerning the issue?

pshvechikov commented 5 years ago

must not rely on keywords mentioned

@dniku why is this a bad practice? I believe it can prompt recall and give a hint about what to double-check before answering the question.

pshvechikov commented 5 years ago

Temporarily removed the first question from the quiz, until I polish it.

dniku commented 5 years ago

why is this a bad practice?

Because it incentivizes answering the question superficially, without building a mental model of what is actually going on. Somewhat like the Chinese room: you learn to respond to STIMULUS 37 with RESPONSE 81, and you have no reason to understand either of them. Furthermore, if you quote the lectures directly, as in "what are the typical problems with X", you are disincentivized from building an understanding of the topic, because passing the question entails literally listing the things stated in the video, not thinking independently.

is the link you have attached the only one concerning the issue?

Likely, but not definitively. I would probably have posted a link to another thread if I encountered one, but I'm not certain I haven't forgotten to do so.

dniku commented 4 years ago

This is mostly fixed now. I'm preparing to include Pavel's post in the reading materials, and I have a note elsewhere about positive/negative feedback loops and state/action potentials.

yandexdataschool / Practical_RL

coursera/week2 reward design quiz #189