Closed by dniku 4 years ago
@dniku is the link you have attached the only one concerning the issue?
must not rely on keywords mentioned
@dniku why is this a bad practice? I believe it can spark a recall and give a hint of what to double-check before answering the question.
Temporarily removed the first question from the quiz, until I polish it.
why is this a bad practice?
Because it incentivizes answering the question superficially, without building a mental model of what is actually going on. Somewhat like the Chinese room: you learn to respond to STIMULUS 37 with RESPONSE 81, and you have no reason to understand either of them. Furthermore, if a question quotes the lectures directly, like "what are the typical problems with X", you are disincentivized from building an understanding of the topic, because passing the question entails literally listing the things stated in the video rather than thinking independently.
See also: https://www.lesswrong.com/posts/NMoLJuDJEms7Ku9XS/guessing-the-teacher-s-password
is the link you have attached the only one concerning the issue?
Likely, but not definitively. I would probably have posted a link to another thread if I encountered one, but I'm not certain I haven't forgotten to do so.
This is mostly fixed now. I'm preparing to include Pavel's post in the reading materials, and I have a note elsewhere about positive/negative feedback loops and state/action potentials.
I'd like to invite some discussion on this quiz and on how to make it better. By "better" I mean that the quiz must not rely on keywords mentioned in the video lecture.
There are actually two questions combined into one, which is somewhat confusing. Perhaps they could be decoupled?
Relevant discussion: https://www.coursera.org/learn/practical-rl/discussions/all/threads/kt2YjFf1EeiBFgqYZp-HyA
I think the answers should mention the distribution of rewards, not just a single value. Otherwise (in the standard general MDP setting) none of the answers are technically correct. Perhaps "It defines the distribution of values for each {state, (state, action) pair, (state, action, next_state) triple}"?
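To illustrate the point about distributions, here is a minimal sketch of a reward defined on (state, action, next_state) triples as a distribution rather than a single number. The two-state MDP, the table `REWARD_DIST`, and both helper functions are invented for this example and are not taken from the course.

```python
import random

# Hypothetical toy MDP, for illustration only.
# Each (state, action, next_state) triple maps to a list of
# (reward, probability) pairs, i.e. a distribution over rewards.
REWARD_DIST = {
    ("s0", "go", "s1"): [(1.0, 0.5), (0.0, 0.5)],  # stochastic reward
    ("s0", "stay", "s0"): [(0.0, 1.0)],            # deterministic reward
}


def sample_reward(state, action, next_state, rng):
    """Draw one reward from the distribution attached to the triple."""
    outcomes = REWARD_DIST[(state, action, next_state)]
    rewards = [r for r, _ in outcomes]
    probs = [p for _, p in outcomes]
    return rng.choices(rewards, weights=probs, k=1)[0]


def expected_reward(state, action, next_state):
    """Collapse the distribution to its mean, the 'single value' view."""
    return sum(r * p for r, p in REWARD_DIST[(state, action, next_state)])
```

In this sketch, an answer that speaks of "the reward for a state" corresponds to `expected_reward`, which discards exactly the information that `sample_reward` keeps.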
Sounds too much like referring to the video. Perhaps "which problems can you run into if you try to optimize return"?
Definitely a keyword from the video. For positive feedback loop, we could phrase it as "you may find a way to collect infinite reward even if that's not what you want to do". For negative feedback loop, I don't know (does it even make sense)?
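A rough sketch of the "collect infinite reward" case, assuming a made-up toy environment: finishing the task pays 10 and ends the episode, while stepping onto a checkpoint pays 1 every time. The environment, the action names, and both policies are hypothetical.

```python
def undiscounted_return(policy, horizon):
    """Roll out `policy` for at most `horizon` steps in a toy environment."""
    state, total = "start", 0.0
    for _ in range(horizon):
        action = policy(state)
        if state == "start" and action == "to_checkpoint":
            state, reward = "checkpoint", 1.0  # paid on every visit
        elif state == "checkpoint" and action == "back":
            state, reward = "start", 0.0
        elif action == "finish":
            return total + 10.0                # intended goal, ends the episode
        else:
            reward = 0.0                       # any other action does nothing
        total += reward
    return total


def finisher(state):
    """Intended behaviour: finish the task immediately."""
    return "finish"


def looper(state):
    """Unintended behaviour: bounce between start and checkpoint forever."""
    return "to_checkpoint" if state == "start" else "back"
```

Under these assumptions, `finisher` always earns 10, while the return of `looper` grows with the horizon (50 after 100 steps, 500 after 1000), so an agent that maximizes undiscounted return prefers the loop over finishing.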
I think it's better to replace those with a definition. Not sure how to phrase it succinctly though.
Perhaps the post by Pavel on the forum should be made part of the course (as reading material, for example).