paulbricman / paulbricman.github.io

Source code for my website.
https://paulbricman.com

reflections/agency-harvesters #2

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

agency harvesters - Paul Bricman

https://paulbricman.com/reflections/agency-harvesters?utm_source=pocket_mylist

ivendrov commented 2 years ago

My understanding of modern recommender systems (having worked on a few at a big tech company) is that they hardly ever use RL, and when they do, it is either over very short time horizons or to perform an off-policy correction on the training data; that is not nearly enough to learn complex strategic behaviors like "make the users more predictable". On the other hand, there are thousands of engineers, designers, etc., who analyze user behavior and will be promoted or given bonuses if they identify changes that increase platform usage. So it wouldn't be surprising if the recommender systems did have this behavior; but it's not (yet) learned by RL from a misaligned reward, it's learned by humans from misaligned organizational incentives.
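For context, the "off-policy correction" mentioned above usually means reweighting logged user feedback so that it reflects what a candidate policy would have done, rather than what the deployed (logging) policy did. A common instance is inverse propensity scoring (IPS). The sketch below is illustrative only, with synthetic data and hypothetical variable names; it is not drawn from any specific production system.

```python
import numpy as np

# Hypothetical logged interactions: for each impression we record whether
# the user clicked, the probability the logging policy had of showing the
# item (its "propensity"), and the probability a candidate policy would
# assign to the same item.
rng = np.random.default_rng(0)
n = 1000
clicks = rng.integers(0, 2, size=n).astype(float)     # observed rewards
propensities = rng.uniform(0.1, 0.9, size=n)          # P(item | logging policy)
candidate_probs = rng.uniform(0.0, 1.0, size=n)       # P(item | candidate policy)

# Inverse-propensity-scored (IPS) estimate of the candidate policy's
# expected click rate: reweight each logged click by how much more (or
# less) likely the candidate policy is to show that item than the
# logging policy was.
ips_estimate = float(np.mean(clicks * candidate_probs / propensities))
print(ips_estimate)
```

Note that this kind of correction only debiases the training signal; it does not give the system a long-horizon objective, which is why it falls well short of the strategic behavior discussed in the post.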

paulbricman commented 2 years ago

Thanks for the insightful comment, that's somewhat reassuring to learn :)