Open utterances-bot opened 2 years ago
The implicit general solution (use another model plus red teaming) assumes that there is a way to detect exploitation before the model takes some dramatic and problematic action. It also assumes that interpolation is faster than optimization, here too.
What if Alex games the expanding charter?
https://paulbricman.com/hypothesis-subspace/?stackedPages=%2Fdeontic-arrays&stackedPages=%2Fhow-could-deontic-arrays-help-avoid-hfdt-takeover&stackedPages=%2Fwhat-if-alex-games-the-expanding-charter