quantified-uncertainty / potential-projects


Committees: First Steps #20

Open OAGr opened 2 years ago

OAGr commented 2 years ago

Eventually, it could be great to have strong "evaluation bodies" that could be forecasted upon.

One first step would just be to set up a few evaluation bodies, and have them make a few group evaluations.

If it's very cheap, it could also be interesting to add forecasting, to see how well forecasters can predict these evaluations.

See:

https://www.lesswrong.com/posts/cLtdcxu9E4noRSons/part-1-amplifying-generalist-research-via-forecasting-models

uvafan commented 2 years ago

I like this idea.

It reminds me of a few ideas that have previously been thought about or tried, besides the one linked in the issue:

  1. The AI Forecasting Resolution Council, which shut down due to lack of demand (!)
  2. Nuno's draft post on amplifying a central authority like Rootclaim; the main concern there is that you need a trusted/wise central authority.

My suggested first step would probably be talking to the people involved in (1) to understand what we could learn from it.

Then:

  1. Decide which things will be evaluated in the first experiment. I'm not sure if you already had something in mind. I'd be excited about trying out either longtermist/EA organizations or AI safety/longtermist papers.
  2. Reach out to people to form a committee for the first experiment, which should just be a few questions.
  3. Learn from that one and maybe scale up.

Another thought, on which you might have cached thoughts or know of prior work: how much will people actually agree more about evaluations of organizations, pieces of research, etc. than about forecasts? I imagine it varies with the type of thing being forecasted/evaluated. It could be interesting to run an experiment on this if one hasn't been done already. As an intuition pump for why people might not agree more: I'd imagine that observing MIRI's work over the past year wouldn't do much to resolve disagreements between proponents and detractors if they had made forecasts a year ago.

OAGr commented 2 years ago

Thanks!

About the AI Forecasting Resolution Council: first, I'm familiar with it; I was one of the people who suggested it. Second, a major reason it ended was simply that Ben G and Jacob didn't stay active in the area. I could easily imagine a version with more sustained effort behind it going much further.

For "what to evaluate", I like the idea of "big lists of similar things". Some ideas:

Also, another idea, instead of an "evaluation committee of experts", would be something more like a survey or "deliberative democracy". For example, we randomly select a bunch of people who went to 2+ EAGs, put them on a panel, and use their responses.
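The selection step above is just sortition over a filtered pool. A minimal sketch of what that might look like (all names, thresholds, and data here are illustrative assumptions, not a real attendee dataset):

```python
import random

def select_panel(candidates, eag_counts, min_eags=2, panel_size=5, seed=0):
    """Randomly draw a panel from candidates with >= min_eags EAG attendances.

    candidates: list of candidate identifiers.
    eag_counts: dict mapping candidate -> number of EAGs attended.
    A fixed seed makes the draw reproducible/auditable.
    """
    eligible = [c for c in candidates if eag_counts.get(c, 0) >= min_eags]
    if len(eligible) < panel_size:
        raise ValueError("not enough eligible candidates for the panel")
    rng = random.Random(seed)
    return rng.sample(eligible, panel_size)

# Hypothetical usage with made-up attendance data:
attendance = {"a": 3, "b": 1, "c": 2, "d": 4, "e": 2, "f": 5}
panel = select_panel(list(attendance), attendance, panel_size=3)
```

A fixed random seed is one simple way to make the draw verifiable by outsiders, which matters if the panel's legitimacy depends on the selection being demonstrably random.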

OAGr commented 2 years ago

"how much people will actually agree more about evaluations of organizations, pieces of research, etc. vs. forecasts" And I agree that this is a big issue. I imagine it'll make some kind of a evaluations useless, but some will still be very valuable.

There are clearly some big cruxes in our community. If you are firmly on one side, seeing an evaluation made by the other side, or by a mix, might not help much. That said, if you could see evaluations made by those who agree with you on the big cruxes, that could still be great.

My hunch is that differences matter less than a lot of people would expect. It was very frequent with Guesstimate models that the answer would be pretty over-determined, so people who initially disagreed with each other would end up being totally fine with the answer.

uvafan commented 2 years ago

"My hunch is that differences matter less than a lot of people would expect. It was very frequent with Guesstimate models that the answer would be pretty over-determined, so people who initially disagreed with each other would end up being totally fine with the answer."

I'd be very interested to learn more about your experiences and double crux on this, because my intuition is the opposite.

OAGr commented 2 years ago

Separate point: Instead of committees of experts, we could randomly sample people from the EA Forum or similar. I got a few votes for something similar here:


https://twitter.com/ozziegooen/status/1481454692882284545

OAGr commented 2 years ago

(Happy to discuss sometime)