kohler / hotcrp

HotCRP conference review software
http://read.seas.harvard.edu/~kohler/hotcrp

needed feature: topic rank #318

Closed dants closed 1 year ago

dants commented 1 year ago

Bidding is nowadays forbidden in some conferences, so, arguably, the number of topics -- which serve as the alternative -- keeps growing. Because authors often associate many topics with their submission, some conferences require authors to specify in the submission form which topic is the best match, which is second best, and which is third best, with the goal of improving the match between submissions and reviewers by using this additional information. (Authors do this through dropdown menu fields whose options are all of the topics.)

A drawback of this ad-hoc method is that many authors misunderstand the instructions and, e.g., only select the best-matching topic(s) in the aforementioned dropdown menu(s), neglecting to select them in the topic checklist, which messes up HotCRP's topic score.

Another drawback is that the "prioritized" topic score must be computed externally to HotCRP.

It would therefore be helpful if HotCRP supported topic ranking natively, by allowing the administrator to require authors to rank the N best-matching topics among those they select, for some small, possibly configurable N. This input could then be factored into the overall topic score (screenshot below), e.g., by using a smaller-than-2 denominator within the sqrt for best-matching topics.
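
For concreteness, here's a rough sketch of the kind of "prioritized" score I have in mind. This is illustrative only -- the function name, the default denominator of 2, and the smaller denominators for ranked topics are assumptions, not HotCRP's actual formula:

```python
import math

# Illustrative sketch only, not HotCRP's actual topic-score formula. Assume each
# selected topic contributes interest / sqrt(denominator), with denominator 2 by
# default and a smaller (hypothetical) denominator for the author-ranked topics.
RANK_DENOMINATOR = {1: 0.5, 2: 1.0, 3: 1.5}   # hypothetical values for ranks 1-3
DEFAULT_DENOMINATOR = 2.0

def prioritized_topic_score(paper_topics, reviewer_interest, topic_rank):
    """paper_topics: topic ids selected for the submission;
    reviewer_interest: topic id -> reviewer's interest value;
    topic_rank: topic id -> 1, 2, or 3 for the author-ranked topics."""
    score = 0.0
    for t in paper_topics:
        denom = RANK_DENOMINATOR.get(topic_rank.get(t), DEFAULT_DENOMINATOR)
        score += reviewer_interest.get(t, 0) / math.sqrt(denom)
    return score
```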

Thanks, --Dan

[screenshot: HotCRP's topic score formula]
kohler commented 1 year ago

Try using the "topic_max" configuration setting available in Advanced Settings, which lets you limit the number of topics a user may select.

dants commented 1 year ago

Limiting the number of topics will not work for us because we've learned that we get value -- in terms of improving the review assignment -- from the remaining, non-top-ranking topics, and we do not want to lose this value. We learned this by manually inspecting and determining the correctness of many submission assignments for which our new weighted score (which gives much more weight to top-ranking topics but also factors in the rest of the topics, as suggested above) "disagreed" with hotcrp's existing weighted topic score (which doesn't account for top-ranking topics). "Disagreement" means one weighted score was positive whereas the other was negative.
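
In other words, the check is just a sign comparison per (member, submission) pair; a sketch, with made-up sample values:

```python
def disagreements(score_pairs):
    """score_pairs: iterable of (prioritized_score, hotcrp_score) per (member, submission)."""
    return [pair for pair in score_pairs if pair[0] * pair[1] < 0]

print(disagreements([(1.8, -0.4), (2.1, 0.9), (-0.7, 1.2)]))  # -> [(1.8, -0.4), (-0.7, 1.2)]
```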

The result was that in about 75% of the disagreements our new score formula (more weight to top-ranking topics) had the correct sign, but in the remaining 25% hotcrp's existing score was the correct one. So we used both signals in our assignment process (see details below), which means PC members were asked to "bid" on high-scoring submissions according to both kinds of scores.

@madanMus, can you please provide some more details about this experiment?

The bottom line is that we could really benefit from the requested hotcrp functionality.

FYI, this year, in ASPLOS (and some other conferences that adopted our process, such as Eurosys'24), R1 review assignment is done in three phases, as follows.

In Phase 1, for each PC member, we select a set S1 of a few dozen submissions, populated based on four per-member numeric signals, and this set is what PC members initially see as their "assigned reviews." The four signals are as follows (a rough sketch of combining them appears after the list):

  1. TPMS scores, which approximate the match of each submission to a set of self-selected, self-authored papers that the member uploaded to the TPMS system beforehand;
  2. hotcrp regular topic scores, which approximate the PC member's per-submission match based on their hotcrp-specified topics of interest;
  3. "prioritized topic scores," which, as noted above, extend hotcrp's topic score formula to factor the newly-added submission input that identifies the best-matching topic, the second-best, and the third best; and
  4. citation scores, which count the number of times each submission cites papers of the PC member. 
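
A minimal sketch of one way such signals could be combined to pick S1. The weighted-sum form, the weights, the set size, and the function names are assumptions for illustration, not the exact procedure we used:

```python
def combined_score(tpms, topic, prioritized_topic, citations,
                   weights=(0.4, 0.2, 0.3, 0.1)):
    """Weighted sum of one PC member's four signals for one submission.
    Assumes each signal is already normalized to a comparable range; the weights
    and the weighted-sum form itself are placeholders."""
    return sum(w * s for w, s in zip(weights, (tpms, topic, prioritized_topic, citations)))

def select_s1(scores_by_submission, size=40):
    """scores_by_submission: submission id -> combined score for one PC member.
    Returns the ids of the top `size` submissions (the member's initial set S1)."""
    ranked = sorted(scores_by_submission, key=scores_by_submission.get, reverse=True)
    return ranked[:size]
```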

In Phase 2, PC members are requested to assign their expected "expertise score" to each of their S1 submissions. We use hotcrp's bidding interface for this purpose, but members are instructed to exclusively use the following scores:

   2 = I may be able to provide a knowledgeable review.
   1 = My review confidence will most likely be low.
  -1 = I'm absolutely unable to review this submission.
-999 = I believe I'm conflicted with this submission.

(We're treating all negative scores, except -999, as -1 and all scores >2 as 2.)
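
In code terms, the normalization is roughly the following (the function name is just a placeholder):

```python
def normalize_expertise(bid):
    """Map a raw bid value to the four scores used in Phase 2."""
    if bid == -999:      # declared conflict
        return -999
    if bid < 0:          # any other negative value -> "absolutely unable to review"
        return -1
    return min(bid, 2)   # cap everything above 2 at 2
```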

In Phase 3, we use these manually assigned predicted expertise scores as a fifth signal and produce a set S2 (a subset of S1) that constitutes the R1 assignment of the PC members. To this end, we use a constraint solver that (i) utilizes all of the above information plus the set of conflicts of interest, and (ii) also optimizes for fairness and the like.
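
To give a flavor of the solver step, here is a minimal sketch using the PuLP package: maximize total match quality subject to per-paper and per-reviewer load constraints, excluding conflicts. The parameters and structure are illustrative only; the actual model also encodes fairness objectives and other constraints:

```python
import pulp  # assumes the PuLP package is installed

def solve_r1_assignment(scores, conflicts, reviews_per_paper=4, max_load=12):
    """scores: dict mapping (paper, reviewer) -> combined score from the five signals.
    conflicts: set of (paper, reviewer) pairs that must never be assigned."""
    eligible = [pr for pr in scores if pr not in conflicts]
    x = {pr: pulp.LpVariable(f"x_{i}", cat="Binary") for i, pr in enumerate(eligible)}
    papers = {p for p, _ in eligible}
    reviewers = {r for _, r in eligible}

    prob = pulp.LpProblem("r1_assignment", pulp.LpMaximize)
    prob += pulp.lpSum(scores[pr] * var for pr, var in x.items())  # total match quality
    for p in papers:     # each paper gets exactly reviews_per_paper R1 reviewers
        prob += pulp.lpSum(var for (pp, _), var in x.items() if pp == p) == reviews_per_paper
    for r in reviewers:  # cap each PC member's load
        prob += pulp.lpSum(var for (_, rr), var in x.items() if rr == r) <= max_load
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [pr for pr, var in x.items() if var.value() and var.value() > 0.5]
```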

There's some indication that the above process is better than previous assignment methods. Data from the ASPLOS'24 spring cycle indicates that 92% of the spring submissions got at least two knowledgeable/expert reviews, and 66% got three (out of 4 R1 reviewers). We have evidence that this outcome is significantly better than is typical even in conferences that are much more homogeneous than ASPLOS in terms of research topics (in ATC'19 -- which employed regular bidding -- the corresponding numbers were significantly lower).

kohler commented 1 year ago

e5998c8afcc452955f5c2cb3a530fde54a04f9ef introduces “additional topic selector” submission fields. These fields draw from the same topic set as the intrinsic “Topics” field. You can control min and max for these fields separately (in JSON), and search them separately. For example you could introduce a “Primary Topic” field, and set min and max to 1 so each submission must select exactly one Primary Topic (radio buttons will be used).

The rest of this stuff—changing the topic_interest_score calculation or making it configurable; coding up your complex 5-phase assignment process—is out of scope for this issue. HotCRP allows you to perform bulk assignments using CSV files and to download basically all conference data in easy-to-automatically-process ways, and chairs with special needs should become comfortable with this functionality, as I believe you are already.
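
For example, producing a bulk-assignment CSV from an external script is only a few lines. A sketch -- the "paper,action,email" column names and the "primary" action below are written from memory, so check the bulk-assignment help on your site for the exact vocabulary:

```python
import csv

def write_assignment_csv(assignment, path="r1-assignment.csv", action="primary"):
    """assignment: iterable of (paper_id, reviewer_email) pairs.
    Column names and the action value are assumptions; verify against the
    bulk-assignment documentation before uploading."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["paper", "action", "email"])
        for paper_id, email in assignment:
            writer.writerow([paper_id, action, email])
```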

dants commented 1 year ago

Thanks, Eddie!

Given that there are dozens of topics, wouldn't the use of radio buttons for best-matching topic, second-best, and third best cause the form to be huge due to including the topic list four times?

If so, would you be willing to please consider supporting a dropdown menu topic selection field type?

kohler commented 1 year ago

I have implemented it the way that I think is better. Arrays of selectors are difficult for users to navigate, difficult to constrain (e.g., how to force ‘at most one primary topic’?), and would require more implementation to support search.