harvard-lil / h2o

H2O is a web app for creating and reading open educational resources, primarily in the legal field
https://opencasebook.org
GNU Affero General Public License v3.0

Only index the most-recent casebook in a series in the casebook search #1973

lizadaly closed 1 year ago

lizadaly commented 1 year ago

This drops the number of casebooks in the search index from 384 to 372 (-12).

It doesn't remove duplicates that aren't in a series; for example, "Administrative Law 372" still appears twice on the initial page because its two copies aren't grouped into a series. I experimented with suppressing duplicates based on title + author only, and that did seem to work, but it would fail to catch cases where the same casebook has different co-author credits between editions. Even though I think the Series feature is not fully baked, it's probably still better to affirmatively remove dupes by putting books in series than to tweak the search index further.
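To make the trade-off concrete, here is a minimal sketch of the two de-duplication strategies discussed above. It is not the code in this PR: the `Casebook` class, its field names (`series_id`, `edition`, `authors`), and both helper functions are hypothetical stand-ins, and the sample data is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Casebook:
    # Hypothetical fields for illustration; not the actual H2O schema.
    id: int
    title: str
    authors: frozenset
    series_id: Optional[int] = None  # None means "not in a series"
    edition: int = 1                 # higher means more recent


def latest_in_series(casebooks):
    """Keep every casebook outside a series, plus the newest edition of each series."""
    newest = {}
    standalone = []
    for book in casebooks:
        if book.series_id is None:
            standalone.append(book)
        elif (book.series_id not in newest
              or book.edition > newest[book.series_id].edition):
            newest[book.series_id] = book
    return standalone + list(newest.values())


def dedupe_by_title_and_authors(casebooks):
    """Alternative heuristic: keep the first casebook seen per (title, authors) pair."""
    seen = {}
    for book in casebooks:
        seen.setdefault((book.title, book.authors), book)
    return list(seen.values())


# Invented sample data: one two-edition series whose co-author credit changed,
# and one title duplicated outside any series.
books = [
    Casebook(1, "Civil Procedure", frozenset({"A"}), series_id=10, edition=1),
    Casebook(2, "Civil Procedure", frozenset({"A", "B"}), series_id=10, edition=2),
    Casebook(3, "Administrative Law 372", frozenset({"C"})),
    Casebook(4, "Administrative Law 372", frozenset({"C"})),
]

print([b.id for b in latest_in_series(books)])
print([b.id for b in dedupe_by_title_and_authors(books)])
```

In this sketch the series-based filter drops the older edition (id 1) but keeps both unseried duplicates (ids 3 and 4), while the title + author heuristic collapses the unseried duplicates but keeps both editions because their author sets differ.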

Before

image

After

image

(In this example, the Ball/Oberman casebook by Karlan is a clone but not part of the series, so it is not de-duped.)