hypothesis / h

Annotate with anyone, anywhere.
https://hypothes.is/
BSD 2-Clause "Simplified" License

Too-large URI gives error: "Saving annotation failed 500" #5712

Open mkdir-washington-edu opened 5 years ago

mkdir-washington-edu commented 5 years ago

https://hypothesis.zendesk.com/agent/tickets/6123 When the user went to save an annotation, they received the error "Saving annotation failed 500" (see pic). The normalized URI was too large to be indexed in the DB.

Sentry issue: https://sentry.io/organizations/hypothesis/issues/1214722226/events/eb073c04ffb044a4a632fa59580945c2/?project=37293&statsPeriod=24h#

[Screenshot: Screen-Shot-2019-09-20-at-10 10 18-AM]

mkdir-washington-edu commented 5 years ago

Maybe related to: https://hypothes-is.slack.com/archives/C0LUWQQJJ/p1548872758005100

hmstepanek commented 5 years ago

Also see slack convo.

Brief root cause:

The claimant normalized URI was too large to index in the DB, so the annotation failed to be written to the DB. This appears to have happened 226 times to 8 users over the last week.

Possible solutions:

  1. When this same issue was discussed earlier this year, I recommended that, at the very least, it be caught and handled before it reaches the database write.
  2. We may also be able to shrink or truncate the URI in cases like this, where it is simply too long.
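
A minimal sketch of both options, not taken from the h codebase. The names `MAX_INDEXED_URI_BYTES`, `URITooLongError`, `check_uri_length` and `truncate_uri` are hypothetical, and the byte limit is a placeholder assumption; the real value would depend on the database's index size limit.

```python
# Hypothetical limit in bytes; an assumption for illustration, not a value from h.
MAX_INDEXED_URI_BYTES = 2000


class URITooLongError(ValueError):
    """Raised when a normalized URI is too large to be indexed."""


def check_uri_length(normalized_uri: str, limit: int = MAX_INDEXED_URI_BYTES) -> str:
    """Option 1: reject an oversized URI before attempting the DB write."""
    size = len(normalized_uri.encode("utf-8"))
    if size > limit:
        raise URITooLongError(f"normalized URI is {size} bytes; limit is {limit}")
    return normalized_uri


def truncate_uri(normalized_uri: str, limit: int = MAX_INDEXED_URI_BYTES) -> str:
    """Option 2: cut the URI down to at most `limit` bytes of UTF-8."""
    encoded = normalized_uri.encode("utf-8")[:limit]
    # errors="ignore" drops a multi-byte character that was split by the cut.
    return encoded.decode("utf-8", errors="ignore")
```

With option 1, the caller could turn `URITooLongError` into a clear client-facing error instead of a 500. Option 2 mutates the stored data, and two different long URIs that share a prefix would collapse to the same value.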
mkdir-washington-edu commented 5 years ago

Additional requesters: https://hypothesis.zendesk.com/agent/tickets/6246

lyzadanger commented 5 years ago

I'd like to propose a third "possible solution": change the way we're indexing this uniqueness constraint. Instead of enforcing unique on (claimant_normalized, type) as we do at present (https://github.com/hypothesis/h/blob/f41b7bfb7153006fd094bb0be4bdd2e6d5b28f95/h/models/document.py#L148), we might consider enforcing unique on ([some hashed representation of] claimant_normalized, type). This would fix the problem without having to mutate data.

Some nice discussion here: https://dba.stackexchange.com/a/25140
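
To illustrate the shape of that idea, here is a rough sketch using only the Python standard library. It is an assumption-laden toy, not h's schema: the `uri_claims` table, the `claimant_hash` column and the `insert_claim` helper are hypothetical, SHA-256 is just one possible hash, and SQLite stands in for the real database only so the example runs on its own.

```python
import hashlib
import sqlite3


def claimant_hash(claimant_normalized: str) -> str:
    """Fixed-size digest of the URI, however long the URI itself is."""
    return hashlib.sha256(claimant_normalized.encode("utf-8")).hexdigest()


# Hypothetical table; the full URI is stored but not part of the unique index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uri_claims ("
    " claimant_normalized TEXT NOT NULL,"
    " claimant_hash TEXT NOT NULL,"
    " type TEXT NOT NULL)"
)
# Uniqueness is enforced on (hash, type) instead of (claimant_normalized, type).
conn.execute(
    "CREATE UNIQUE INDEX uq_claimant_hash_type ON uri_claims (claimant_hash, type)"
)


def insert_claim(db: sqlite3.Connection, claimant_normalized: str, type_: str) -> bool:
    """Insert a row; return False if (hash, type) already exists."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO uri_claims VALUES (?, ?, ?)",
                (claimant_normalized, claimant_hash(claimant_normalized), type_),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The indexed value is always 64 hex characters, so the size of the URI no longer matters to the index, and the stored URI is left unchanged.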

robertknight commented 5 years ago

Please see my analysis on the duplicate issue https://github.com/hypothesis/h/issues/5713#issuecomment-535542175. I think there are some other steps we can take first which will solve the problem more usefully for users.