hypothesis / h

Annotate with anyone, anywhere.
https://hypothes.is/

Annotations exist in Postgres but missing from Elasticsearch #5227

Open judell opened 6 years ago

judell commented 6 years ago

Here is what I can see in metabase:

[screenshot: metabase query results]

There are 7 records, 5 of which are deleted. Of the remaining two, one is in Public and the other in group oRNgpGLj.

Here's the badge query and its result:

GET /api/badge?uri=https%3A%2F%2Fwww.npr.org%2F2018%2F07%2F29%2F632994245%2Fzimbabwes-rickety-trains-get-a-boost-from-expat-investors-next-door

{"total": 4}

The client query (for Public) and its result:

GET /api/search?_separate_replies=true&group=world&limit=200&offset=0&order=asc&sort=created&uri=https%3A%2F%2Fwww.npr.org%2F2018%2F07%2F29%2F632994245%2Fzimbabwes-rickety-trains-get-a-boost-from-expat-investors-next-door

{"rows": [], "total": 4, "replies": []}

The client query (for the group) and its result:

GET /api/search?_separate_replies=true&group=oRNgpGLj&limit=200&offset=0&order=asc&sort=created&uri=https%3A%2F%2Fwww.npr.org%2F2018%2F07%2F29%2F632994245%2Fzimbabwes-rickety-trains-get-a-boost-from-expat-investors-next-door

{"rows": [], "total": 0, "replies": []}

Note: This is a URL for which I have, in the course of testing an app for a partner, repeatedly created and deleted annotations.
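
To make the comparison easier to rerun, here is a minimal sketch of the same checks as a script. The URI, endpoints, and group IDs are the ones shown above; using the requests library is just one way to issue the calls.

# Sketch: run the badge query and the two search queries above and compare totals.
import requests

API = "https://hypothes.is/api"
URI = ("https://www.npr.org/2018/07/29/632994245/"
       "zimbabwes-rickety-trains-get-a-boost-from-expat-investors-next-door")

badge = requests.get(f"{API}/badge", params={"uri": URI}).json()
print("badge total:", badge["total"])

for group in ("world", "oRNgpGLj"):
    search = requests.get(f"{API}/search",
                          params={"uri": URI, "group": group, "limit": 200}).json()
    print(f"group {group}: total={search['total']} rows={len(search['rows'])}")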

robertknight commented 6 years ago

I flagged this a while back but only just got around to looking into it. Today I took a dump of all annotation IDs in Elasticsearch and Postgres and compared them. I found 1665 IDs in Postgres that were missing from Elasticsearch, all created between 2018-08-08 and today (2018-09-10).

Details: https://gist.github.com/robertknight/27b4927839ecde4a372e279c7da0f6ef
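
A sketch of that kind of comparison (not the exact script used; the table and column names, the index name, and the connection details are assumptions, and h's API-visible IDs may be encoded differently from the Postgres UUIDs, so a real run may need a conversion step):

# Sketch: dump annotation IDs from Postgres and Elasticsearch and diff them,
# then bucket the missing ones by creation date. Table/column names, the
# "hypothesis" index name, and connection details are assumptions.
from collections import Counter

import psycopg2
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

pg = psycopg2.connect("dbname=h")
with pg.cursor() as cur:
    cur.execute("SELECT id, created FROM annotation WHERE deleted = false")
    pg_rows = dict(cur.fetchall())

es = Elasticsearch(["http://localhost:9200"])
es_ids = {hit["_id"] for hit in
          scan(es, index="hypothesis", query={"query": {"match_all": {}}}, _source=False)}

# Note: if the two stores key annotations differently (e.g. URL-safe IDs in ES,
# UUIDs in Postgres), convert one side before diffing.
missing = set(pg_rows) - es_ids
print(len(missing), "IDs in Postgres but not in Elasticsearch")
print(Counter(pg_rows[ann_id].date() for ann_id in missing))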

I would guess that the reason there are no IDs missing from before that date is that we reindexed annotations around that time, which creates a new ES index and repopulates it from the DB.

From a brief skim of the data it looks like there wasn't a single period during which annotations went missing from ES entirely. Rather, on 31 of the days since August 8th, between 1 and 149 annotations were created in the DB that do not appear in Elasticsearch, with the per-day counts fairly evenly spread over that range.

Looking at Sentry, the most frequent issue in h is a failure when publishing events to RabbitMQ: https://sentry.io/hypothesis/h/issues/499114248/. For the few dates I skimmed, the frequency of that event always at least matched the number of missing annotations on that date. So a hypothesis is:

  1. A request is made to create, update or delete an annotation.
  2. The main request succeeds.
  3. A subscriber which posts an event to RabbitMQ times out (in h/eventqueue.py) as in the Sentry report above.

What I am assuming above, but what we need to verify, is that the failure in step (3) does not prevent the DB transaction which modifies the annotation in Postgres from being committed.
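
To make the hypothesized sequence concrete, here is a self-contained toy simulation of it. None of this is h's actual code; it just shows how a timeout in step (3) leaves Postgres and Elasticsearch out of sync while the request still appears to succeed:

# Toy simulation of the hypothesis above, not h's real code path.
import socket

postgres = {}   # stand-in for the annotation table
es_index = {}   # stand-in for the Elasticsearch index

def publish_event(event):
    # Stand-in for the RabbitMQ publish (h/eventqueue.py) timing out.
    raise socket.timeout("timed out publishing event")

def create_annotation(annotation_id, data):
    postgres[annotation_id] = data         # steps (1) and (2): the write is committed
    try:
        publish_event({"action": "create", "id": annotation_id})
        es_index[annotation_id] = data     # indexing would normally be driven by that event
    except socket.timeout:
        pass                               # step (3): the event is lost, ES is never updated

create_annotation("abc123", {"text": "hello"})
print("in Postgres:", "abc123" in postgres)        # True
print("in Elasticsearch:", "abc123" in es_index)   # False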

What is the result of all this from a user's point of view? This needs to be verified by reproducing locally, but it looks like, for roughly 10-100 annotations each day, creating the annotation appears to succeed, yet the annotation is not visible in the client after a page reload, nor in activity pages. It is technically still there in the database, but as far as the user is concerned it went missing.

judell commented 6 years ago

Thanks for investigating, @robertknight.

I've run into this again today. The annotation in question is https://hyp.is/SKXuHLuJEeiTq7cTQJbY6g/www.npr.org/2018/07/29/632994245/zimbabwes-rickety-trains-get-a-boost-from-expat-investors-next-door

Here's an attempt to delete it:

[screenshot: deleting the annotation in the client]

Here's the annotation in the db:

[screenshot: the annotation row in the database]

It is marked as deleted=true.

My prediction, based on past behavior, is that in an hour or so, two things will be true:

I'm going to set a timer for 1 hour and return to check that prediction.

... time passes ...

Actually it didn't take that long; I rechecked a few minutes after writing the above report. It was not necessary to delete (i.e. request deletion of) the annotation -- it was just gone, as it should have been all along, as a result of the original delete request, which, as we see above, did register in PG.

On what cycle do we purge deleted entries, and how does that relate to PG <-> ES sync?

robertknight commented 5 years ago

This issue may have been resolved now that https://github.com/hypothesis/h/issues/5311 has been fixed. In brief, there was a problem where failing to send messages to RabbitMQ to trigger realtime notifications could have the knock-on consequence that the annotation was never indexed in Elasticsearch. That has been fixed, but whether it also fixed this issue still needs to be confirmed.

klemay commented 5 years ago

@judell can you let us know if you can delete those annotations?

hmstepanek commented 5 years ago

I think this is still broken. This bot creates annotations every minute, and this week I see an annotation that wasn't deleted. See metabase for details.

judell commented 5 years ago

select count(*) 
from annotation
where userid = 'acct:tester.smokey@hypothes.is'
and deleted = true 
and shared = true

6

(These 6 will be purged, right?)

select count(*)
from annotation
where userid = 'acct:tester.smokey@hypothes.is'
and deleted = false
and shared = true

1361

That's how many should be in ES, right?

https://hypothes.is/api/search?user=tester.smokey result:

{"rows": [], "total": 0}
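
For comparison with the 1361 above, the same count can be taken directly from Elasticsearch rather than through the API. A sketch, assuming an index named "hypothesis" and a "user" field in the document mapping:

# Sketch: count this user's documents directly in Elasticsearch.
# The index name "hypothesis" and the "user" field are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
result = es.count(index="hypothesis",
                  body={"query": {"term": {"user": "acct:tester.smokey@hypothes.is"}}})
print("ES documents for tester.smokey:", result["count"])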

hmstepanek commented 5 years ago

Correct, yes: the ones marked deleted=true will be removed, but the ones with deleted=false won't be. Hmmm... interesting that it's returning no results there. It almost makes me think they got marked as deleted in ES but never got marked as deleted in the db?

judell commented 5 years ago

Which would also be explainable by dropped interprocess messages?

hmstepanek commented 5 years ago

Not sure about that. Maybe? I can't recall how the db deletion vs. the es deletion is scheduled -- whether they are separate subscribers or happen in the same method. It's also possible the db was overloaded or the connection failed.

judell commented 5 years ago

Today I updated the annotation with the internal id abc12a3c-20e5-11e8-9e9a-e7aecfc8126c and although it is in postgres, elasticsearch cannot find it. This may or may not relate to the above, but in either case, is it possible to verify that it is, in fact, in elasticsearch limbo? And if so, is there a way to release it? I want to try deleting it, but I don't want to destroy the evidence which is the state as it exists now.

hmstepanek commented 5 years ago

How did you update it? Via the annotation update api/client? If so it may have lost the task to reindex it in elasticsearch. We can force a reindex but definitely worth looking into what happened first.
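
One way to verify the "limbo" state without touching the annotation is to check for the document in Elasticsearch by ID. A sketch, assuming an index named "hypothesis"; note that the ES document ID may be the URL-safe form rather than the internal Postgres UUID quoted above:

# Sketch: check whether the document exists in Elasticsearch without modifying it.
# The index name is an assumption; the ID is the internal one quoted above and may
# need converting to the URL-safe form ES actually uses as the document _id.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
print(es.exists(index="hypothesis", id="abc12a3c-20e5-11e8-9e9a-e7aecfc8126c"))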

hmstepanek commented 5 years ago

I see this when I look at that page; is this still incorrect somehow? [screenshot: the annotation as it appears on the page]

judell commented 5 years ago

Arrghh, I captured the wrong internal id, it's this one: 09270238-9841-40dc-954f-3c917925862f.

How did you update it? Via the annotation update api/client?

Using the API, to update only the tags field, which does look like it should in postgres.

If so it may have lost the task to reindex it in elasticsearch. We can force a reindex but definitely worth looking into what happened first.

Yeah, I'm not concerned about that particular annotation, more about what state it's in, how it got that way, how to know such a thing has happened, how likely that is, etc.

What is {"consumer": "00000000-0000-0000-0000-000000000000"} doing in the extra field, btw? Is that saying something about the consumer of the API?

hmstepanek commented 5 years ago

Didn't get a chance to dig into this today, but I did try to find the update requests in the logs yesterday and they didn't appear to be there. I'm thinking it might be helpful to add some custom monitoring to track down where this issue is happening and how often it happens. That way we can track the lifecycle of an annotation from request, to db, to adding a task, to handling the task, to getting indexed into es. That should give us more insight into what's going on here and help us understand the severity of the problem.
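
One possible shape for that monitoring: emit one structured log line (or metric) per lifecycle stage, keyed by annotation ID, so that annotations which never reach the indexing stage can be counted later. The stage names and the helper below are made up for illustration, not existing h code:

# Sketch: per-stage lifecycle tracking for an annotation. Stage names and the
# helper are illustrative; in practice these calls would live in the request
# handler, the event publisher, the task consumer, and the indexer.
import json
import logging
import time

log = logging.getLogger("annotation_lifecycle")
logging.basicConfig(level=logging.INFO)

STAGES = ("request_received", "db_committed", "event_published",
          "task_consumed", "es_indexed")

def record_stage(annotation_id, stage):
    assert stage in STAGES
    log.info(json.dumps({"annotation": annotation_id, "stage": stage, "ts": time.time()}))

# An annotation that goes missing from ES would show only the first two stages:
record_stage("09270238-9841-40dc-954f-3c917925862f", "request_received")
record_stage("09270238-9841-40dc-954f-3c917925862f", "db_committed")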

hmstepanek commented 5 years ago

Also, I don't know off the top of my head what the consumer part is; all those zeros look a bit odd.

judell commented 5 years ago

Oy, what a tangled web we weave. This annotation is a reply to a deleted annotation. So you can find it using /api/search, but you can't go to it: https://hypothes.is/a/CScCOJhBQNyVTzyReSWGLw.
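
For the record, this state is easy to confirm from the public API: fetch the reply and check whether each ID in its "references" list still resolves. A sketch, assuming the reply is publicly readable:

# Sketch: confirm that an annotation is a reply whose parent has been deleted.
# The annotation ID is the one from the hyp.is link above.
import requests

API = "https://hypothes.is/api"
reply = requests.get(f"{API}/annotations/CScCOJhBQNyVTzyReSWGLw").json()

for parent_id in reply.get("references", []):
    status = requests.get(f"{API}/annotations/{parent_id}").status_code
    print(parent_id, "->", status)  # 404 here means the parent annotation was deleted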

Sorry for conflating this with the title of this issue; it's not an example of that syndrome.

I am curious about consumer though!