hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Cache problem? Resource not shown. #120

Closed: dr0i closed this issue 9 years ago

dr0i commented 9 years ago

This is a report about a bug that cannot be reproduced reliably; it shows up from time to time:

Data reside in ES, but the API doesn't find them. This result is cached for an hour, so the resource is not found for an hour; after that hour, the resource shows up. The logs don't help: neither elasticsearch.log nor play.log shows anything useful (TODO: change the log level). It sometimes appears in the context of updating test data, see #101. The bug would be less annoying if the result were not cached for such a long time. Moreover, some resources don't show up even after this hour, e.g. 10128262-X.
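One way to make an intermittent "not found" less costly is to cache empty results only briefly while keeping the one-hour TTL for real hits. The following is a minimal, hypothetical Python sketch, not the lobid API's actual cache: `ResultCache`, `HIT_TTL_SECONDS` and `MISS_TTL_SECONDS` are illustrative names, and the one-minute TTL for empty results is an assumed value. It would only shorten the outage for the cached case; it would not help resources that stay missing after the hour.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical TTLs: one hour mirrors the caching described in the issue;
# the short TTL for empty ("not found") results is an assumed mitigation.
HIT_TTL_SECONDS = 60 * 60
MISS_TTL_SECONDS = 60


class ResultCache:
    """Caches query results, keeping empty results only briefly."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so expiry can be tested without sleeping.
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, query: str) -> Optional[List[str]]:
        """Return cached hits (possibly []), or None if absent or expired."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, hits = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the next lookup asks the backend again.
            del self._entries[query]
            return None
        return hits

    def put(self, query: str, hits: List[str]) -> None:
        """Store hits; empty results get the short TTL."""
        ttl = HIT_TTL_SECONDS if hits else MISS_TTL_SECONDS
        self._entries[query] = (self._clock() + ttl, hits)


if __name__ == "__main__":
    now = [0.0]
    cache = ResultCache(clock=lambda: now[0])
    cache.put("id:10128262-X", [])   # backend returned nothing
    now[0] = 61.0
    print(cache.get("id:10128262-X"))  # None: the empty result has expired
```

Returning `[]` for a cached empty result and `None` for a missing entry lets the caller tell "we already asked and got nothing" apart from "we have to ask".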

dr0i commented 9 years ago

This bug feels like the "old cluster split problem", see hbz/lobid#69. However, using curl every query works, while using the API it sometimes works and sometimes doesn't. Weird: deploying the API locally and running it, the query fails; deploying the API locally and running it in the Eclipse debugger, the query succeeds.

Interesting, because it's the same code and the same config!

dr0i commented 9 years ago

Just now (on 27.02.) I got another clue that it's a cluster problem: http://lobid.org/resource?subject=4414195-6 should normally return 7 hits. Once I got 4 results; seconds later, the same query returned 3. Put together, these results are exactly the expected resources, so I must conclude that our two nodes are not in sync. The cause could be the restart done two days earlier:

[2015-02-25 10:11:40,495][INFO ][node ] [DE-605_quaoar2] stopped
...
[2015-02-25 10:12:22,394][INFO ][node ] [DE-605_quaoar2] started

The restart was necessary because swap usage was at 75% and the system had become very slow. Even though no indexing was running at that time, this service restart may have provoked the split-brain situation we now seem to be in.
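The 4 + 3 = 7 reasoning above can be checked more directly by sending the same query to each node and comparing the hit sets. Below is a hypothetical Python sketch: `fetch_ids` and `compare_nodes` are illustrative names, and the node URLs and index name are placeholders for your own setup. It relies on Elasticsearch's `_search` endpoint with the `preference=_local` parameter, which asks a node to prefer its own shard copies.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Set, Tuple


def fetch_ids(node_url: str, index: str, query: str, size: int = 100) -> Set[str]:
    """Run a query-string search on one node, preferring its local shards.

    node_url (e.g. "http://host:9200") and index are placeholders.
    """
    params = urllib.parse.urlencode({"q": query, "size": size, "preference": "_local"})
    with urllib.request.urlopen(f"{node_url}/{index}/_search?{params}") as resp:
        body = json.load(resp)
    return {hit["_id"] for hit in body["hits"]["hits"]}


def compare_nodes(results: Dict[str, Set[str]]) -> Tuple[bool, Set[str]]:
    """Return (do all nodes return the same hits?, union of all hits)."""
    sets = list(results.values())
    agree = all(s == sets[0] for s in sets)
    union: Set[str] = set().union(*sets)
    return agree, union


if __name__ == "__main__":
    # Illustrative IDs only: two nodes returning disjoint parts of one result.
    agree, union = compare_nodes({
        "node1": {"r1", "r2", "r3", "r4"},
        "node2": {"r5", "r6", "r7"},
    })
    print(agree, len(union))  # False 7
```

If the nodes disagree while their union matches the expected hit count, that fits the conclusion that each node is serving a different copy of the data.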

dr0i commented 9 years ago

After the weekend's reindexing, the behaviour is back to normal. That's a further clue that we had a split brain. Closing this issue.
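The thread does not say how the cluster was configured. For context, a common guard against split brain in Elasticsearch versions before 7.0 is to set `discovery.zen.minimum_master_nodes` in `elasticsearch.yml` to a quorum of the master-eligible nodes, i.e. floor(n / 2) + 1. The Python sketch below (the function name is illustrative) just computes that value; it is not what was done here, since the issue was resolved by reindexing.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum commonly recommended for discovery.zen.minimum_master_nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, minimum_master_nodes(n))
```

With two nodes, as in this cluster, the quorum is 2, so a lone node cannot elect itself master. The trade-off is that no master can be elected while either node is down, which is why three master-eligible nodes are usually recommended.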