ibm-cloud-solutions / hubot-ibmcloud-cognitive-lib

Provides helper functions for configuring, storing, and processing information related to natural language processing of a statement.
http://www.ibm.com/

No response due to database corruption error #7

Open aeweidne opened 8 years ago

aeweidne commented 8 years ago

[Fri Jul 29 2016 16:07:37 GMT+0000 (UTC)] ERROR OpenError: Corruption: 2 missing files; e.g.: /home/hubot/databases/nlc/000016.ldb Adapter: slack, Robot: ibmcloudbot, Room: hubot-testing, User: {"id":"U12J8UR37","name":"yucao","real_name":"YU CAO","email_address":"ycao@us.ibm.com"}

@nbarker I am seeing this in the container. I'm not sure how the DB got corrupted, since it is only created during the image build and is not modified afterwards. Can you take a look and investigate why this happens? @jpadilla also confirmed he has seen this several times and had to wipe out the database.

clanzen commented 8 years ago

I'll take a stab at this. Any other tidbits of info to assist in recreating are welcome.

clanzen commented 8 years ago

I spoke to @ycao56 today, who said he may have put in a fix for this. He hasn't seen it since his change, and @jlpadilla said he hasn't seen it recently either. I haven't had any luck recreating it. That said, I'm going to put this on the back burner and take my name off it. We can keep the issue open a bit longer in case it recurs.

jlpadilla commented 8 years ago

I got this error again, but still don't know how to recreate consistently.

```
[Thu Aug 11 2016 23:47:43 GMT-0400 (EDT)] ERROR OpenError: Corruption: 1 missing files; e.g.: databases/nlc/000156.ldb Adapter: slack, Robot: jorgebot, Room: D1C3CTZL1, User: {"id":"U0C44F480","name":"jorge","real_name":"JORGE PADILLA"}
```

clanzen commented 8 years ago

@jlpadilla Could you check out this page about debugging PouchDB and see if you can apply it to your environment? There is a DB inspector, and also some tips on enabling debug output so we can get the most out of a future recreate. That way, next time you see it happen, we can capture something useful. I've seen some cases where PouchDB sends back a 500 purely due to an internal error. It's a stretch, but if we had good evidence, we could open an issue with them too.

https://pouchdb.com/guides/databases.html

jlpadilla commented 8 years ago

I still don't have a reliable way to recreate, and haven't captured any meaningful data (to my eyes) from Pouch debug, but this problem is happening very often during my current test sequence, so I'm documenting some observations.

I'm currently testing sync and NLC training, so I have SYNC_INTERVAL set to 1 minute. I'm constantly training the classifier, so it goes through the getClasses() logic often.

The problem seems to appear only when I add more records to the DB through nlcDb.js, never during the initial load in initDb.js.

jlpadilla commented 8 years ago

Capturing some more observations. I noticed that the corruption messages started after the following sync error:

[Mon Aug 22 2016 11:42:58 GMT-0400 (EDT)] - error: nlcDb.js: Error during sync of NLC training data with Cloudant. Will retry in 60 seconds. Error: ETIMEDOUT
  at [object Object]._onTimeout (/Users/jpadilla/workspaces/ibm-cloud-solutions/hubot-ibmcloud-cognitive-lib/node_modules/pouchdb/node_modules/request/request.js:772:15)
  at Timer.listOnTimeout (timers.js:92:15)

Then, after a restart, I saw this:

[Mon Aug 22 2016 11:44:07 GMT-0400 (EDT)] - error: nlcDb.js: Error initializing Cloudant sync OpenError: Corruption: 2 missing files; e.g.: databases/nlc/000108.ldb
  at /Users/jpadilla/workspaces/ibm-cloud-solutions/hubot-ibmcloud-cognitive-lib/node_modules/pouchdb/node_modules/levelup/lib/levelup.js:120:34
  at /Users/jpadilla/workspaces/ibm-cloud-solutions/hubot-ibmcloud-cognitive-lib/node_modules/pouchdb/node_modules/leveldown/node_modules/abstract-leveldown/abstract-leveldown.js:40:16

jlpadilla commented 8 years ago

This seems to be related to this PouchDB issue: https://github.com/pouchdb/pouchdb/issues/3224