opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system
https://wiki.opencog.org/w/AtomSpace

(do-em-all) throws a C++ exception after loading around 50,000-60,000 atoms #120

Closed rohit12 closed 8 years ago

rohit12 commented 9 years ago

I am gathering statistics on a parse server that Linas has set up to collect data for the language learning project. I ran (do-em-all) after gathering a substantial number of word pairs, so that the MI could be computed and further processing could begin. This is the exception I got:

Backtrace:
In ice-9/boot-9.scm:
 157: 12 [catch #t #<catch-closure 40bb4bc0> ...]
In unknown file:
   ?: 11 [apply-smob/1 #<catch-closure 40bb4bc0>]
In ice-9/boot-9.scm:
 157: 10 [catch #t #<catch-closure 40bb4a80> ...]
In unknown file:
   ?: 9 [apply-smob/1 #<catch-closure 40bb4a80>]
   ?: 8 [call-with-input-string "(do-em-all)\n" ...]
In ice-9/boot-9.scm:
2320: 7 [save-module-excursion #<procedure 40b5f7b0 at ice-9/eval-string.scm:65:9 ()>]
In ice-9/eval-string.scm:
  44: 6 [read-and-eval #<input: string 4184a270> #:lang ...]
  37: 5 [lp (do-em-all)]
In /home/rohit/../..//home/opencog/src/opencog/opencog/nlp/learn/compute-mi.scm:
1018: 4 [batch-all-pair-mi (LinkGrammarRelationshipNode "ANY" (ctv 1 0 22408396)) ]
 717: 3 [batch-all-pair-wildcard-counts #]
In unknown file:
   ?: 2 [opencog-extension fetch-incoming-set (#)]
In ice-9/boot-9.scm:
 102: 1 [#<procedure 3e597a00 at ice-9/boot-9.scm:97:6 (thrown-k . args)> C++-EXCEPTION ...]
In unknown file:
   ?: 0 [apply-smob/1 #<catch-closure 40bb4a40> C++-EXCEPTION ...]

ERROR: In procedure apply-smob/1:
ERROR: In procedure fetch-incoming-set: Unexpected handle mismatch! Expected 15981 got 1792030 (/home/opencog/src/atomspace/opencog/atomspace/AtomSpace.cc:303)
ABORT: C++-EXCEPTION

linas commented 9 years ago

Any chance you've run out of RAM? Can you look? It appears that the code calls a function named (delete-hypergraph) and I can't find that function anywhere ... looking for it now.

linas commented 9 years ago

delete-hypergraph was renamed to purge-hypergraph in utilities.scm, but not in the NLP directories. I will fix this shortly.

linas commented 9 years ago

I have not been able to reproduce this so far. I do see that the process runs out of RAM every 2-4 hours and gets killed as a result. It seems you have a shell script that restarts it.

linas commented 9 years ago

Also: when you get a chance, I would like to restart the postgres server. I made some config changes that should speed things up a lot (I forgot to make those changes before).

rohit12 commented 9 years ago

I checked it again. It isn't running out of RAM. For the record, I used the command "free -m" to check available RAM. Before I ran (do-em-all), it showed 5GB of available RAM.

rohit12 commented 9 years ago

This time it fails after loading half a million atoms. The error type was the same as above though.

linas commented 9 years ago

So, I'm looking at the log file. I see what the bug is.

It looks like you stop the server, then restart it. You start feeding in parse data. It runs for a few minutes, and you notice that you forgot to open the database. So, now you open the database, and, moments later, the error occurs.

Is this what you're doing? It sure looks like it from the log file:

[2015-07-18 02:35:07:716] [INFO] Using config file found at: /home/rohit/./opencog-en.conf
[2015-07-18 02:35:11:156] [INFO] Starting CogServer loop.
[2015-07-18 02:35:36:502] [ERROR] No backing store (/home/opencog/src/atomspace/opencog/atomspace/AtomSpace.h:221)
[2015-07-18 02:38:05:137] [INFO] Guile caught C++ exception: Unexpected handle mismatch! Expected 261 got 130
 (/home/opencog/src/atomspace/opencog/atomspace/AtomSpace.cc:303)

Yes, the above will result in the "Unexpected handle mismatch" error: before the database is opened, some atom is created and issued a UUID, while the same atom already exists in the database with a different UUID.
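A toy illustration of the mismatch described above, in Python rather than the actual C++. Everything here (ToyAtomSpace, BackingStore, the atom name, the error wording) is hypothetical and only models the ordering problem; it is not the AtomSpace API.

```python
class BackingStore:
    """Toy stand-in for the SQL database: maps atom name -> stored UUID."""

    def __init__(self, rows):
        self.rows = dict(rows)


class ToyAtomSpace:
    """Hypothetical model: hands out local UUIDs until a store is attached."""

    def __init__(self):
        self.uuids = {}      # atom name -> UUID known to this process
        self.next_uuid = 1
        self.store = None

    def add_atom(self, name):
        # A new atom gets the next free local UUID.
        if name not in self.uuids:
            self.uuids[name] = self.next_uuid
            self.next_uuid += 1
        return self.uuids[name]

    def open_store(self, store):
        self.store = store

    def fetch(self, name):
        if self.store is None:
            raise RuntimeError("no backing store")
        stored = self.store.rows[name]
        # If the atom is unknown locally, adopt the stored UUID.
        local = self.uuids.setdefault(name, stored)
        if local != stored:
            raise RuntimeError(
                f"handle mismatch: local {local}, stored {stored}")
        return local


def demo_late_open():
    """Atom is created first, database opened afterwards: mismatch."""
    space = ToyAtomSpace()
    space.add_atom("word-pair")                       # local UUID 1
    space.open_store(BackingStore({"word-pair": 130}))
    try:
        space.fetch("word-pair")
    except RuntimeError as err:
        return str(err)
    return "ok"


def demo_early_open():
    """Database opened before any atom exists: the stored UUID is adopted."""
    space = ToyAtomSpace()
    space.open_store(BackingStore({"word-pair": 130}))
    return space.fetch("word-pair")
```

In this sketch demo_late_open() ends in the mismatch error, while demo_early_open() returns the stored UUID, which is why the open is supposed to happen from the config file before anything else runs.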

To avoid this race, the database is supposed to be opened from the config file, before any atoms get created. However, when this open is done, there is a different message:

[2015-07-18 02:35:08:007] [WARN] Server compiled without database support

which is wrong, and is the root cause of the problem. I am trying to track this down now.

linas commented 8 years ago

Closing, this was fixed via numerous commits. The most important one was a modification of the SQL tables to prevent duplicate atoms from being stored in the SQL database. These duplicate atoms were confusing the atomspace when a given atom was to be loaded.
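For readers wondering what "prevent duplicate atoms" can look like at the table level, here is a minimal sketch using Python's built-in sqlite3 module. It is an assumption-laden illustration: the real fix was made to the Postgres tables, whose schema is not shown in this thread, and the table name, columns, and helper functions below are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical atoms table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE atoms ("
        " uuid INTEGER PRIMARY KEY,"
        " type TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " UNIQUE (type, name))"   # one row, hence one UUID, per atom
    )
    return conn


def store_atom(conn, atom_type, name):
    """Insert the atom if it is absent, then return its single stored UUID."""
    conn.execute(
        "INSERT OR IGNORE INTO atoms (type, name) VALUES (?, ?)",
        (atom_type, name),
    )
    row = conn.execute(
        "SELECT uuid FROM atoms WHERE type = ? AND name = ?",
        (atom_type, name),
    ).fetchone()
    return row[0]


def count_atoms(conn):
    """Number of rows in the hypothetical atoms table."""
    return conn.execute("SELECT COUNT(*) FROM atoms").fetchone()[0]
```

With the uniqueness constraint in place, storing the same (type, name) pair twice yields the same UUID and a single row, so a later load cannot find two conflicting copies of one atom.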