opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system
https://wiki.opencog.org/w/AtomSpace
Other
801 stars 225 forks source link

Exceptions thrown message turn into numbers in guile #34

Closed ngeiswei closed 8 years ago

ngeiswei commented 9 years ago

It seems I can't get the exception message of a function in guile.

Specifically when trying to executing the example

examples/rule-engine/simple-deduction.scm

I get something like

ERROR: In procedure opencog-extension:
ERROR: Throw to key `decoding-error' with args `("scm_from_stringn" "input locale conversion error" 84 #vu8(66 97 99 107 116 114 97 99 101 58 10 73 110 32 115 121 115 116 101 109 47 118 109 47 116 114 97 112 45 115 116 97 116 101 46 115 99 109 58 10 32 49 55 50 58 32 49 57 32 91 119 105 116 104 45 100 101 102 97 117 108 116 45 116 114 97 112 45 104 97 110 100 108 101 114 32 35 102 32 46 46 46 93 10 73 110 32 105 99 101 45 57 47 98 111 111 116 45 57 46 115 99 109 58 10 49 54 52 53 58 32 49 56
...

Tracking down the problem in gdb it seems to occur when an exception is thrown.

linas commented 9 years ago

the numbers look like ascii. Try converting them by hand.

I've never seen anything like this before.

What locale are you in? If you set it to a utf8 locale, does the error go away?

ngeiswei commented 9 years ago

Here's my locale:

nilg@mons-desktop:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
ngeiswei commented 9 years ago

This bug is driving me crazy, it makes it a big pain to debug scheme code.

It seems related to that bug

http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10742

but it's been fixed a long time ago. And my locales are all en_US.UTF-8.

I don't know what to do...

linas commented 9 years ago

OK, that's bizarre, but I can reproduce it. So, when I convert the ASCI by hand, I get this:

Backtrace:
In system/repl/repl.scm:
 188: 19 [run-repl* # #<procedure prompting-meta-read (repl)>]
In ice-9/boot-9.scm:
 845: 18 [catch #t #<unspecified> ...]
In system/vm/trap-state.scm:
 172: 17 [with-default-trap-handler #f _ _]
In ice-9/boot-9.scm:
1960: 16 [(null)tart-stack #t _]
1963: 15 [#f]
 845: 14 [catch #<unspecified> #<unspecified> ...]
In system/repl/repl.scm:
 201: 13 [#<procedure c737c0 at system/repl/repl.scm:191:18 ()>]
In ice-9/boot-9.scm:
 845: 12 [catch #t #<unspecified> ...]
In system/vm/trap-state.scm:
 172: 11 [with-default-trap-handler #<procedure debug-trap-handler (frame trap-idx trap-name)> ...]
In ice-9/boot-9.scm:
1960: 10 [(null)tart-stack #t _]
1963: 9 [#f]
In system/repl/repl.scm:
 159: 8 [#f]
In ice-9/boot-9.scm:
1960: 7 [�
           tart-stack #t _]
1963: 6 [#f]
In unknown file:
   ?: 5 [opencog-extension cog-fc (# "examples/rule-engine/cpolicy.json")]
   ?: 4 [apply-smob/1 #<catch-closure c6cf20>]
   ?: 3 [apply-smob/1 #<catch-closure c6ce80>]
In /build/tests/rule-engine/../../..//examples/rule-engine/rules/crisp-deduction.scm:
  65: 2 [crisp-deduction-formula _ _ _]
In ice-9/boot-9.scm:
 778: 1 [dispatch-exception _ unbound-variable ...]
In unknown file:
   ?: 0 [apply-smob/1 #<catch-closure c6ce40> unbound-variable ...]

ERROR: In procedure apply-smob/1:
ERROR: In procedure module-lookup: Unbound variable: cog-stv-strength
ABORT: unbound-variable

and the non-ascii char is clearly visible in the above.

In the bug report, it says to do this: LC_ALL=en_US.UTF-8 guile and when I do that, I get this:

scheme@(guile-user)> (cog-fc AB "examples/rule-engine/cpolicy.json")
ERROR: In procedure opencog-extension:
ERROR: Throw to key `C++-EXCEPTION' with args `("cog-fc" "Backtrace:\nIn system/repl/repl.scm:\n 188: 19 [run-repl* # #<procedure prompting-meta-read (repl)>]\nIn ice-9/boot-9.scm:\n 845: 18 [catch #t #<unspecified> ...]\nIn system/vm/trap-state.scm:\n 172: 17 [with-default-trap-handler #f _ _]\nIn ice-9/boot-9.scm:\n1960: 16 [(null)tart-stack #t _]\n1963: 15 [#f]\n 845: 14 [catch #<unspecified> #<unspecified> ...]\nIn system/repl/repl.scm:\n 201: 13 [#<procedure 1186cc0 at system/repl/repl.scm:191:18 ()>]\nIn ice-9/boot-9.scm:\n 845: 12 [catch #t #<unspecified> ...]\nIn system/vm/trap-state.scm:\n 172: 11 [with-default-trap-handler #<procedure debug-trap-handler (frame trap-idx trap-name)> ...]\nIn ice-9/boot-9.scm:\n1960: 10 [(null)tart-stack #t _]\n1963: 9 [#f]\nIn system/repl/repl.scm:\n 159: 8 [#f]\nIn ice-9/boot-9.scm:\n1960: 7 [pGM\x01tart-stack #t _]\n1963: 6 [#f]\nIn unknown file:\n   ?: 5 [opencog-extension cog-fc (# \"examples/rule-engine/cpolicy.json\")]\n   ?: 4 [apply-smob/1 #<catch-closure 1183ce0>]\n   ?: 3 [apply-smob/1 #<catch-closure 1183c80>]\nIn /build/tests/rule-engine/../../..//examples/rule-engine/rules/crisp-modus-ponens.scm:\n  39: 2 [crisp-modus-ponens-formula _ _ _]\nIn ice-9/boot-9.scm:\n 778: 1 [dispatch-exception _ unbound-variable ...]\nIn unknown file:\n   ?: 0 [apply-smob/1 #<catch-closure 1183c40> unbound-variable ...]\n\nERROR: In procedure apply-smob/1:\nERROR: In procedure module-lookup: Unbound variable: cog-stv-strength\nABORT: unbound-variable\n (/opencog/guile/SchemeEval.cc:980)")'.

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.

which is exactly the same message except the newlines failed.

So, there are two errors: first cog-stv-strength is not defined. More seriously though, something is clobbering RAM, and printing a null char into the middle: look at 1960 ... tart -stack It is this null char that is causing the print failure. Perhaps valgrind can find it?

linas commented 9 years ago

When I run valgrind, I get this: http://stackoverflow.com/questions/10360208/why-valgrind-is-killed-during-memory-leak-detection which was closed as a "non-constructive question" by some non-constructive idiots.

linas commented 9 years ago

I can't get electric fence to work with guile.

The four bad ascii chars are this in decimal:

176,18,12,1,

in binary that's 0x b0 12 0c 01 or little-endian 0x010c12b0 which looks like a heap pointer.

linas commented 9 years ago

oh wait, but that's 32-bit and my system is 64-bit

linas commented 9 years ago

since its four bad bytes on my 64-bit system, I assume that they are a stray bytecode from somewhere. It still means that there is a bad pointer somewhere, causing stray garbage to be written into the middle of an error message. For the moment, I am stumped.

ngeiswei commented 9 years ago
LC_ALL=en_US.UTF-8

does not work for me.

Here's a script to convert the ASCII numbers into characters (you must pipe the number sequence, and only the number sequence, into it)

while read -d' ' cn; do printf "\\$(printf '%03o' $cn)"; done
linas commented 8 years ago

on rare occasions, I see this. Not sure what the deeper problem is...

linas commented 8 years ago

I think I may have fixed this, in https://github.com/opencog/cogutils/pull/17

This is a clue:

1960: 7 [��
           �tart-stack #t _]

What would normally go there is a %s and the Logger class thought this was a print format specifier. This message gets generated when you send bad scheme code into guile, via pattern matcher, rule-engine, etc. The guile interpreter correctly sees the error, the opencog guile code turns it into a C++ exception, and the cogutils exception code tries to log it to the log file. The %s confuses the logger and messes everything up.

linas commented 8 years ago

I'm closing this because I am pretty sure that is the root of the bug. If you hit this again, please re-open ....

ngeiswei commented 8 years ago

I'm reopening that issue as I've just experienced it again. As far as I can tell it's not a problem with StandardException::parseErrorMessage it seems to use another exception pathway.

ngeiswei commented 8 years ago

Actually it does use StandardException::parseErrorMessage, makes more sense now!

linas commented 8 years ago

what's the actual message? easy steps to reproduce it?

ngeiswei commented 8 years ago

The message (once decoded) is as follows:

Backtrace:
In ice-9/boot-9.scm:
1645: 19 [(null)tart-stack #t #<procedure 172a4e0 at system/repl/repl.scm:179:14 ()>]
1650: 18 [#<procedure 172a3f0 ()>]
 157: 17 [catch quit #<procedure 172a3c0 at system/repl/repl.scm:181:18 ()> ...]
In system/repl/repl.scm:
 191: 16 [#<procedure 172a3c0 at system/repl/repl.scm:181:18 ()>]
In ice-9/boot-9.scm:
 157: 15 [catch #t ...]
In system/vm/trap-state.scm:
 172: 14 [with-default-trap-handler #<procedure debug-trap-handler (frame trap-idx trap-name)> ...]
In ice-9/boot-9.scm:
1645: 13 [
tart-stack #t #<procedure 1776d50 at system/repl/repl.scm:192:32 ()>]
1650: 12 [#<procedure 1776c60 ()>]
In system/repl/repl.scm:
 148: 11 [with-stack-and-prompt #<procedure 1711040>]
In ice-9/boot-9.scm:
1645: 10 [tart-stack #t #<procedure 173cd40 at system/repl/repl.scm:149:33 ()>]
1650: 9 [#<procedure 1776c30 ()>]
In unknown file:
   ?: 8 [opencog-extension cog-bind (#)]
In ice-9/boot-9.scm:
 157: 7 [catch #t #<catch-closure 173c860> ...]
In unknown file:
   ?: 6 [apply-smob/1 #<catch-closure 173c860>]
In ice-9/boot-9.scm:
 157: 5 [catch #t #<catch-closure 173c4a0> ...]
In unknown file:
   ?: 4 [apply-smob/1 #<catch-closure 173c4a0>]
In /home/nilg/OpenCog/opencog/examples/moses-pln-synergy/scm/../../../opencog/reasoning/pln/rules/forall-instantiation-rule.scm:
  71: 3 [pln-formula-forall-instantiation # #]
In unknown file:
   ?: 2 [cog-new-link 29 # #]
In ice-9/boot-9.scm:
 102: 1 [#<procedure 173a540 at ice-9/boot-9.scm:97:6 (thrown-k . args)> C++-EXCEPTION ...]
In unknown file:
   ?: 0 [apply-smob/1 #<catch-closure 173c460> C++-EXCEPTION ...]

ERROR: In procedure apply-smob/1:
ERROR: In procedure cog-new-link: PutLink has mismatched value list! (/home/nilg/OpenCog/atomspace/opencog/atoms/core/PutLink.cc:171)
ABORT: C++-EXCEPTION
 (/home/nilg/OpenCog/atomspace/opencog/guile/SchemeEval.cc:986

I didn't manage to easily reproduce it yet (in fact trying to do so led me to want to improve PM's logging system).

linas commented 8 years ago

Ah, well, but this is more or less exactly the same bug as before. Here's the clue again:

Backtrace:
In ice-9/boot-9.scm:
1645: 19 [(null)tart-stack 

which is supposed to be [%start stack Some code somewhere passed this straight in to the logger, the %s was found as a printf directive, and everything falls apart from there.

linas commented 8 years ago

So pull request #313 should fix this -- the last line of the message:

atomspace/opencog/guile/SchemeEval.cc:986

pointed exactly at the line that needed for be fixed. I'm grepping for more of these ...

linas commented 8 years ago

If its OK w/ you, I am closing this, since I'm really sure that #313 fixes this. Although protocol is that really you should be closing not me...

ngeiswei commented 8 years ago

It doesn't print new lines, but that's already a huge improvement. I may try to fix the newline myself. Thanks!