tshatrov / ichiran

Linguistic tools for texts in Japanese language
MIT License

ichiran-cli doesn't work #31

Closed: Eltaurus-Lt closed this issue 1 year ago

Eltaurus-Lt commented 1 year ago

When I run ichiran-cli -h (or any other request), I get this:

Internal error #11 "Object is of the wrong type." at 0000000021ecb191
    SC: 0, Offset: 9    $1=       0x04f8a09f: other pointer
    SC: 3, Offset: 14   $2=       0x00003627: list pointer
fatal error encountered in SBCL pid 722019620:
internal error too early in init, can't recover

This happens even though there was no indication of errors in the previous steps: (ichiran/mnt:add-errata) ran fine, and (ichiran/test:run-all-tests) returns Passed(748) Failed(0) Errors(0).

Also, when using ichiran from SBCL, the first request after restarting SBCL and running (ql:quickload :ichiran) always yields an error. But all the following ones (even if it is the same request repeated) work as expected.

I'm running Windows 10 and SBCL 2.3.0.

tshatrov commented 1 year ago

I haven't personally encountered such an error. Could it be a bug in (that particular version of) SBCL? The latest version is 2.3.1 and has Windows binaries available. My Windows laptop was sitting on 2.1.0, so I upgraded it to 2.3.1 along with all the latest quicklisp libraries, and was able to build ichiran-cli.exe, which runs as expected.

the first request after restarting SBCL and running (ql:quickload :ichiran) always yields an error.

Is it the same error or do you have any error messages for this? Can't reproduce it either. You can try to run the following commands before any queries.

(init-all-caches)
(init-suffixes t)

Eltaurus-Lt commented 1 year ago

The latest version is 2.3.1 and has Windows binaries available

Oh, right. They released it just a couple of hours ago. I will try 2.3.1 and 2.1.0, although I'm not completely sure how to properly install those without messing up the currently installed version. By the way, do I understand correctly, that if I install the needed version from the official binaries, I don't have to compile anything else, and the step with MSYS2 can be skipped entirely?


Is it the same error or do you have any error messages for this?

I thought I had seen different messages before, but every request I try now yields:

debugger invoked on a TYPE-ERROR @21A3052A in thread
#<THREAD "main thread" RUNNING {10010A8113}>:
  The value
    NIL
  is not of type
    HASH-TABLE
  when binding HASH-TABLE

with a backtrace

Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {10010A8073}>
0: (SB-IMPL::GETHASH3 "一" NIL NIL) [external]
1: (ICHIRAN/DICT::GET-SUFFIX-MAP "一覧は最高だぞ")
2: (ICHIRAN/DICT::JOIN-SUBSTRING-WORDS* "一覧は最高だぞ")
3: (ICHIRAN/DICT::JOIN-SUBSTRING-WORDS "一覧は最高だぞ")
4: (ICHIRAN/DICT:DICT-SEGMENT "一覧は最高だぞ" :LIMIT 5)
5: (ICHIRAN/DICT:SIMPLE-SEGMENT "一覧は最高だぞ" :LIMIT 5)
6: (ICHIRAN:ROMANIZE "一覧は最高だぞ" :METHOD NIL :WITH-INFO T)
7: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ICHIRAN:ROMANIZE "一覧は最高だぞ" :WITH-INFO T) #<NULL-LEXENV>)
8: (EVAL (ICHIRAN:ROMANIZE "一覧は最高だぞ" :WITH-INFO T))
9: (INTERACTIVE-EVAL (ICHIRAN:ROMANIZE "一覧は最高だぞ" :WITH-INFO T) :EVAL NIL)
10: (SB-IMPL::REPL-FUN NIL)
11: ((LAMBDA NIL :IN SB-IMPL::TOPLEVEL-REPL))
12: (SB-IMPL::%WITH-REBOUND-IO-SYNTAX #<FUNCTION (LAMBDA NIL :IN SB-IMPL::TOPLEVEL-REPL) {10026EB0FB}>)
13: (SB-IMPL::TOPLEVEL-REPL NIL)
14: (SB-IMPL::TOPLEVEL-INIT)
15: ((FLET SB-UNIX::BODY :IN SB-IMPL::START-LISP))
16: ((FLET "WITHOUT-INTERRUPTS-BODY-3" :IN SB-IMPL::START-LISP))
17: (SB-IMPL::%START-LISP)
18: ("foreign function: #x14003E9E5")
19: ("foreign function: #x1400069E0")

You can try to run the following commands before any queries. (init-suffixes t)

Isn't it supposed to be (ichiran/dict:init-suffixes t)? I already end up using it before making any queries, and it does bypass the error. However, if the source of an error in ichiran-cli is the same (as far as I understand, it might happen just because every query made through ichiran-cli is the first query for its process), running this extra request first doesn't seem to be an option.

(init-all-caches)

This one yields an error no matter what.

tshatrov commented 1 year ago

By the way, do I understand correctly, that if I install the needed version from the official binaries, I don't have to compile anything else, and the step with MSYS2 can be skipped entirely?

Yeah, the blog post was written back when Windows binaries for SBCL 2.1.0 weren't available, so you had to compile it yourself. Now there's no need for that, as there are many post-2.1.0 binaries available.

Isn't it supposed to be (ichiran/dict:init-suffixes t)?

Yes, the ichiran/dict: prefix can be omitted if you're in the ichiran/all package, which you enter by running (in-package :ichiran/all). I was just copy-pasting code from the ichiran-cli build process. Which brings us to:

However, if the source of an error in ichiran-cli is the same (as far as I understand, it might happen just because every query made through ichiran-cli is the first query for its process)

The initial state of ichiran-cli differs from the state you get by running sbcl and doing (ql:quickload :ichiran), because ichiran-cli.exe already has the suffix-cache and other caches preloaded during the build process. So if suffix-cache is NIL in ichiran-cli, as in your traceback, then there's possibly an issue with your database or database connection settings. You wouldn't have been able to pass all the tests without the suffix cache, though.

tshatrov commented 1 year ago

This one yields an error no matter what.

With the package prefix it's called as (ichiran/conn:init-all-caches). To be fair, each of these caches is loaded automatically on first use, so you don't have to run it yourself. The ichiran-cli build does it for performance reasons, and in multithreaded code it might be a good idea to preload them, but for simple use it's not necessary.
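
As an illustration only, here is a minimal sketch of the load-on-first-use pattern described above. Every name in it (*demo-cache*, load-demo-cache, ensure-demo-cache, demo-lookup) is hypothetical, and the code is not taken from ichiran; it only assumes standard Common Lisp.

```lisp
;; Hypothetical names; a sketch of the load-on-first-use pattern,
;; not ichiran's implementation.
(defvar *demo-cache* nil
  "NIL until the first lookup builds the table.")

(defvar *demo-load-count* 0
  "How many times the (pretend) expensive load has run.")

(defun load-demo-cache ()
  "Stand-in for an expensive load, e.g. a database query."
  (incf *demo-load-count*)
  (let ((table (make-hash-table :test 'equal)))
    (setf (gethash "a" table) 1)
    table))

(defun ensure-demo-cache ()
  "Return the cache, building it on first use."
  (or *demo-cache*
      (setf *demo-cache* (load-demo-cache))))

(defun demo-lookup (key)
  "Look KEY up in the cache, loading the cache first if needed."
  (gethash key (ensure-demo-cache)))
```

Calling (ensure-demo-cache) once at startup corresponds to preloading. In multithreaded code, two threads could both find *demo-cache* still NIL and each run the load, which is one reason preloading before starting any threads can be worthwhile.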

tshatrov commented 1 year ago

init-suffixes is a little different because of its weird multithreaded implementation: when it's first called, it creates a background thread which initializes the cache to an empty hash table and then keeps filling it up (which used to take much longer back then).

Now I see where the bug is. The variable *suffix-cache* is NIL from the start, and even though the background thread initializes it to an empty hash table at the very beginning, there's still a brief window in which something could try to access *suffix-cache* thinking it's already initialized! For some reason I never hit this timing issue, so I never noticed the bug. I'm going to fix it now...
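
The timing window can be sketched as follows. This is a minimal illustration with hypothetical names (*race-cache*, start-loader-racy, start-loader-safe, race-lookup), not ichiran's actual code or the actual fix. It assumes SBCL, since it uses sb-thread and the SBCL-specific :synchronized option of make-hash-table.

```lisp
;; Hypothetical names; assumes SBCL (sb-thread, :synchronized).
(defvar *race-cache* nil
  "NIL until some loader has installed a hash table.")

(defun race-lookup (key)
  "Signals a TYPE-ERROR in SBCL if *race-cache* is still NIL."
  (gethash key *race-cache*))

(defun start-loader-racy ()
  ;; The table is created inside the new thread, so a caller that runs
  ;; before that thread is scheduled can still see NIL in *race-cache*.
  (sb-thread:make-thread
   (lambda ()
     (setf *race-cache* (make-hash-table :test 'equal :synchronized t))
     (setf (gethash "key" *race-cache*) :value))))

(defun start-loader-safe ()
  ;; Install the empty table in the calling thread first, then let the
  ;; background thread fill it in. Early lookups may miss entries, but
  ;; they never see NIL in place of the table.
  (let ((table (make-hash-table :test 'equal :synchronized t)))
    (setf *race-cache* table)
    (sb-thread:make-thread
     (lambda ()
       (setf (gethash "key" table) :value)))))
```

With start-loader-racy, a race-lookup issued immediately afterwards may or may not fail depending on scheduling, which would explain why such a bug shows up on one machine and not another. With start-loader-safe, the lookup can at worst return NIL for an entry that has not been loaded yet.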

tshatrov commented 1 year ago

@Eltaurus-Lt the last commit should fix the "first query error" bug. However, I would still recommend running init-suffixes in advance, because without the fully loaded suffix cache the segmentation is not as good.

Eltaurus-Lt commented 1 year ago

Commands in the SBCL version work from the very first one now. Thank you for the quick fix!

Unfortunately, the issue with ichiran-cli still remains (I've deleted the old ichiran-cli.exe and built a new one after updating dict-grammar.lisp and running the errata and tests again). Calling ichiran-cli -h from the console yields:

Internal error #11 "Object is of the wrong type." at 0000000021ea2271
    SC: 0, Offset: 9    $1=       0x04f3a09f: other pointer
    SC: 3, Offset: 14   $2=       0x00003627: list pointer
fatal error encountered in SBCL pid 1397614204:
internal error too early in init, can't recover

with a backtrace:

   0: fp=00000000005ff428 pc=0000000021ea2271 SB-KERNEL::CSUBTYPEP
   1: fp=00000000005ff4b8 pc=0000000022253fed SB-IMPL::PICK-INPUT-ROUTINE
   2: fp=00000000005ff5b8 pc=0000000022255e75 SB-IMPL::SET-FD-STREAM-ROUTINES
   3: fp=00000000005ff710 pc=000000002200546a SB-SYS::MAKE-FD-STREAM
   4: fp=00000000005ff8f8 pc=00000000220f825c SB-IMPL::STREAM-REINIT
   5: fp=00000000005ff940 pc=000000002218b15d (FLET "WITHOUT-GCING-BODY-0" :IN SB-IMPL::REINIT)
   6: fp=00000000005ff988 pc=000000002218ada4 SB-IMPL::REINIT
   7: fp=00000000005ffa20 pc=000000002244c690 (FLET SB-UNIX::BODY :IN SB-IMPL::START-LISP)
   8: fp=00000000005ffad8 pc=000000002244c4b4 (FLET "WITHOUT-INTERRUPTS-BODY-3" :IN SB-IMPL::START-LISP)
   9: fp=00000000005ffb70 pc=000000002244c2d7 SB-IMPL::%START-LISP
  10: fp=00000000005ffbb0 pc=000000014003e9e5 Foreign function
  11: fp=00000000005ffbc0 pc=00000001400069e0 Foreign function
ldb>

Eltaurus-Lt commented 1 year ago

Restarting the system one more time and rebuilding the executable again made it work. Maybe it was a problem with file access: background threads left over from running the commands in sbcl may have interfered with the build process, or something 🤷‍♂️ Anyway, the issue seems to be resolved now. Thanks!