vseloved / cl-nlp

Common Lisp NLP toolset

Lispworks issues with chars.lisp #5

Closed ELind77 closed 7 years ago

ELind77 commented 9 years ago

I've found a couple of compatibility issues between LispWorks and the chars.lisp file in src/utils/.

The first was in the +WHITE-CHARS+ parameter: LispWorks names the character #\No-Break-Space, so I did:

(defparameter +white-chars+
  '(#\Space #\Tab #\Newline #\Return #\Linefeed
    ;; LispWorks spells U+00A0 as #\no-break-space
    #+(and lispworks unicode) #\no-break-space
    ;; other Unicode-aware implementations spell it #\no-break_space
    #+(or (and sbcl sb-unicode) (and allegro ics) (and clisp i18n)
          (and openmcl openmcl-unicode-strings))
    #\no-break_space)
  "Chars considered WHITESPACE.")
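The two spellings are needed because the standard only defines a few character names (Newline and Space, plus semi-standard ones such as Tab, Return and Linefeed); a name like No-Break-Space is implementation-defined. A quick way to see what a given implementation accepts is to ask it with the standard functions CHAR-NAME and NAME-CHAR. This is a probing sketch, not part of cl-nlp; PROBE-NO-BREAK-SPACE is a hypothetical helper, and its output differs between implementations.

```lisp
;; Hypothetical helper: report how this implementation names U+00A0
;; and which of the two spellings from the snippet above it recognizes.
(defun probe-no-break-space ()
  (let ((nbsp (code-char 160)))
    (format t "CHAR-NAME of U+00A0: ~S~%" (char-name nbsp))
    (dolist (name '("No-Break-Space" "No-Break_Space"))
      ;; NAME-CHAR returns NIL for names the implementation does not know.
      (let ((char (name-char name)))
        (format t "~A: ~:[not recognized as U+00A0~;recognized as U+00A0~]~%"
                name (and char (char= char nbsp)))))))

(probe-no-break-space)
```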

There may be a better way to do this that fits your project's coding standards, but I'll leave that integration to you. The fix is based on an example from the CLSQL project.

Once I put that in, compilation got further into the file, and I found a character-encoding issue. Some of the quotation characters are multi-byte characters that LispWorks can't read properly. Emacs displays them without any problem, but when the file is opened in LispWorks they don't display properly and the compiler can't read them. LispWorks uses UTF-16 internally, so if there are char codes for these characters that are the same across UTF-8 and UTF-16, using those might work. There may also be a more elegant solution, but I don't know enough about how LispWorks treats the characters to work out anything else.
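To make the multi-byte point concrete: in Unicode-aware implementations such as SBCL and LispWorks, CHAR-CODE returns the character's Unicode code point, which is the same whether a file is stored as UTF-8 or UTF-16; what differs is the byte sequence on disk. In UTF-8, ‘ (code point 8216) takes three bytes and « (171) takes two, while ASCII characters take one. If a file is decoded with a single-byte encoding instead, each of those bytes becomes a separate character. The hand-rolled encoder below (UTF-8-OCTETS is a hypothetical helper written only for illustration) shows the byte sequences.

```lisp
(defun utf-8-octets (code)
  "Return the list of UTF-8 octets that encode the Unicode code point CODE."
  ;; A continuation byte carries 6 payload bits with the prefix 10xxxxxx.
  (flet ((continuation (bits) (logior #x80 (logand bits #x3F))))
    (cond ((< code #x80)        ; 1 byte:  0xxxxxxx
           (list code))
          ((< code #x800)       ; 2 bytes: 110xxxxx 10xxxxxx
           (list (logior #xC0 (ash code -6))
                 (continuation code)))
          ((< code #x10000)     ; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
           (list (logior #xE0 (ash code -12))
                 (continuation (ash code -6))
                 (continuation code)))
          (t                    ; 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
           (list (logior #xF0 (ash code -18))
                 (continuation (ash code -12))
                 (continuation (ash code -6))
                 (continuation code))))))

;; Octets for the double quote, the left single quotation mark and the left guillemet.
(mapcar #'utf-8-octets '(34 8216 171))  ; => ((34) (226 128 152) (194 171))
```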

I may just switch over to SBCL to test out this project. Which Lisp implementation are you developing in?

--Eric

ELind77 commented 9 years ago

I re-read the blog post and saw that you are using SBCL. I'll try to set that up to test out the project.

vseloved commented 9 years ago

Hi, thanks for the notice!

I believe this would be a simpler solution:

(defparameter +white-chars+
  (list #\Space #\Tab #\Newline #\Return #\Linefeed
        (code-char 160))  ; #\No-Break_Space
  "Chars considered WHITESPACE.")
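Because (code-char 160) is an ordinary function call, evaluated when the parameter is defined, the list no longer depends on how any implementation spells the character's name. A list of characters also works as a character bag for standard sequence functions; STRING-TRIM, for instance, accepts it directly. A small usage sketch (TRIM-WHITE is a hypothetical helper, not a cl-nlp function):

```lisp
(defparameter +white-chars+
  (list #\Space #\Tab #\Newline #\Return #\Linefeed
        (code-char 160))  ; U+00A0 NO-BREAK SPACE
  "Chars considered WHITESPACE.")

;; Hypothetical helper: strip leading and trailing whitespace,
;; including no-break spaces, using the list as a character bag.
(defun trim-white (string)
  (string-trim +white-chars+ string))

;; A string with a leading no-break space and a trailing tab.
(trim-white (format nil "~Cword~C" (code-char 160) #\Tab))  ; => "word"
```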

What do you think?

ELind77 commented 9 years ago

That looks good to me!

Do you have any thoughts on the multi-byte chars later in that file?

vseloved commented 9 years ago

Yes, thanks for the reminder. I think the chars issue is best handled with conditional compilation, as you proposed:

#-lispworks '(#\" #\‘ #\’ #\« #\» #\“ #\” #\') 
#+lispworks ...

If you could fill in the '...' for LispWorks in a pull request, I'd be grateful, as I don't have access to that platform right now.
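For reference, this is how such a #-/#+ pair selects a branch: the reader checks the feature expression against *FEATURES* and skips the next form if it doesn't apply. The skipped form is still consumed by the reader, only with *READ-SUPPRESS* bound to true, which is what lets the first snippet mention character names that only some implementations recognize. The sketch below simulates both cases with READ-FROM-STRING; *SOURCE* and READ-WITH-LISPWORKS-FEATURE are made-up names for illustration.

```lisp
;; The same source text, read as if on LispWorks and as if elsewhere.
(defparameter *source*
  "(#-lispworks :portable-branch #+lispworks :lispworks-branch)")

(defun read-with-lispworks-feature (string present-p)
  "Read STRING with :LISPWORKS added to or removed from *FEATURES*."
  (let ((*features* (if present-p
                        (adjoin :lispworks *features*)
                        (remove :lispworks *features*))))
    (read-from-string string)))

(read-with-lispworks-feature *source* t)    ; => (:LISPWORKS-BRANCH)
(read-with-lispworks-feature *source* nil)  ; => (:PORTABLE-BRANCH)

;; The skipped form is still read, but with *READ-SUPPRESS* bound to T,
;; so an unrecognized character name there does not signal an error.
(read-with-lispworks-feature "(#-lispworks #\\No-Such-Char :ok)" t)  ; => (:OK)
```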

ELind77 commented 9 years ago

So, (code-char 160) worked fine. But something odd is going on with the conditional compilation for the other characters. I changed the parameter to look like this:

(defparameter +quote-chars+
  #-lispworks '(#\" #\‘ #\’ #\« #\» #\“ #\” #\')
  #+lispworks (mapcar #'code-char '(34 8216 8217 171 187 8220 8221 39))
  "Chars considered legitimate quotation marks.")

When the #-lispworks line is commented out, everything loads fine. But when I try to compile and load the code as above, it hits the same error as before, and I have no idea why. Evaluating the form in the listener works, but compiling and loading the file fails. I usually develop through LW's Emacs integration, but I tried the LW IDE itself to see what would happen, and it couldn't even display the characters in the editor pane. Maybe LW can't handle non-ASCII source files? I'll look around to see if I can find anything about that.
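Since the #+lispworks branch on its own loads fine, one way to sidestep the problem is to drop the conditional and use code points on every implementation, the same approach as (code-char 160) for +white-chars+. The source then contains only ASCII, so it reads the same under any ASCII-compatible file encoding (UTF-8, Latin-1 and so on). This is a sketch that assumes the non-ASCII bytes in the file are what LispWorks fails on, which the thread hasn't confirmed.

```lisp
(defparameter +quote-chars+
  ;; Built from Unicode code points so that this source stays pure ASCII
  ;; and no reader conditional is needed.
  (mapcar #'code-char '(34     ; quotation mark
                        8216   ; left single quotation mark
                        8217   ; right single quotation mark
                        171    ; left-pointing double angle quotation mark
                        187    ; right-pointing double angle quotation mark
                        8220   ; left double quotation mark
                        8221   ; right double quotation mark
                        39))   ; apostrophe
  "Chars considered legitimate quotation marks.")
```

In Unicode-aware implementations CHAR-CODE is the Unicode code point, so this list holds the same characters as the literal version.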

Since I was able to compile chars.lisp that way, I went ahead and fixed the next compiler error, which was in the tokenize function. Lines 255 and 266 of that file also contain UTF-8 characters (the paragraph symbol ¶), and I put in conditional compilation for them, which actually worked. That only adds to the mystery of the chars file.

So the tokenize function looks like this:

(defmethod tokenize ((tokenizer baseline-sentence-tokenizer) string)
  (mv-bind (words word-spans)
      (tokenize (make 'regex-word-tokenizer :regex "[^\\s]+")
                (substitute #-lispworks #\¶
                            #+lispworks (code-char 182)
                            #\Newline string))
    (let ((beg 0)
          sentences spans)
      (loop :for ws :on words :and ss :on word-spans :do
         (let ((word (first ws))
               (span (first ss)))
           (when (or (null (rest ws))
                     (and (member (char word (1- (length word)))
                                  #-lispworks '(#\. #\? #\! #\¶)
                                  ;; LIST, not a quoted list: (code-char 182)
                                  ;; must be evaluated to yield the pilcrow
                                  #+lispworks (list #\. #\? #\! (code-char 182)))
                          (not (member word +abbrevs-with-dot+
                                       :test #'string-equal))
                          (and-it (second ws)
                                  (upper-case-p (char it 0)))))
             (push (sub string beg (rt span)) sentences)
             (push (pair beg (rt span)) spans)
             (setf beg (lt (second ss))))))
      (values (reverse sentences)
              (reverse spans)))))
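One pitfall with the #+lispworks branch of the sentence-end check: if the end characters are written as a quoted list, '(#\. #\? #\! (code-char 182)), the (code-char 182) form is never evaluated. The last element is then the two-element list (CODE-CHAR 182) rather than the pilcrow character, and MEMBER can never match ¶. Building the list with LIST, or inserting the character at read time with #., avoids this. A minimal demonstration; the variable names are made up:

```lisp
;; QUOTE suppresses evaluation: the last element is the list (CODE-CHAR 182).
(defparameter *quoted-ends* '(#\. #\? #\! (code-char 182)))

;; LIST evaluates its arguments: the last element is the character U+00B6.
(defparameter *listed-ends* (list #\. #\? #\! (code-char 182)))

;; #. evaluates at read time, so a quoted list can still hold the character
;; (this requires *READ-EVAL* to be true, which is the default).
(defparameter *read-time-ends* '(#\. #\? #\! #.(code-char 182)))

(member (code-char 182) *quoted-ends*)     ; => NIL
(member (code-char 182) *listed-ends*)     ; => a non-NIL tail
(member (code-char 182) *read-time-ends*)  ; => a non-NIL tail
```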

Then I hit one last error, which completely threw me. The COUNT-NGRAM-FREQS function in ngrams.lisp produced an error that made no sense to me:

../core/ngrams.lisp:250:8: error: The variable #:|table3296| is unbound

The error points to the :+ in the body of the LOOP. I tried a number of things, but I really have no idea what the problem is. The odd part is that if I replace the #` anonymous-function shorthand with a LAMBDA, it works fine in the listener but still fails when compiling the file.

After failing to fix that, I tried to load the files with SBCL 1.2.4, and that didn't work either :< (I have some kind of compatibility issue with 1.0.58 on my desktop). I have a backtrace for that, but I wasn't sure whether to paste it here, since this post is already quite long. Let me know what you'd like me to do with it.

-- Eric

vseloved commented 9 years ago

Hi Eric, thanks for the detailed description! It prompted me to download and install LispWorks Personal Edition to investigate the issues you're seeing, and I can reproduce all of them. Still, I don't understand the cause of the unbound-variable error either. I'll keep experimenting with it.

Please post the SBCL stack trace as well.

ELind77 commented 9 years ago

Here is the backtrace from the failed SBCL compile/load (I tried versions 1.2.3 and 1.2.4; I can't get 1.0.58, which works, to run on my desktop because of some conflict):

The value # is not of type (OR FUNCTION SYMBOL). [Condition of type TYPE-ERROR]

Restarts:
 0: [RETRY] Retry compiling #<CL-SOURCE-FILE "rutils" "core" "readtable">.
 1: [ACCEPT] Continue, treating compiling #<CL-SOURCE-FILE "rutils" "core" "readtable"> as having been successful.
 2: [RETRY] Retry ASDF operation.
 3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
 4: [ABORT] Give up on "rutils"
 5: [RETRY] Retry SLIME REPL evaluation request.
 --more--

Backtrace:
 0: (SET-MACRO-CHARACTER #( # NIL #<NAMED-READTABLE RUTILS.READTABLE:RUTILS-READTABLE {1007841D43}>) [tl,external]
 1: (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO #<NAMED-READTABLE RUTILS.READTABLE:RUTILS-READTABLE {1007841D43}> :STANDARD)
 2: ((LAMBDA NIL :IN "/home/eric/Dropbox/Synapsify/Misc_Lisp_Work/CL_NLP/rutils/core/readtable.lisp"))
 3: (SB-INT:SIMPLE-EVAL-IN-LEXENV (LET ((READTABLE #)) (COND (# #) (T # #)) (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO READTABLE (QUOTE :STANDARD)) (SET-MACRO-CHARACTER #} (GET-MACRO-CHARACTER ..
 4: (SB-INT:SIMPLE-EVAL-IN-LEXENV (PROGN (LET (#) (COND # #) (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO READTABLE #) (SET-MACRO-CHARACTER #} # NIL . #1=#) (SET-DISPATCH-MACRO-CHARACTER ## #\v ..
 5: (EVAL-TLF (PROGN (LET (#) (COND # #) (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO READTABLE #) (SET-MACRO-CHARACTER #} # NIL . #1=#) (SET-DISPATCH-MACRO-CHARACTER ## #\v # . #1#) ...)) 2 #<N..
 6: (SB-C::EVAL-COMPILE-TOPLEVEL ((LET (#) (COND # #) (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO READTABLE #) (SET-MACRO-CHARACTER #} # NIL . #1=#) (SET-DISPATCH-MACRO-CHARACTER ## #\v # . #1#..
 7: ((FLET SB-C::DEFAULT-PROCESSOR :IN SB-C::PROCESS-TOPLEVEL-FORM) (LET ((READTABLE #)) (COND (# #) (T # #)) (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO READTABLE (QUOTE :STANDARD)) (SET-MACRO-C..
 8: (SB-C::PROCESS-TOPLEVEL-FORM (LET ((READTABLE #)) (COND (# #) (T # #)) (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO READTABLE (QUOTE :STANDARD)) (SET-MACRO-CHARACTER #} (GET-MACRO-CHARACTER #..
 9: (SB-C::PROCESS-TOPLEVEL-PROGN ((LET (#) (COND # #) (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO READTABLE #) (SET-MACRO-CHARACTER #} # NIL . #1=#) (SET-DISPATCH-MACRO-CHARACTER ## #\v # . #1..
 10: (SB-C::PROCESS-TOPLEVEL-FORM (EVAL-WHEN (:LOAD-TOPLEVEL :EXECUTE) (LET (#) (COND # #) (EDITOR-HINTS.NAMED-READTABLES:MERGE-READTABLES-INTO READTABLE #) (SET-MACRO-CHARACTER #} # NIL . #1=#) (SET-DISP..
 11: ((FLET SB-C::DEFAULT-PROCESSOR :IN SB-C::PROCESS-TOPLEVEL-FORM) (EDITOR-HINTS.NAMED-READTABLES:DEFREADTABLE RUTILS.READTABLE:RUTILS-READTABLE (:MERGE :STANDARD) (:MACRO-CHAR #} (GET-MACRO-CHARACTER #..
 12: (SB-C::PROCESS-TOPLEVEL-FORM (EDITOR-HINTS.NAMED-READTABLES:DEFREADTABLE RUTILS.READTABLE:RUTILS-READTABLE (:MERGE :STANDARD) (:MACRO-CHAR #} (GET-MACRO-CHARACTER #))) (:DISPATCH-MACRO-CHAR ## #\v ..
 13: (SB-C::PROCESS-TOPLEVEL-PROGN ((DEFUN RUTILS.READTABLE:|#v-reader| (STREAM CHAR RUTILS.READTABLE::ARG) "Literal syntax for vectors. ..)
 14: (SB-C::PROCESS-TOPLEVEL-FORM (EVAL-WHEN (:COMPILE-TOPLEVEL :LOAD-TOPLEVEL :EXECUTE) (DEFUN RUTILS.READTABLE:|#v-reader| (STREAM CHAR RUTILS.READTABLE::ARG) "Literal syntax for vectors. ..)
 15: (SB-C::SUB-SUB-COMPILE-FILE #)
 16: ((FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK :IN SB-C::SUB-COMPILE-FILE))
 17: ((FLET #:WITHOUT-INTERRUPTS-BODY-676 :IN SB-THREAD::CALL-WITH-RECURSIVE-LOCK))
 18: (SB-THREAD::CALL-WITH-RECURSIVE-LOCK #<CLOSURE (FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK :IN SB-C::SUB-COMPILE-FILE) {7FFFF4B4CCCB}> #<SB-THREAD:MUTEX "World Lock" owner: #<SB-THREAD:THREAD "repl-thr..
 19: ((LAMBDA NIL :IN SB-C::SUB-COMPILE-FILE))
 20: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
 21: (SB-C::SUB-COMPILE-FILE #)
 22: (COMPILE-FILE #P"/home/eric/Dropbox/Synapsify/Misc_Lisp_Work/CL_NLP/rutils/core/readtable.lisp" :OUTPUT-FILE #P"/home/eric/.cache/common-lisp/sbcl-1.2.4-linux-x64/home/eric/Dropbox/Synapsify/Misc_Lisp..
 23: (UIOP/UTILITY:CALL-WITH-MUFFLED-CONDITIONS #<CLOSURE (LAMBDA NIL :IN UIOP/LISP-BUILD:COMPILE-FILE) {1002F5FD0B}> NIL)
 24: (UIOP/PATHNAME:CALL-WITH-ENOUGH-PATHNAME #P"/home/eric/Dropbox/Synapsify/Misc_Lisp_Work/CL_NLP/rutils/core/readtable.lisp" NIL #<CLOSURE (LAMBDA (UIOP/LISP-BUILD::INPUT-FILE) :IN UIOP/LISP-BUILD:COMPI..
 25: (UIOP/LISP-BUILD:COMPILE-FILE #P"/home/eric/Dropbox/Synapsify/Misc_Lisp_Work/CL_NLP/rutils/core/readtable.lisp" :OUTPUT-FILE #P"/home/eric/.cache/common-lisp/sbcl-1.2.4-linux-x64/home/eric/Dropbox/Sy..
 26: (ASDF/LISP-ACTION:PERFORM-LISP-COMPILATION #<ASDF/LISP-ACTION:COMPILE-OP > #<ASDF/LISP-ACTION:CL-SOURCE-FILE "rutils" "core" "readtable">)
 27: ((SB-PCL::EMF ASDF/ACTION:PERFORM) # # #<ASDF/LISP-ACTION:COMPILE-OP > #<ASDF/LISP-ACTION:CL-SOURCE-FILE "rutils" "core" "readtable">)
 28: ((:METHOD ASDF/ACTION:PERFORM-WITH-RESTARTS :AROUND (T T)) #<ASDF/LISP-ACTION:COMPILE-OP > #<ASDF/LISP-ACTION:CL-SOURCE-FILE "rutils" "core" "readtable">) [fast-method]
 29: ((:METHOD ASDF/PLAN:PERFORM-PLAN (LIST)) ((#<ASDF/LISP-ACTION:COMPILE-OP > . #<ASDF/SYSTEM:SYSTEM "named-readtables">) (#1=#<ASDF/LISP-ACTION:COMPILE-OP > . #2=#<ASDF/LISP-ACTION:CL-SOURCE-FILE #3="ru..
 30: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
 31: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) ((#<ASDF/LISP-ACTION:COMPILE-OP > . #<ASDF/SYSTEM:SYSTEM "named-readtables">) (#1=#<ASDF/LISP-ACTION:COMPILE-OP > . #2=#<ASDF/LISP-ACTION:CL-SOURCE-FILE #..
 32: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
 33: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) #<ASDF/PLAN:SEQUENTIAL-PLAN {1002E85653}> :VERBOSE NIL) [fast-method]
 34: ((:METHOD ASDF/OPERATE:OPERATE (ASDF/OPERATION:OPERATION ASDF/COMPONENT:COMPONENT)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "rutils"> :VERBOSE NIL) [fast-method]
 35: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) # # #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "rutils"> :VERBOSE NIL)
 36: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 37: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "rutils"> :VERBOSE NIL) [fast-method]
 38: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) # # ASDF/LISP-ACTION:LOAD-OP "rutils" :VERBOSE NIL)
 39: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 40: (ASDF/CACHE:CALL-WITH-ASDF-CACHE #<CLOSURE (LAMBDA NIL :IN ASDF/OPERATE:OPERATE) {1002E7861B}> :OVERRIDE NIL :KEY NIL)
 41: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "rutils" :VERBOSE NIL) [fast-method]
 42: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "rutils" :VERBOSE NIL) [fast-method]
 43: (QUICKLISP-CLIENT::CALL-WITH-MACROEXPAND-PROGRESS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT::APPLY-LOAD-STRATEGY) {1002E6206B}>)
 44: (QUICKLISP-CLIENT::AUTOLOAD-SYSTEM-AND-DEPENDENCIES "rutils" :PROMPT NIL)
 45: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION (T T)) # #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1002E4821B}>) [fast-method]
 46: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION :AROUND (QL-IMPL:SBCL T)) #<QL-IMPL:SBCL {10076CB093}> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {1002E4821B}>) [fast-me..
 47: ((:METHOD QUICKLISP-CLIENT:QUICKLOAD (T)) # :PROMPT NIL :VERBOSE NIL) [fast-method]
 48: (QL-DIST::CALL-WITH-CONSISTENT-DISTS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT:QUICKLOAD) {1002E3536B}>)
 49: (SB-INT:SIMPLE-EVAL-IN-LEXENV (QUICKLISP-CLIENT:QUICKLOAD :RUTILS) #)
 50: (EVAL (QUICKLISP-CLIENT:QUICKLOAD :RUTILS))
 51: (SWANK::EVAL-REGION "(ql:quickload :rutils) ..)
 52: ((LAMBDA NIL :IN SWANK-REPL::REPL-EVAL))
 53: (SWANK-REPL::TRACK-PACKAGE #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1002E3522B}>)
 54: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation request." #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1002E3516B}>)
 55: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {1002E3514B}>)
 56: (SWANK-REPL::REPL-EVAL "(ql:quickload :rutils) ..)
 57: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK-REPL:LISTENER-EVAL "(ql:quickload :rutils) ..)
 58: (EVAL (SWANK-REPL:LISTENER-EVAL "(ql:quickload :rutils) ..)
 59: (SWANK:EVAL-FOR-EMACS (SWANK-REPL:LISTENER-EVAL "(ql:quickload :rutils) ..)
 --more--

vseloved commented 9 years ago

The problem seems to be in the readtable definition made via named-readtables.

Can you evaluate (sb-ext:restrict-compiler-policy 'debug 3) before loading cl-nlp and post an updated stack trace? (It should turn on more detailed stack trace output, since currently the function names are represented simply by # signs.)