mmaul / clml

Common Lisp Machine Learning Library

make-decision-tree breaks with single-float data on SBCL #31

Closed (neil-lindquist closed this 6 years ago)

neil-lindquist commented 6 years ago

CLML signals SB-SYS:MEMORY-FAULT-ERROR on SBCL when building a decision tree from data loaded from a CSV file as float or single-float. I've reproduced this on both SBCL 1.4.2 under Windows 10 and SBCL 1.4.0-1.el7 under Red Hat Linux 7.4.

The issue is at https://github.com/mmaul/clml/blob/master/decision-tree/src/decision-tree.lisp#L40: a single-float (or any other float subtype) passes floatp, but the value is then declared to be of type double-float with the safety optimization quality set to 0. As a result, the single-float is used as a double-float without any type check, which causes the memory fault.
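To make the mismatch concrete, here is a minimal sketch in plain Common Lisp. It is not CLML's code: `float-but-not-double-p` and `make-threshold-predicate` are hypothetical names used only for illustration. The first function shows why a `floatp` check does not justify a `double-float` declaration, and the second shows one way to make such a declaration hold by widening the values first.

```lisp
;; Illustrative sketch only; these helpers are hypothetical, not CLML's code.

;; On SBCL, a single-float satisfies FLOATP but is not a DOUBLE-FLOAT,
;; so a FLOATP check alone cannot justify a DOUBLE-FLOAT declaration.
(defun float-but-not-double-p (x)
  (and (floatp x) (not (typep x 'double-float))))

(defun make-threshold-predicate (threshold)
  "Return a one-argument predicate that is true when its argument is >= THRESHOLD.
Both values are widened to DOUBLE-FLOAT first, so the declaration below holds."
  (let ((limit (coerce threshold 'double-float)))
    (declare (type double-float limit))
    (lambda (x)
      (>= (coerce x 'double-float) limit))))
```

For example, `(float-but-not-double-p 5.1f0)` is true on SBCL, while `(funcall (make-threshold-predicate 5.1f0) 5.9f0)` returns true without relying on an unchecked declaration. The coercion costs a little per call; an alternative is to test with `(typep x 'double-float)` instead of `floatp` before entering the optimized path.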

example.lisp:

(defparameter *data* (clml.hjs.read-data:read-data-from-file "iris.csv"
                           :type :csv
                           :csv-type-spec '(single-float string)))

(format t "~S~%~%" *data*)

(format t "~S" (clml.decision-tree.decision-tree:make-decision-tree *data* "class"))

iris.csv:

sepallength,class
5.1,IrisSetosa
5.9,IrisVirginica

repl:

$ sbcl
This is SBCL 1.4.0-1.el7, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
* (ql:quickload :clml)
To load "clml":
  Load 1 ASDF system:
    clml
; Loading "clml"
.....
(:CLML)
* (load "example.lisp")
#<CLML.HJS.READ-DATA:UNSPECIALIZED-DATASET >
DIMENSIONS: sepallength | class
TYPES:      UNKNOWN | UNKNOWN
NUMBER OF DIMENSIONS: 2
DATA POINTS: 2 POINTS

CORRUPTION WARNING in SBCL pid 2904(tid 0x7ffff7fc0740):
Memory fault at (nil) (pc=0x22da27fe, sp=0x7ffff09668f8)
The integrity of this image is possibly compromised.
Continuing with fingers crossed.
While evaluating the form starting at line 4, column 0
  of #P"/net/home/nslindquist/temp/example.lisp":

debugger invoked on a SB-SYS:MEMORY-FAULT-ERROR in thread
#<THREAD "main thread" RUNNING {100194E703}>:
  Unhandled memory fault at #x0.

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RETRY   ] Retry EVAL of current toplevel form.
  1: [CONTINUE] Ignore error and continue loading file "/net/home/nslindquist/temp/example.lisp".
  2: [ABORT   ] Abort loading file "/net/home/nslindquist/temp/example.lisp".
  3:            Exit debugger, returning to top level.

(SB-SYS:MEMORY-FAULT-ERROR)
0] backtrace

Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {100194E703}>
0: (SB-SYS:MEMORY-FAULT-ERROR)
1: ("foreign function: call_into_lisp")
2: ("foreign function: post_signal_tramp")
3: ((LAMBDA (CLML.DECISION-TREE.DECISION-TREE::X) :IN CLML.DECISION-TREE.DECISION-TREE:MAKE-SPLIT-PREDICATE) #<unavailable argument>) [external]
4: (CLML.DECISION-TREE.DECISION-TREE::AUX-SPLIT #(#(5.1 "IrisSetosa") #(5.9 "IrisVirginica")) #<HASH-TABLE :TEST EQUAL :COUNT 2 {1005392F23}> (0 1) "sepallength" 5.1)
5: (CLML.DECISION-TREE.DECISION-TREE:DELTA-GINI #(#(5.1 "IrisSetosa") #(5.9 "IrisVirginica")) #<HASH-TABLE :TEST EQUAL :COUNT 2 {1005392F23}> (0 1) "sepallength" 5.1 1)
6: (CLML.DECISION-TREE.DECISION-TREE::SELECT-BEST-SPLITTING-ATTRIBUTE #(#(5.1 "IrisSetosa") #(5.9 "IrisVirginica")) #<HASH-TABLE :TEST EQUAL :COUNT 2 {1005392F23}> (0 1) (("sepallength" . 5.1)) 1 :TEST #<FUNCTION CLML.DECISION-TREE.DECISION-TREE:DELTA-GINI> :EPSILON 0)
7: (CLML.DECISION-TREE.DECISION-TREE::MAKE-ROOT-NODE #(#(5.1 "IrisSetosa") #(5.9 "IrisVirginica")) #<HASH-TABLE :TEST EQUAL :COUNT 2 {1005392F23}> 1 :TEST #<FUNCTION CLML.DECISION-TREE.DECISION-TREE:DELTA-GINI> :EPSILON 0)
8: (CLML.DECISION-TREE.DECISION-TREE:MAKE-DECISION-TREE #<CLML.HJS.READ-DATA:UNSPECIALIZED-DATASET >
DIMENSIONS: sepallength| class

TYPES:      UNKNOWN| UNKNOWN

NUMBER OF DIMENSIONS: 2

DATA POINTS: 2 POINTS
 "class" :TEST NIL :EPSILON 0)
9: (SB-INT:SIMPLE-EVAL-IN-LEXENV (CLML.DECISION-TREE.DECISION-TREE:MAKE-DECISION-TREE *DATA* "class") #<NULL-LEXENV>)
10: (SB-INT:SIMPLE-EVAL-IN-LEXENV (FORMAT T "~S" (CLML.DECISION-TREE.DECISION-TREE:MAKE-DECISION-TREE *DATA* "class")) #<NULL-LEXENV>)
11: (EVAL-TLF (FORMAT T "~S" (CLML.DECISION-TREE.DECISION-TREE:MAKE-DECISION-TREE *DATA* "class")) 2 NIL)
12: ((LABELS SB-FASL::EVAL-FORM :IN SB-INT:LOAD-AS-SOURCE) (FORMAT T "~S" (CLML.DECISION-TREE.DECISION-TREE:MAKE-DECISION-TREE *DATA* "class")) 2)
13: ((LAMBDA (SB-KERNEL:FORM &KEY :CURRENT-INDEX &ALLOW-OTHER-KEYS) :IN SB-INT:LOAD-AS-SOURCE) (FORMAT T "~S" (CLML.DECISION-TREE.DECISION-TREE:MAKE-DECISION-TREE *DATA* "class")) :CURRENT-INDEX 2)
14: (SB-C::%DO-FORMS-FROM-INFO #<CLOSURE (LAMBDA (SB-KERNEL:FORM &KEY :CURRENT-INDEX &ALLOW-OTHER-KEYS) :IN SB-INT:LOAD-AS-SOURCE) {1004DCCB0B}> #<SB-C::SOURCE-INFO {1004DCCAC3}> SB-C::INPUT-ERROR-IN-LOAD)
15: (SB-INT:LOAD-AS-SOURCE #<SB-INT:FORM-TRACKING-STREAM for "file /net/home/nslindquist/temp/example.lisp" {1004DCAD23}> :VERBOSE NIL :PRINT NIL :CONTEXT "loading")
16: ((FLET SB-FASL::THUNK :IN LOAD))
17: (SB-FASL::CALL-WITH-LOAD-BINDINGS #<CLOSURE (FLET SB-FASL::THUNK :IN LOAD) {7FFFF09676AB}> #<SB-INT:FORM-TRACKING-STREAM for "file /net/home/nslindquist/temp/example.lisp" {1004DCAD23}>)
18: ((FLET SB-FASL::LOAD-STREAM :IN LOAD) #<SB-INT:FORM-TRACKING-STREAM for "file /net/home/nslindquist/temp/example.lisp" {1004DCAD23}> NIL)
19: (LOAD "example.lisp" :VERBOSE NIL :PRINT NIL :IF-DOES-NOT-EXIST T :EXTERNAL-FORMAT :DEFAULT)
20: (SB-INT:SIMPLE-EVAL-IN-LEXENV (LOAD "example.lisp") #<NULL-LEXENV>)
21: (EVAL (LOAD "example.lisp"))
22: (INTERACTIVE-EVAL (LOAD "example.lisp") :EVAL NIL)
23: (SB-IMPL::REPL-FUN NIL)
24: ((LAMBDA NIL :IN SB-IMPL::TOPLEVEL-REPL))
25: (SB-IMPL::%WITH-REBOUND-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-IMPL::TOPLEVEL-REPL) {1002B6C07B}>)
26: (SB-IMPL::TOPLEVEL-REPL NIL)
27: (SB-IMPL::TOPLEVEL-INIT)
28: ((FLET "WITHOUT-INTERRUPTS-BODY-35" :IN SAVE-LISP-AND-DIE))
29: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))

0] 
mmaul commented 6 years ago

TY for the pull request...merged.

However, CLML is strongly biased toward double-floats, so you may run into more issues like this depending on which parts of the library you use.
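Given that bias, one defensive option is to make sure every float is a double-float before it reaches the library. The sketch below is plain Common Lisp and assumes the data is a vector of row vectors, like the `#(#(5.1 "IrisSetosa") ...)` value shown in the backtrace; `widen-floats` is a hypothetical helper, not part of CLML.

```lisp
;; Hypothetical helper, not part of CLML.
(defun widen-floats (rows)
  "Return a fresh vector of fresh rows with every float coerced to DOUBLE-FLOAT.
Non-float cells (strings, integers, symbols) are copied unchanged."
  (map 'vector
       (lambda (row)
         (map 'vector
              (lambda (cell)
                (if (floatp cell)
                    (coerce cell 'double-float)
                    cell))
              row))
       rows))

;; Usage with rows shaped like those in the backtrace:
;; (widen-floats (vector (vector 5.1f0 "IrisSetosa")
;;                       (vector 5.9f0 "IrisVirginica")))
```

Note that coercion keeps the single-float's value, so `5.1f0` becomes roughly `5.099999904632568d0` rather than `5.1d0`. Parsing the column as double-float in the first place avoids that loss; passing `double-float` instead of `single-float` in `:csv-type-spec` in the example above would presumably do this, though that is an assumption not tested here.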