Open vajrabisj opened 1 year ago
Hello vajrabisj, thank you for your interest in my project.
For efficient computing, OpenBLAS must be installed to run MatmulNode (without it, the device CPUTENSOR isn't registered as a valid device).
Before loading cl-waffe2, try running the following code please:
# Ensure that OpenBLAS has been installed on your device
$ sudo apt install libopenblas-dev
;; Declare the path where you've installed OpenBLAS so that CFFI can load the shared library correctly
;; (the .dylib extension is for macOS; on Linux the library typically ends in .so)
(defparameter cl-user::*cl-waffe-config*
  `((:libblas "libblas.dylib")))
For details, please visit the documentation: https://hikettei.github.io/cl-waffe2/install/#openblas-backend
btw, your waffe2 system looks great, pls keep it up!
I'm very happy to hear that! Currently, I'm working on reducing the compile time on the vm-refactoring branch. It's still unstable, but in my environment it takes less than 0.5s to build a CNN network.
it's worth developing a framework in CL for the evolving AI/ML world.
just a suggestion: since your code is still in development, could you add more examples, from simple to higher-level? for instance, examples of how to use your framework to implement different kinds of NNs, like the logic gates (xor, etc.).
and also, for the inputs: normally we load input from files, could you show some examples of that too?
thanks for the great work!
is it possible that you could put more examples
Yes, the lack of examples/documentation is exactly the problem! But I started this project only two months ago and am still developing fundamental features: automatic differentiation, the VM, multi-threading support, etc.
In fact, here's an example package of training MNIST on MLP/CNN, and the data loaders are here, but I feel the APIs still have room for improvement. (deftrainer will be much more straightforward in a future release.)
Anyway, I'm keen to enhance the documentation :) Thanks.
and also for the inputs,
Speaking of which, cl-waffe2's AbstractTensor encapsulates the standard array of Common Lisp, so you can just use the change-facet function to create a cl-waffe2 tensor from make-array:
;; depends on:
(use-package :cl-waffe2)
(use-package :cl-waffe2/vm.generic-tensor)
;; No copying
(change-facet
(make-array `(10 10) :initial-element 1.0)
:direction 'AbstractTensor)
{CPUTENSOR[float] :shape (10 10)
((1.0 1.0 1.0 ~ 1.0 1.0 1.0)
(1.0 1.0 1.0 ~ 1.0 1.0 1.0)
...
(1.0 1.0 1.0 ~ 1.0 1.0 1.0)
(1.0 1.0 1.0 ~ 1.0 1.0 1.0))
:facet :exist
:requires-grad NIL
:backward NIL}
That is, cl-waffe2 can also be used with other great CL libraries such as numcl, numericals, etc.
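Since change-facet with :direction 'AbstractTensor shares storage with the original array (note the ";; No copying" comment above), one way to see this encapsulation in action is to mutate the underlying Common Lisp array and observe the tensor. This is only a sketch, assuming the change-facet API shown above:

```lisp
;; Sketch only: assumes the change-facet API demonstrated above.
(use-package :cl-waffe2)
(use-package :cl-waffe2/vm.generic-tensor)

(let* ((arr (make-array '(3 3) :element-type 'single-float
                               :initial-element 0.0))
       (tns (change-facet arr :direction 'AbstractTensor)))
  ;; No copy was made, so writes to the plain Common Lisp array
  ;; should be visible through the AbstractTensor as well.
  (setf (aref arr 0 0) 42.0)
  tns)
```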
Really appreciate your prompt reply!
if I just have input like '((0 0) (0 1) (1 0) (1 1)), what is the simplest way to put it into waffe2? :)
First of all, for numerical computing I think a simple-array is a much better choice than a list, since a list isn't an appropriate data structure for this and produces unnecessary copying.
If you want to do the same thing, simply call:
CL-WAFFE2-REPL> (change-facet
#2A((0 0)
(1 0)
(0 1)
(1 1))
:direction 'AbstractTensor)
{CPUTENSOR[int32] :shape (4 2)
((0 0)
(1 0)
...
(0 1)
(1 1))
:facet :exist
:requires-grad NIL
:backward NIL}
(I forgot to say: sparse tensor support and data casting aren't ready for use yet, so pass #2A((0.0 0.0) (1.0 0.0) (0.0 1.0) (1.0 1.0)) to obtain a float32 tensor.)
Or make a simple-array from a list and later pass it to change-facet:
(change-facet
 (make-array `(4 2)
             :initial-contents '((0.0 0.0) (0.0 1.0) (1.0 0.0) (1.0 1.0)))
 :direction 'AbstractTensor)
Digging a little deeper: the set of supported array type transformations is described here, and can be extended by overloading the convert-tensor-facet method. Adding a list <-> AbstractTensor combination as standard would be useful. Thanks!
This is fantastic! Thx a lot!
i have successfully set up a simple model and run the test!
a few things still need your kind input:
I know there is a predict method; how do I use it to show the final prediction result after ...
If you're working with deftrainer, there's a :predict slot to describe how models predict results: here. The predict method corresponds to it.
Another option is that if you can access the trained model, this example would be more straightforward :).
The procedure is a little complicated compared to other libraries. This is because cl-waffe2 needs to compile the network twice, generating specialised code for training and for inference respectively. I'm trying to find ways to make this easier to understand for everyone.
i can run these successfully on linux,
Thank you for that! But as for Windows machines, I don't have one so I can't test on it :<
No OS-dependent code is used in cl-waffe2; OpenBLAS is called via CFFI, and as long as it runs on SBCL, it should work. This might be related to something else.
We've confirmed that cl-waffe2 works in the following environments thanks to @elderica .
macOS Monterey
ubuntu-latest
Arch Linux
FreeBSD
I'm thinking I might need a tutorial as soon as possible. Which ones would you like to see?
I'm thinking I might need a tutorial as soon as possible. Which ones would you like to see?
that will be great. i think your framework looks great. especially, when you go through the current example code such as mlp.lisp, it will be quite clear for someone who has certain knowledge of ML/NN (defsequence, deftrainer, train, etc.). but the majority of people who may be interested in your code would like to go step by step to become familiar with every aspect of it, as well as to print intermediate and/or final results. My humble suggestion, taking reference from other ML/NN frameworks, is:
i am still analyzing your framework when i have time, because i am a common lisp lover and am also planning to build on a good framework for my other models.
BTW, for your two previously mentioned examples, i.e. mlp and cnn, i still cannot work out how to proceed with the predict function and print out the result...
but anyway, your framework is so far the most systematic and interesting one in the common lisp community, pls keep it up! well done!
Thank you for your valuable suggestions/feedback! In fact, cl-waffe2 is quite different from other existing libraries, so step-by-step examples will help others learn how to use it quickly. I'll put it on my TODO list :)
i'm still quite confused about how i can print and/or customize the printed results
I'm sorry, but I still don't get what you mean by 'print'... does this work?
(Use (model your-trainer) to read the trained model from your-trainer.)
but anyway, your framework is so far the most systematic and interesting one in the common lisp community, pls keep it up! well done!
It's a great honour! The ultimate goal is to create a framework comparable to PyTorch and other large-scale Python libraries over the next few years, all in ANSI Common Lisp. The next issue to solve is performance, so I'm spending all my free time on it.
Anyway, thank you for your feedback! Feel free to contact me/make an issue if there are any problems/suggestions.
(Use (model your-trainer) to read the trained model from your-trainer.)
Oh, I mean: after training, how can I test whether the model correctly predicts the result? Normally I would feed it new data and then print the prediction process and result.
I see. Validation is included in the train-and-valid-mlp function, and of course the testing data is separated from the training data. So 0.9522 is exactly the accuracy of the trained model.
Would like to know what's the difference between the two examples in your tutorial :)
That's the difference between AbstractNode and Composite.
AbstractNode is the smallest/most fundamental unit of operations in cl-waffe2, and is defined by the defnode macro.
(defnode (AddNode-Revisit (myself)
:where (A[~] B[~] -> A[~])
:documentation "A <- A + B"
:backward ((self dout x y)
(declare (ignore x y))
(values dout dout))))
and is a CLOS class which mainly holds this information:
the :where declaration, (A[~] B[~] -> A[~]), describing the shapes of tensors before and after the operation
the :backward definition of the AbstractNode (optional)
However, defnode itself does not define the implementation of forward propagation; define-impl does that, for each device.
(define-impl (AddNode-Revisit :device CPUTensor)
:forward ((self x y)
`(,@(expand-axpy-form x y)
,x)))
(forward (AddNode-Revisit) (randn `(3 3)) (randn `(3 3))) ;; to invoke the node
Here, the function expand-axpy-form calls cl-waffe2/vm.generic-tensor:call-with-view, which generates a (loop for ...) iteration over nd-arrays that follows the optimal route, is parallelized, and (in a future release) will be loop-fused. Since generating an optimal for(int i=0;i<size;i++){...} route for the given ranks of tensors is one of the main concerns of a JIT compiler for a deep learning framework, define-impl is used like a defmacro, and the expanded Lisp code is embedded into the cl-waffe2 IR. Moreover, it should be noted that the cl-waffe2 VM was created to handle a large number of these AbstractNodes.
Composite is used to describe a set of AbstractNodes, and is defined by the defmodel macro.
(defmodel (Softmax-Model (self)
:where (X[~] -> [~])
:on-call-> ((self x)
(declare (ignore self))
(let* ((x1 (!sub x (!mean x :axis 1 :keepdims t)))
(z (!sum (!exp x1) :axis 1 :keepdims t)))
(!div (!exp x1) z)))))
It was originally intended to be used as just a subroutine:
(call (Softmax-Model) (randn `(10 10)))
;; Still keeps lazy-evaluation
(proceed *) ;; to evaluate it
In addition, however, the macro define-composite-function can define functions from a Composite for immediate execution (!sin-static is exactly this).
(define-composite-function (Softmax-Model) !softmax-static)
With the :where declaration, cl-waffe2 can trace the network inside Softmax-Model, and the compiled function is defined as !softmax-static.
(!softmax-static (randn `(10 10))) ;; No need to call build/proceed
;; will directly return:
{JITCPUTENSOR[float] :shape (10 10) :named ChainTMP3533
((0.4604313 0.0073007448 0.10543401 ~ 0.087809734 0.031668983 0.028546946)
(0.10716986 0.022830745 0.07476129 ~ 0.24503188 0.2015392 0.07642471)
...
(0.015032278 0.028409397 0.12348003 ~ 0.05110904 0.18238431 0.08728184)
(0.28009242 0.0570261 0.2081007 ~ 0.01599786 0.064734206 0.083274655))
:facet :input
:requires-grad NIL
:backward NIL}
Note that !softmax-static does not produce any backward/computation nodes; it just defines a function that can be used in a Numpy-like way.
BTW, this issue seems really useful for those who are interested in my framework, so I pinned it :).
Note that !softmax-static does not produce any backward/computation nodes; it just defines a function that can be used in a Numpy-like way
Really thanks for the detailed explanation, very helpful.
so in short, the difference between defnode and defmodel is the implementation of backward (regardless of the base implementation)? I mean after define-impl and define-composite-function. Thx.
implementation of backward
Yes, exactly. defmodel isn't intended to define backwards. If you need to define a backward and don't feel comfortable using defmacro syntax to do this, define-static-node is available (it simply wraps the define-impl macro and is used like a defun).
Yes, exactly. defmodel isn't intended to define backwards. If you need to define a backward and don't feel comfortable using defmacro syntax to do this, define-static-node is available (it simply wraps the define-impl macro and is used like a defun).
Just curious: why not put all of these (forward, backward, etc.) into defnode together? :)
I want to keep the definitions of defnode and defmodel as simple and generic as possible. If there's a lack of functionality, one can implement anything by wrapping them in another macro later on. Plus, defnode is the lowest level in that its code is directly embedded into the cl-waffe2 IR; making everything a defnode may make debugging difficult.
If one wants to implement a new operator (e.g. !add, !sin, !mul), it should be implemented as an AbstractNode calling external libraries. Plus, common specifications are declared for each AbstractNode: whoever implements AddNode, for example, may only destructively add the value of the second argument into the first argument. This will be useful when extending to CUDA and Metal in a future release.
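The destructive contract described above (A <- A + B, mutating only the first argument) can be sketched in plain Common Lisp. This is only an illustration of the specification, not cl-waffe2's actual AddNode implementation:

```lisp
;; Illustration only: the A <- A + B contract, destructively
;; modifying the first argument and leaving the second untouched.
(defun destructive-add! (a b)
  (dotimes (i (array-total-size a) a)
    (incf (row-major-aref a i) (row-major-aref b i))))

(let ((a (make-array 3 :initial-contents '(1 2 3)))
      (b (make-array 3 :initial-contents '(10 20 30))))
  (destructive-add! a b)) ;; => #(11 22 33)
```

Fixing this contract per node means every backend (CPU, and later CUDA or Metal) can be substituted without changing the semantics the VM relies on.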
when running the examples, there are always errors like the following:
Couldn't find any implementation of MATMULNODE for (LISPTENSOR). [Condition of type CL-WAFFE2/VM.NODES::NODE-NOT-FOUND]
how do I address this?
btw, your waffe2 system looks great, pls keep it up!