goplus / gop

The Go+ programming language is designed for engineering, STEM education, and data science. Our vision is to enable everyone to become a builder of the digital world.
https://goplus.org
Apache License 2.0

numgo+ v.s. numpy #307

Open wangkuiyi opened 4 years ago

wangkuiyi commented 4 years ago

From my experience with PaddlePaddle and ElasticDL, I believe it's necessary to have a new programming language that can replace Python, and Go+ is on the right track.

In the hope of helping to establish a community, I am trying to make the following program work, so that we could have something like numgo+ that mimics Python+numpy.

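import (
    "log"

    "gonum.org/v1/gonum/mat"
)

// NewMat copies a non-empty 2D slice into a row-major *mat.Dense.
func NewMat(x [][]float64) *mat.Dense {
    if len(x) == 0 || len(x[0]) == 0 {
        log.Fatalf("NewMat expects a non-empty 2D float64 slice. However, len(x)=%d", len(x))
    }
    // mat.NewDense takes the elements as one flat, row-major slice.
    data := make([]float64, 0, len(x)*len(x[0]))
    for _, row := range x {
        data = append(data, row...)
    }
    return mat.NewDense(len(x), len(x[0]), data)
}

// EmptyMat returns a zero-filled h-by-w matrix.
func EmptyMat(h, w int) *mat.Dense {
    return mat.NewDense(h, w, nil)
}

m := [[1.0, 2.0],
      [3.0, 4.0],
      [5.0, 6.0]]  // [][]float64

a := NewMat(m)
b := NewMat(m)
h, _ := a.Dims()
c := EmptyMat(h, h)
c.Mul(a, b.T())  // (3x2) * (2x3) = 3x3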

Currently, qrun . panics with toExternalType: todo. This seems to be due to the incompleteness of Go+.

https://github.com/qiniu/goplus/blob/f820876458b4ff3fb6aee46df17e0e2a1890e337/cl/type_decl.go#L228-L230

I will try to contribute to make this work. If anyone else can get there faster than me, it would be highly appreciated.

xushiwei commented 4 years ago

Cool.

xushiwei commented 4 years ago

I suggest splitting your work into small enhancements so that you can make pull requests frequently. This will make our cooperation smoother.

wangkuiyi commented 4 years ago

> I suggest splitting your work into small enhancements so that you can make pull requests frequently.

Sure, I will.

I am still studying your source code and trying to understand how typing is handled. Once I am ready, I will file a design PR for your review before coding. With the design confirmed, I will file a sequence of small PRs to change the source code.

wangkuiyi commented 4 years ago

Proposal: numgo+ and GoTorch Based on Go+

Go+ simplifies Go syntax in a way that is good for data science. To help the idea flourish and grow into a community, I propose founding two new projects, numgo+ and GoTorch, just as numpy and PyTorch are built upon Python.

| Python-based stack | Go+-based stack |
|--------------------|-----------------|
| PyTorch            | GoTorch         |
| numpy              | numgo+          |
| Python             | Go+             |

According to my experience as a former leader of Baidu PaddlePaddle and a senior staff data scientist at LinkedIn, I personally would prefer the Go+-based tech stack for reasons:

numgo+

This document focuses on numgo+. Just as numpy provides the basic data type for PyTorch, numgo+ could be the basis of the proposed GoTorch.

At the heart of numpy is a data type, ndarray, that encapsulates a tensor. I propose numgoplus.ndarray as its counterpart, with a compatible API to ease migration from numpy to numgo+.

Array Creation

The following example comes from the official numpy tutorial, followed by the proposed numgo+ counterpart.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
print(a)
# array([[ 0,  1,  2,  3,  4],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14]])
print(a.shape)       # (3, 5)
print(a.ndim)        # 2
print(a.dtype.name)  # 'int64'
print(a.itemsize)    # 8
print(a.size)        # 15
print(type(a))       # <class 'numpy.ndarray'>

b = np.array([6, 7, 8])
print(b)
# array([6, 7, 8])
print(type(b))       # <class 'numpy.ndarray'>
```

```go
import ng "github.com/goplus/numgoplus"

a := ng.arange(15).reshape(3, 5)
fmt.Println(a)
// array([[ 0,  1,  2,  3,  4],
//        [ 5,  6,  7,  8,  9],
//        [10, 11, 12, 13, 14]])
fmt.Println(a.Shape())          // (3, 5)
fmt.Println(a.Ndim())           // 2
fmt.Println(a.Dtype().Name())   // 'int64'
fmt.Println(a.Itemsize())       // 8
fmt.Println(a.Size())           // 15
fmt.Println(reflect.TypeOf(a))  // numgoplus.ndarray

b := ng.array([6, 7, 8])
fmt.Println(b)
// array([6, 7, 8])
fmt.Println(reflect.TypeOf(b))  // numgoplus.ndarray
```
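
To make the proposal a little more concrete, here is a minimal plain-Go sketch of how arange and reshape could sit on top of a flat, row-major buffer. The Ndarray type and all of its methods are hypothetical and exist only for illustration; nothing here is an existing numgo+ API, and the element type is fixed to int64 to keep the sketch short.

```go
package main

import "fmt"

// Ndarray is a hypothetical stand-in for the proposed numgo+ array type:
// a flat, row-major buffer plus a shape.
type Ndarray struct {
	data  []int64
	shape []int
}

// Arange returns a 1-D array holding 0, 1, ..., n-1.
func Arange(n int) *Ndarray {
	data := make([]int64, n)
	for i := range data {
		data[i] = int64(i)
	}
	return &Ndarray{data: data, shape: []int{n}}
}

// Reshape returns an array with a new shape that shares the same buffer.
// It fails when a dimension is negative or the element counts differ.
func (a *Ndarray) Reshape(dims ...int) (*Ndarray, error) {
	size := 1
	for _, d := range dims {
		if d < 0 {
			return nil, fmt.Errorf("negative dimension in shape %v", dims)
		}
		size *= d
	}
	if size != len(a.data) {
		return nil, fmt.Errorf("cannot reshape array of size %d into shape %v", len(a.data), dims)
	}
	return &Ndarray{data: a.data, shape: append([]int(nil), dims...)}, nil
}

func (a *Ndarray) Shape() []int { return append([]int(nil), a.shape...) }
func (a *Ndarray) Ndim() int    { return len(a.shape) }
func (a *Ndarray) Size() int    { return len(a.data) }

// At returns the element at a full multi-dimensional index (row-major).
func (a *Ndarray) At(idx ...int) int64 {
	off := 0
	for i, k := range idx {
		off = off*a.shape[i] + k
	}
	return a.data[off]
}

func main() {
	a, err := Arange(15).Reshape(3, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(a.Shape(), a.Ndim(), a.Size()) // [3 5] 2 15
	fmt.Println(a.At(1, 2))                    // 7
}
```

Returning an error from Reshape is the usual Go choice, but it breaks the numpy-style call chain; a Go+ design might prefer a different trade-off.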

Array Literals

In Go, array and slice literals must state their type explicitly.

a := [][]float64{
    {1.0, 2.0, 3.0},
    {1.0, 2.0, 3.0}}

Go+ can automatically derive the type from the element literals, which enables a much simpler form that looks like Python.

a := [[1.0, 2.0, 3.0],
      [1.0, 2.0, 3.0]]

This lets numgo+ offer a numpy-like API for array literals.

```python
b = np.array([[1.5,2,3], [4,5,6]])
# array([[1.5, 2. , 3. ],
#        [4. , 5. , 6. ]])
```

```go
b := ng.array([[1.5,2,3], [4,5,6]])
// array([[1.5, 2. , 3. ],
//        [4. , 5. , 6. ]])
```
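
For comparison, this is roughly what the receiving side of such a literal could look like in plain Go with generics. Array2D and Number are hypothetical names chosen for this sketch, not part of any existing package. The function flattens a nested slice into row-major order and rejects ragged input; note that plain Go still needs the outer type spelled out at the call site, which is exactly what the Go+ literal syntax removes.

```go
package main

import (
	"errors"
	"fmt"
)

// Number lists the element types this sketch accepts.
type Number interface {
	~int | ~int64 | ~float32 | ~float64
}

// Array2D flattens a nested slice into row-major order and reports its shape.
// The name and signature are hypothetical.
func Array2D[T Number](rows [][]T) (data []T, shape [2]int, err error) {
	if len(rows) == 0 {
		return nil, shape, errors.New("array2d: empty input")
	}
	w := len(rows[0])
	for _, r := range rows {
		if len(r) != w {
			return nil, shape, errors.New("array2d: rows have different lengths")
		}
		data = append(data, r...)
	}
	return data, [2]int{len(rows), w}, nil
}

func main() {
	// T is inferred as float64 from the argument type.
	data, shape, err := Array2D([][]float64{{1.5, 2, 3}, {4, 5, 6}})
	if err != nil {
		panic(err)
	}
	fmt.Println(shape, data) // [2 3] [1.5 2 3 4 5 6]
}
```
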
model-collapse commented 4 years ago

I came up with the same idea. I was having my breakfast when you posted this.

xushiwei commented 4 years ago

github.com/goplus/numgoplus => github.com/numgoplus/ng

xushiwei commented 4 years ago

In Go+ we will support a feature named auto property. It means:

import ng "github.com/goplus/numgoplus"

a := ng.arange(15).reshape(3, 5)
fmt.Println(a)
// array([[ 0,  1,  2,  3,  4],
//        [ 5,  6,  7,  8,  9],
//        [10, 11, 12, 13, 14]])
fmt.Println(a.Shape()) // (3, 5)
fmt.Println(a.Ndim())  // 2
fmt.Println(a.Dtype().Name()) // 'int64'
fmt.Println(a.Itemsize()) // 8
fmt.Println(a.Size()) // 15
fmt.Println(reflect.TypeOf(a)) // numgoplus.Ndarray

b := ng.array([6, 7, 8])
fmt.Println(b)
// array([6, 7, 8])
fmt.Println(reflect.TypeOf(b)) // numgoplus.Ndarray

can be:

import "github.com/numgoplus/ng"

a := ng.arange(15).reshape(3, 5)
println(a)
// array([[ 0,  1,  2,  3,  4],
//        [ 5,  6,  7,  8,  9],
//        [10, 11, 12, 13, 14]])
println(a.shape) // (3, 5)
println(a.ndim)  // 2
println(a.dtype.name) // 'int64'
println(a.itemsize) // 8
println(a.size) // 15
println(reflect.typeOf(a)) // ng.Ndarray

b := ng.array([6, 7, 8])
println(b)
// array([6, 7, 8])
println(reflect.typeOf(b)) // ng.Ndarray
xushiwei commented 4 years ago

In Go+ we have a simplified form for 2D vectors. It means:

b := ng.array([[1.5,2,3], [4,5,6]])
// array([[1.5, 2. , 3. ],
//        [4. , 5. , 6. ]])

can be:

b := ng.array([1.5,2,3; 4,5,6])
// array([[1.5, 2. , 3. ],
//        [4. , 5. , 6. ]])
wangkuiyi commented 4 years ago

The simplified form of 2D vectors goes a step further than Python. It is close to MATLAB syntax. Great idea!

wangkuiyi commented 4 years ago

Here is a typical PyTorch program in four different languages:

C++

```c++
#include <iostream>
#include "torch/script.h"
#include "torch/optim.h"

int main() {
  int N = 64, D_in = 1000, H = 100, D_out = 10;
  double learning_rate = 1e-3;

  auto x = torch::randn({N, D_in}, at::TensorOptions().requires_grad(false));
  auto y = torch::randn({N, D_out}, at::TensorOptions().requires_grad(false));

  // The Adam optimizer wants parameters in a std::vector.
  std::vector<at::Tensor> params = {
      torch::randn({D_in, H}, at::TensorOptions().requires_grad(true)),
      torch::randn({H, D_out}, at::TensorOptions().requires_grad(true))};

  // Build the optimizer.
  torch::optim::Adam adam(params, torch::optim::AdamOptions(learning_rate));

  // Make quick references for using in the forward pass.
  const at::Tensor & w1 = adam.parameters()[0];
  const at::Tensor & w2 = adam.parameters()[1];

  for (int i = 0; i < 500; ++i) {
    auto y_pred = at::mm(at::clamp(at::mm(x, w1), 0), w2);
    auto loss = at::sum(at::pow(at::sub(y_pred, y), 2));

    if ((i % 100) == 99) {
      std::cout << "loss = " << loss << std::endl;
    }

    adam.zero_grad();
    loss.backward();
    adam.step();
  }
  return 0;
}
```

Go

```go
package main

import (
    "fmt"

    at "github.com/gotorch/gotorch/aten"
    "github.com/gotorch/gotorch/torch"
    "github.com/gotorch/gotorch/torch/optim"
)

func main() {
    N, Din, H, Dout := 64, 1000, 100, 10
    learningRate := 1e-3

    x := torch.RandN([]int{N, Din}, at.TensorOptions().RequiresGrad(false))
    y := torch.RandN([]int{N, Dout}, at.TensorOptions().RequiresGrad(false))

    params := []at.Tensor{
        torch.RandN([]int{Din, H}, at.TensorOptions().RequiresGrad(true)),
        torch.RandN([]int{H, Dout}, at.TensorOptions().RequiresGrad(true)),
    }

    adam := optim.NewAdam(params, optim.AdamOptions(learningRate))

    w1 := adam.Parameters()[0]
    w2 := adam.Parameters()[1]

    for i := 0; i < 500; i++ {
        // Same forward pass as the C++ version: mm(clamp(mm(x, w1), 0), w2).
        yPred := at.MM(at.Clamp(at.MM(x, w1), 0), w2)
        loss := at.Sum(at.Pow(at.Sub(yPred, y), 2))

        if i%100 == 99 {
            fmt.Println("loss = ", loss)
        }

        adam.ZeroGrad()
        loss.Backward()
        adam.Step()
    }
}
```

Go+

```go
package main

import (
    "fmt"

    "github.com/gotorch/gotorch/at"
    "github.com/gotorch/gotorch/torch"
    "github.com/gotorch/gotorch/torch/optim"
)

func main() {
    N, Din, H, Dout := 64, 1000, 100, 10

    x := torch.RandN(N, Din, requires_grad=False)
    y := torch.RandN(N, Dout, requires_grad=False)

    w1 := torch.RandN(Din, H, requires_grad=True)
    w2 := torch.RandN(H, Dout, requires_grad=True)

    learningRate := 1e-3
    adam := optim.NewAdam([w1, w2], lr=learningRate)

    for i := 0; i < 500; i++ {
        yPred := at.MM(at.Clamp(at.MM(x, w1), 0), w2)
        loss := at.Sum(at.Pow(at.Sub(yPred, y), 2))

        if i%100 == 99 {
            fmt.Println("loss = ", loss)
        }

        adam.ZeroGrad()
        loss.Backward()
        adam.Step()
    }
}
```

Python

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, requires_grad=False)
y = torch.randn(N, D_out, requires_grad=False)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-3
adam = torch.optim.Adam([w1, w2], lr=learning_rate)

for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    if t % 100 == 99:
        print(t, loss.item())

    adam.zero_grad()
    loss.backward()
    adam.step()
```

From the above four programs, we can see:

  1. The Go binding could be as concise as the C/C++ API, in terms of the number of lines of source code.
  2. If we want the Go+ version to be as concise as the Python version, the primary requirement for the Go+ transpiler is support for named function parameters. For example, x := torch.RandN(N, Din, requires_grad=False).
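
For reference, plain Go usually approximates named parameters with the functional options pattern. The sketch below shows that pattern applied to a RandN-like constructor; Tensor, Option, RequiresGrad and RandN are placeholders invented for this illustration and do not reflect the API of any real GoTorch binding.

```go
package main

import "fmt"

// tensorOptions collects the optional settings; its zero value is the default.
type tensorOptions struct{ requiresGrad bool }

// Tensor is a placeholder that only records shape and options.
type Tensor struct {
	Shape        []int
	RequiresGrad bool
}

// Option changes one setting; each option plays the role of a named argument.
type Option func(*tensorOptions)

// RequiresGrad corresponds to requires_grad=... in the Python API.
func RequiresGrad(v bool) Option {
	return func(o *tensorOptions) { o.requiresGrad = v }
}

// RandN only records the shape and options here; a real binding would also
// allocate and fill the tensor data.
func RandN(shape []int, opts ...Option) Tensor {
	var o tensorOptions
	for _, opt := range opts {
		opt(&o)
	}
	return Tensor{Shape: shape, RequiresGrad: o.requiresGrad}
}

func main() {
	N, Din, H := 64, 1000, 100
	x := RandN([]int{N, Din})                      // all defaults
	w1 := RandN([]int{Din, H}, RequiresGrad(true)) // one "named" argument
	fmt.Println(x.RequiresGrad, w1.RequiresGrad)   // false true
}
```

This keeps call sites short when defaults are fine, but it is still wordier than requires_grad=True, which is the gap the proposed language feature would close.
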
xushiwei commented 4 years ago

In Go+, we can write it as follows:

package main

import (
    "fmt"

    "github.com/gotorch/gotorch/at"
    "github.com/gotorch/gotorch/torch"
    "github.com/gotorch/gotorch/torch/optim"
)

N, Din, H, Dout := 64, 1000, 100, 10

x := torch.RandN(N, Din, {})
y := torch.RandN(N, Dout, {})

w1 := torch.RandN(Din, H, {RequiresGrad: true})
w2 := torch.RandN(H, Dout, {RequiresGrad: true})

learningRate := 1e-3
adam := optim.NewAdam([w1, w2], {LR: learningRate})

for i := 0; i < 500; i++ {
    yPred := at.MM(at.Clamp(at.MM(x, w1), 0), w2)
    loss := at.Sum(at.Pow(at.Sub(yPred, y), 2))

    if i%100 == 0 {
        fmt.Println("loss = ", loss)
    }

    adam.ZeroGrad()
    loss.Backward()
    adam.Step()
}
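
In plain Go, the {RequiresGrad: true} and {LR: learningRate} shorthand above would presumably correspond to explicitly typed option structs passed as the last argument. How Go+ actually infers the struct type is not stated in this thread, so the following is only a sketch of the equivalent Go; TensorOptions, AdamOptions, Tensor, Adam and the 1e-3 default are all assumptions made for illustration.

```go
package main

import "fmt"

// TensorOptions and AdamOptions are hypothetical option structs.
// Their zero values act as the defaults.
type TensorOptions struct{ RequiresGrad bool }
type AdamOptions struct{ LR float64 }

// Tensor and Adam are placeholders that only record their configuration.
type Tensor struct {
	Rows, Cols   int
	RequiresGrad bool
}

type Adam struct {
	Params []Tensor
	LR     float64
}

func RandN(rows, cols int, opt TensorOptions) Tensor {
	return Tensor{Rows: rows, Cols: cols, RequiresGrad: opt.RequiresGrad}
}

func NewAdam(params []Tensor, opt AdamOptions) *Adam {
	if opt.LR == 0 {
		opt.LR = 1e-3 // assumed default for this sketch
	}
	return &Adam{Params: params, LR: opt.LR}
}

func main() {
	x := RandN(64, 1000, TensorOptions{})
	w1 := RandN(1000, 100, TensorOptions{RequiresGrad: true})
	w2 := RandN(100, 10, TensorOptions{RequiresGrad: true})
	adam := NewAdam([]Tensor{w1, w2}, AdamOptions{LR: 1e-3})
	fmt.Println(x.RequiresGrad, len(adam.Params), adam.LR) // false 2 0.001
}
```
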

New language features:

shendiaomo commented 4 years ago

> Here is a typical PyTorch program in four different languages:
>
>   • The Python version comes from the official tutorial.
>   • The C++ version calls the ATen C library and Torch's csrc C++ library. Thanks to Jia-Kai Liu, a tech lead of PyTorch, for teaching me everything about the C/C++ core of PyTorch. Please follow instructions in https://github.com/wangkuiyi/cxxtorch to run this program.
>
> From the above four programs, we can see:
>
>   1. The Go binding could be as concise as the C/C++ API, in terms of the number of lines of source code.
>   2. If we want the Go+ version to be as concise as the Python version, the primary requirement for the Go+ transpiler is support for named function parameters. For example, x := torch.RandN(N, Din, requires_grad=False).

There's still a problem: libtorch uses exceptions as its main error handling mechanism, which has two consequences:

  1. We have to find a way to pass C++ exceptions to Go, as described in https://artem.krylysov.com/blog/2017/04/13/handling-cpp-exceptions-in-go/
    • Can Go+ provide a more efficient way to do the same thing?
  2. Taking error handling into consideration, the Go binding may not be as concise as the C/C++ API in terms of the number of lines of source code, because we have to check the error status after each line.
    • Go+ already provides a neat error handling syntax. How can we leverage the Go+ mechanism to simplify error handling in the Go binding?
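
One plain-Go idiom that avoids a check after every call is a "sticky" error: the first failure is stored, later operations become no-ops, and the caller checks once at the end. This is not the Go+ error handling syntax mentioned above, only a sketch of the general idea; session, Div and Add are hypothetical, and float64 stands in for a tensor.

```go
package main

import (
	"errors"
	"fmt"
)

// session remembers the first error; once set, later calls do nothing.
type session struct{ err error }

// Err reports the first error seen, or nil.
func (s *session) Err() error { return s.err }

// Div stands in for any binding call that may fail.
func (s *session) Div(a, b float64) float64 {
	if s.err != nil {
		return 0
	}
	if b == 0 {
		s.err = errors.New("division by zero")
		return 0
	}
	return a / b
}

// Add never fails itself, but it is skipped after an earlier failure.
func (s *session) Add(a, b float64) float64 {
	if s.err != nil {
		return 0
	}
	return a + b
}

func main() {
	var s session
	y := s.Add(s.Div(6, 3), 1) // 3
	z := s.Div(y, 0)           // fails and records the error
	w := s.Add(z, 1)           // skipped because an error is recorded
	fmt.Println(y, w, s.Err()) // 3 0 division by zero
}
```
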
xushiwei commented 4 years ago

> There's still a problem: libtorch uses exceptions as its main error handling mechanism, which has two consequences:
>
>   1. We have to find a way to pass C++ exceptions to Go, as described in https://artem.krylysov.com/blog/2017/04/13/handling-cpp-exceptions-in-go/
>     • Can Go+ provide a more efficient way to do the same thing?
>   2. Taking error handling into consideration, the Go binding may not be as concise as the C/C++ API in terms of the number of lines of source code, because we have to check the error status after each line.
>     • Go+ already provides a neat error handling syntax. How can we leverage the Go+ mechanism to simplify error handling in the Go binding?

Define error type: CppError

type CppError struct {
    what string
}

func (p *CppError) Error() string {
    return p.what
}

func NewCppError(what string) error {
    return &CppError{what: what}
}

Wrap functions that may throw C++ exceptions

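// The wrapper itself must be compiled as C++ and exported with extern "C",
// for example in wrap.cc with a declaration in wrap.h (file names are
// illustrative; XXX, InputArgs and OutputArgs remain placeholders):
//
// OutputArgs XXX_Wrap(InputArgs input, char **pwhat) {
//     try {
//         return XXX(input);
//     } catch (const std::exception &e) {
//         *pwhat = strdup(e.what()); // released by the Go caller
//         return OutputArgs();
//     }
// }

/*
#include <stdlib.h>
#include "wrap.h"
*/
import "C"

import "unsafe"

func XXX(input InputArgs) (output OutputArgs, err error) {
    var what *C.char // stays nil unless an exception was caught
    output = C.XXX_Wrap(input, &what)
    if what != nil {
        err = NewCppError(C.GoString(what))
        C.free(unsafe.Pointer(what))
    }
    return
}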