microsoft / knossos-ksc

Compiler with automatic differentiation

Bug: Segmentation fault in sqrl_pytorch-PyTorch CUDA #1026

Closed (awf closed this 2 years ago)

awf commented 2 years ago

Just saw this while working on something else. I haven't done much to debug it, but note that it's in copydown, on a fairly innocuous operation (aten::sum(Tensor 2) -> Float), so it might be something to do with KS_ALLOCATOR not being defined? Or it could just be an out-of-memory condition that isn't caught? [image]

awf commented 2 years ago

The easiest way to reproduce the situation above is to edit launch.json to include

        {
            "name": "(gdb) pytest",
            "type": "cppdbg",
            "request": "launch",
            "program": "/anaconda/envs/knossos/bin/python",
            "args": [
                "-m",
                "pytest",
                "src/bench/",
                "-v",
                "--modulepath=examples/dl-capsule/sqrl",
                "--benchmarkname=sqrl",
            ],
            "stopAtEntry": false,
            "cwd": "${workspaceFolder}",
            "environment": [
                {"name":"PYTHONPATH", "value":"./src/python"}
            ],
            "externalConsole": false,
            "MIMode": "gdb",
            "setupCommands": [
                {
                    "description": "Enable pretty-printing for gdb",
                    "text": "-enable-pretty-printing",
                    "ignoreFailures": true
                }
            ]
        },

Then run "Debug: Select and Start Debugging" in VS Code, picking "(gdb) pytest".

dcrc2 commented 2 years ago

The problem is that we have

@knossos.register
def sqrl(x: torch.Tensor):
    ...

def sqrl_pytorch(x: torch.Tensor):
    return sqrl(x)

which means that sqrl_pytorch isn't actually a PyTorch implementation at all: it calls the Knossos implementation. I think this was accidentally broken by the addition of the knossos.register decorator in #960. We'll need to rewrite sqrl_pytorch so that it's a genuine PyTorch implementation.
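To make the aliasing concrete, here is a minimal pure-Python sketch. The `register` decorator below is a hypothetical stand-in, not the real `knossos.register`, and the `x * x` body is only a placeholder for the elided `...` above. It shows that a wrapper which calls the registered function always goes through the registered (compiled) path, whereas a version with its own body does not.

```python
# Records which implementation actually ran.
calls = []


def register(fn):
    """Hypothetical stand-in for knossos.register.

    The real decorator compiles the function; here we only record that
    the 'compiled' path was taken and then run the Python body.
    """
    def compiled(x):
        calls.append("knossos")
        return fn(x)
    return compiled


@register
def sqrl(x):
    return x * x  # placeholder body; the real sqrl is elided in the issue


def sqrl_pytorch_broken(x):
    # Looks like a PyTorch baseline, but it dispatches to the registered sqrl.
    return sqrl(x)


def sqrl_pytorch_fixed(x):
    # Has its own body, so the registered function is never involved.
    return x * x  # placeholder body, duplicated on purpose
```

Duplicating the body is the bluntest fix; whether the two versions could instead share a single undecorated helper depends on how the real decorator works, which this sketch does not assume.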

Before #976 was merged this morning, functions defined using @knossos.register were compiled for CPU only; but the "PyTorch CUDA" benchmark puts the input tensors on the GPU. The segmentation fault occurs when trying to read this data on the CPU.

Now that #976 is merged, the KscStub detects that the input is on the GPU and tries to compile for the GPU, but this raises an error ("Only elementwise operations can be compiled for GPU"), which I think is the correct behaviour. There is no "Knossos CUDA" benchmark for sqrl, because the "Knossos CUDA" benchmark is only enabled for elementwise operations.
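The behaviour described above can be sketched roughly as follows. This is not the actual KscStub code: `FakeTensor`, `CompiledStub`, the `elementwise` flag, and the choice of `NotImplementedError` are all assumptions made for illustration. Only the error message is taken from this thread. The point is that checking the input's device before compiling turns a silent CPU read of GPU memory into an explicit error.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor that only carries a device type."""
    device_type: str  # "cpu" or "cuda"


class CompiledStub:
    """Hypothetical stub that compiles lazily, once per device type."""

    def __init__(self, name, elementwise):
        self.name = name
        self.elementwise = elementwise
        self.compiled = {}  # device type -> label for the compiled artefact

    def __call__(self, x):
        dev = x.device_type
        if dev not in self.compiled:
            # Refuse up front instead of running CPU code on GPU data.
            if dev == "cuda" and not self.elementwise:
                raise NotImplementedError(
                    "Only elementwise operations can be compiled for GPU"
                )
            self.compiled[dev] = f"{self.name}@{dev}"
        return self.compiled[dev]
```

With this shape, a non-elementwise function such as sqrl still works for CPU inputs, while a CUDA input fails with a clear message rather than a segmentation fault.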