tensor-compiler / taco

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
http://tensor-compiler.org

build kernel failed #259

Open dongxiao92 opened 5 years ago

dongxiao92 commented 5 years ago

I use taco to generate and compile a kernel that multiplies a sparse tensor by a dense tensor, and I evaluate its performance on inputs of various shapes. Sometimes, however, compilation fails with the following error:

Error at /home/dongxiao/working/taco/src/codegen/module.cpp:154 in compile: Compilation command failed: cc -O3 -ffast-math -std=c99 -shared -fPIC /tmp/taco_tmp_JoX7bb/qarcvemqfnu1.c /tmp/taco_tmp_JoX7bb/qarcvemqfnu1_shims.c -o /tmp/taco_tmp_JoX7bb/qarcvemqfnu1.so returned -1

But when I run the printed command manually, it succeeds, so I am confused about what the problem is and how it arises. The code is listed below.

            //define the format for input data
            Format input_fmt({Sparse, Sparse, Dense});
            //define the weight format
            Format weight_fmt({Dense, Dense});
            //define the format for output
            Format output_fmt({Sparse, Sparse, Dense});

            //Create tensors
            Tensor<float> input({h, w, c}, input_fmt);
            Tensor<float> weight({k, c}, weight_fmt);
            Tensor<float> output({h, w, k}, output_fmt);

            input.pack();
            weight.pack();

            //Form the computation expression
            IndexVar oh, ow, ok, ic; 
            output(oh, ow, ok) = input(oh, ow, ic)*weight(ok, ic);

            //Compile the expression
            output.compile();
            output.assemble();

            auto start = getTime();
            //Compute the result
            output.compute();
            auto end = getTime();

dongxiao92 commented 5 years ago

I checked the return value of the compilation command, which is -1. According to the Linux man page for system, -1 means that the child process used to execute the command could not be created (or that its status could not be retrieved). But I'm not sure why this happens.

dongxiao92 commented 5 years ago

I further checked the errno set by system. It is 12 (ENOMEM), and the corresponding message is 'Cannot allocate memory'. However, I monitored physical memory usage with top and it never exceeded 15%, and swap space was not used at all. It seems that calling system to compile generated code many times in a row may cause the problem.
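For anyone who wants to repeat this check, here is a minimal sketch of separating "the child process could not be created" from "the command ran and returned an exit code". The names CommandResult and runCommand are hypothetical and are not part of taco; the sketch only assumes a POSIX system where std::system returns -1 and sets errno when the child cannot be spawned.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/wait.h>

// Hypothetical helper type (not part of taco): outcome of one std::system call.
struct CommandResult {
  bool spawned;         // false if the child process could not be created
  int exit_code;        // exit status of the command, or -1 if unavailable
  int saved_errno;      // errno captured right after a failed spawn, else 0
  std::string message;  // strerror text for saved_errno, empty otherwise
};

// Runs cmd through the shell and reports why it failed, if it did.
CommandResult runCommand(const std::string& cmd) {
  assert(!cmd.empty());
  errno = 0;
  int status = std::system(cmd.c_str());
  if (status == -1) {
    // The child could not be created or waited for; errno says why
    // (for example ENOMEM, "Cannot allocate memory").
    int err = errno;
    return {false, -1, err, std::strerror(err)};
  }
  // The shell ran; decode the wait status into a plain exit code.
  int code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
  return {true, code, 0, ""};
}
```

If runCommand reports spawned == false, the compiler was never started, which matches the case where the same command succeeds when run by hand.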

stephenchouca commented 5 years ago

Currently, taco will generate a shared library with the generated code whenever compile is called and then dynamically load it using dlopen, which apparently reserves a significant amount of virtual memory per shared library. To properly fix this would probably require just replacing the whole shared library approach with LLVM JIT, which is currently a work in progress.
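One way to observe this growth is to sample the process's virtual memory size between compile calls. The following is a rough sketch, assuming Linux and its /proc/self/status file; virtualMemoryKb is a hypothetical helper, not a taco function.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (Linux only): returns the VmSize of the current
// process in kB as reported by /proc/self/status, or -1 if unavailable.
long virtualMemoryKb() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    // The line of interest looks like "VmSize:    123456 kB".
    if (line.rfind("VmSize:", 0) == 0) {
      std::istringstream fields(line.substr(7));
      long kb = -1;
      if (fields >> kb) {
        return kb;
      }
      return -1;
    }
  }
  return -1;
}
```

Printing this value before and after each call to compile should show whether virtual memory (as opposed to the resident memory shown by top) keeps increasing over the run.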

You should be able to work around the issue, though, if you enable memory overcommitting on your system; on Linux, you can do this by running sudo sh -c "echo 1 > /proc/sys/vm/overcommit_memory". Alternatively, instead of running all your tests in a single run of your benchmarking application, you can try splitting them into batches and running the batches in separate invocations of your application, which should allow the unused virtual memory to be reclaimed periodically.
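A variation on the batching idea is to keep a single driver program but run each batch in a forked child process, so that whatever the batch maps is released when the child exits. This is only a sketch under stated assumptions: runBatchInChild is a hypothetical helper, the system is POSIX, and the parent process itself never compiles kernels, so it stays small enough for fork to succeed.

```cpp
#include <cassert>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs one batch of benchmarks in a forked child and
// returns the child's exit code, or -1 if the child could not be created,
// could not be waited for, or did not exit normally.
int runBatchInChild(const std::function<int()>& batch) {
  assert(batch != nullptr);
  pid_t pid = fork();
  if (pid < 0) {
    return -1;  // fork failed; errno holds the reason
  }
  if (pid == 0) {
    // Child: run the batch, then exit immediately. _exit skips atexit
    // handlers and does not flush stdio buffers, so the batch should
    // flush any output it wants to keep.
    _exit(batch());
  }
  // Parent: wait for the child; its address space is freed when it exits.
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each batch would be a callable that builds its tensors, calls compile, assemble and compute, records the timings (for example by writing them to a file), and returns 0 on success.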

dongxiao92 commented 5 years ago

Thanks for your reply. I used your suggestion as a workaround.