starpu-runtime / starpu

This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!
https://starpu.gitlabpages.inria.fr/
GNU Lesser General Public License v2.1
58 stars 13 forks source link

starpu_free_flags might produce UB on nullptr #45

Closed yulikdaniel closed 3 months ago

yulikdaniel commented 4 months ago

Steps to reproduce

This is tricky. We managed to create this bug in a reproducible manner in our big project, but I don't seem to be able to create a small example. However, I can't share the whole source code for our project since it's a homework project and the source code is closed. So here is a snippet from our project that is enough to understand this bug:

template <typename DataType>
Tile<DataType>::Tile(size_t _rows, size_t _cols) : rows(_rows), cols(_cols) {
    starpu_malloc_flags((void**)&data, rows * cols * sizeof(DataType), STARPU_MALLOC_COUNT);
}

template <typename DataType>
Tile<DataType>::~Tile() {
    // if (data != nullptr)
    starpu_free_flags(data, rows * cols * sizeof(DataType), STARPU_MALLOC_COUNT);
}

template <typename DataType>
Tile<DataType>& Tile<DataType>::operator=(Tile&& other) {
    rows = other.rows;
    cols = other.cols;
    data = other.data;
    other.data = nullptr;

    return *this;
}

Obtained behavior

Our tests fail every with the code above, but work every time when uncommenting the line "if (data != nullptr)". We conclude that starpu_free_flags breaks on nullptr.

The error when tests fail is as follows:

[starpu][kyle07][_starpu_memory_reclaim_generic] Not enough memory left on node NUMA 0. Your application data set seems too huge to fit on the device, StarPU will cope by trying to purge 1 MiB out. This message will not be printed again for further purges. You may want to tune the STARPU_MINIMUM_CLEAN_BUFFERS and STARPU_TARGET_CLEAN_BUFFERS environment variables up a bit to make StarPU maintain more clean memory available, to avoid ending up in this situation.
Segmentation fault (core dumped)

Expected behavior

We expected starpu_free_flags to do nothing when called on nullptr, as per c++ standart for free/delete. https://en.cppreference.com/w/c/memory/free

Configuration

../configure --prefix=$INSTALL_DIR/starpu --disable-opencl --disable-build-doc --disable-build-examples --disable-build-test --enable-parallel-worker --enable-mpi --enable-omp;

Configuration result

config.log

Distribution

Starpu 1.4.5, also reproducible on earlier versions

Version of StarPU

9eedd6151da288bdc1cfbaf065648886c7d92c5a

nfurmento commented 3 months ago

Hello,

i tried to reprduce your bug but failed to do so.

In the code of the function starpu_free_flags, from what i saw, the only call which could fail with a nullptr is cudaFree. however a similar call is made in the initialisation of the cuda device with cudaFree(0). and that call seems to work in your application. i guess nullptr is the same thing as 0.

I will keep investigating.

thanks,

Nathalie

sthibaul commented 3 months ago

actually cudaHostFree? There is also possibly hipHostFree, or hwloc_free, or even free_hook, if the user provided one. I guess adding a if(!A) return 0; would be safer.

nfurmento commented 3 months ago

The problem has been fixed in the master branch as well as in the 1.4 branch, and will be available in github tomorrow.