starpu-runtime / starpu

This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!
https://starpu.gitlabpages.inria.fr/
GNU Lesser General Public License v2.1
58 stars 13 forks source link

Data invalidation and deinitialization do not signal about read dependency in a task afterwards in certain cases #39

Open Muxas opened 5 months ago

Muxas commented 5 months ago

Hi!

I have just added starpu_data_invalidate_submit to my code. Of course, I did it with mistakes. Some cases were reported by StarPU, signaling that some data is not initialized to be read. But some cases were not. I found out, that marking data with starpu_data_set_reduction_methods makes it immune to such an assert if the data remains on the same device and its access mode is STARPU_R.

If the data is on GPU and reduction methods only support CPUs, then the following error is printed:

/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_redux_init_data_replicate+0x208)[0x155554ecd3d8]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_fetch_task_input_tail+0x210)[0x155554eb3ac0]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_cuda_driver_run_once+0x22e)[0x155554f1b72e]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_cuda_worker+0x95)[0x155554f1c155]
/lib64/libpthread.so.0(+0x7ea5)[0x155554bfaea5]
/lib64/libc.so.6(clone+0x6d)[0x155554403b0d]
a.out: ../../src/datawizard/reduction.c:92: _starpu_redux_init_data_replicate: Assertion `0 && "init_func"' failed.
Aborted

However, if the reduction methods are supported on device, where the data is allocated, then program is not stopped, no error is thrown and result of computations becomes silently wrong (undefined behavior).

Here is a simple program to reproduce:

#include <starpu.h>
#include <stdio.h>

void set_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    *x = 0;
    printf("set_func\n");
}

void use_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    float one = 1.0;
    printf("use_func\n");
}

void clear_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    *x = 0.0;
}

void acc_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    const float *y = (const float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    *x += *y;
}

struct starpu_codelet set_codelet =
{
    .cpu_funcs = {set_func},
    .modes = {STARPU_W},
    .nbuffers = 1
};

struct starpu_codelet use_codelet =
{
    .cpu_funcs = {use_func},
    .modes = {STARPU_R},
    .nbuffers = 1
};

struct starpu_codelet clear_codelet =
{
    .cpu_funcs = {clear_func},
    .modes = {STARPU_W},
    .nbuffers = 1
};

struct starpu_codelet acc_codelet =
{
    .cpu_funcs = {acc_func},
    .modes = {STARPU_RW, STARPU_R},
    .nbuffers = 2
};

int main(int argc, char **argv)
{
    float x;
    starpu_data_handle_t x_handle;
    starpu_init(NULL);
    starpu_variable_data_register(&x_handle, STARPU_MAIN_RAM, (uintptr_t)&x,
            sizeof(x));
    starpu_data_set_reduction_methods(x_handle, &acc_codelet, &clear_codelet);
    starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X
    starpu_data_invalidate_submit(x_handle); // Invalidate X
    //starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X is ignored
    starpu_task_insert(&use_codelet, STARPU_R, x_handle, 0); // Read X right after invalidation without an error
    starpu_task_wait_for_all();
    starpu_data_unregister(x_handle);
    starpu_shutdown();
    printf("x=%f\n", x);
}
sthibaul commented 5 months ago

I'm not sure how to use your reproducer since it only puts zeroes in the value in cpu memory. But I guess it's the missing initialized = 0 case that the starpu-1.3 branch indeed didn't have, and that was indeed uncovered by this new situation (in the past we wouldn't have replicates that are allocated but not initialized or planned to be). I have pushed a fix to the starpu-1.3 branch.

Muxas commented 5 months ago

The fix did not solve the problem. I believe you did not catch what I was trying to convey. You do not have to use my reproducer. You need to take a look at these lines:

starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X
starpu_data_invalidate_submit(x_handle); // Invalidate X
//starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X is ignored
starpu_task_insert(&use_codelet, STARPU_R, x_handle, 0); // Read X right after invalidation without an error
  1. Init handle X.
  2. Invalidate handle X. From this point it cannot be access in read mode.
  3. Use handle X in STARPU_R mode. It shall be in an uninitialized state at this point, so accessing it as STARPU_R must be prohibited.
Muxas commented 5 months ago

order of tasks for handle X:

  1. set_codelet: set X to zero. This is done in STARPU_W mode, so StarPU does memory allocation.
  2. invalidate_codelet: invalidates handle X and frees previously allocated memory.
  3. use_codelet: reads unallocated X. Memory is allocated by StarPU and it might be filled with some random data. StarPU gives this random data to my use_codelet to update value. You can add printf function to the use_codelet and it will print random uninitialized data.
sthibaul commented 5 months ago

The fix did not solve the problem.

The behavior is unchanged completely?

I believe you did not catch what I was trying to convey. You do not have to use my reproducer. You need to take a look at these lines:

starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X
starpu_data_invalidate_submit(x_handle); // Invalidate X
//starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X is ignored
starpu_task_insert(&use_codelet, STARPU_R, x_handle, 0); // Read X right after invalidation without an error
1. Init handle X.

2. Invalidate handle X. From this point it cannot be access in read mode.

3. Use handle X in STARPU_R mode. It shall be in an uninitialized state at this point, so accessing it as STARPU_R must be prohibited.

Prohibited? Why? The starpu_data_set_reduction_methods call provides the initializer, so starpu just needs to know that the data is now uninitialized, and thus has to call the init_cl again. This is what the addition of replicate->initialized = 0 is supposed to bring, so that _starpu_fetch_task_input_tail sets needs_init to 1 and thus _starpu_init_data_replicate gets called.

sthibaul commented 5 months ago

With my fix, I see clear_func getting called, while before it wasn't getting called and thus the value was indeed undefined in use_func

Muxas commented 5 months ago

Method starpu_data_set_reduction_methods sets reduction methods. But there is no reduction in the provided example. I have added starpu_invalidate_submit in a wrong place accidentally, but instead of indicating that I did mistake StarPU clear the buffer. I did not ask to clear it! This will lead to many errors due to implicit clearing. Other StarPU user might put starpu_invalidate_submit in a wrong place and it will not raise any error. This kind of implicit behavior is very hard to debug!

Muxas commented 5 months ago

Just to conclude on my side:

  1. I fully understand your opinion -- providing a buffer with initializer means scheduler can insert an initialization task if needed. Point.
  2. My personal opinion: initializers must be used implicitly only during reduction for tasks with STARPU_REDUX or STARPU_MPI_REDUX access modes.

I understand, that if it is done like I see it, then it might break STARPU_MPI_REDUX mode. Because locally on a single node STARPU_MPI_REDUX will be translated into STARPU_RW|STARPU_COMMUTE access mode, which shall not support implicit initializer judging by my opinion.

My proposition: add a new access mode STARPU_ALLOW_IMPLICIT_INIT. A user can use this flag to explicitly tell StarPU that it is OK to use initializer implicitly in case of accessing uninitialized data for specific tasks, only where this access mode appears. With this approach, user is warned about possible unintended initializations, at least. And all the STARPU_REDUX and STARPU_MPI_REDUX will include the flag automatically.

I am just wondering if my proposition makes sense to you. I described the way how I would implement it myself.