nv-legate / cupynumeric

An Aspiring Drop-In Replacement for NumPy at Scale
https://docs.nvidia.com/cupynumeric
Apache License 2.0

Don't make copies of input Stores fully overlapping outputs #1112

Open manopapad opened 10 months ago

manopapad commented 10 months ago

Currently, if there is (full or partial) overlap between an input and an output Store in an operation, we handle this at the cuNumeric level, since the core will not do this check for us:

# Copy the input if it aliases the output, so the task never reads and writes the same data.
if store1.overlaps(store2):
    store1 = store1.copy()
task.add_input(store1)
task.add_output(store2)
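To make the current behavior concrete, here is a minimal toy model of the copy-on-overlap check. `ToyStore`, `prepare_input`, and the 1-D interval representation are hypothetical stand-ins invented for illustration; they are not the cuPyNumeric or Legate Store API.

```python
from dataclasses import dataclass
from itertools import count

_buffer_ids = count()  # hypothetical source of fresh backing-buffer ids


@dataclass(frozen=True)
class ToyStore:
    """Hypothetical Store stand-in: the half-open slice [lo, hi) of a backing buffer."""
    buffer_id: int
    lo: int
    hi: int

    def overlaps(self, other: "ToyStore") -> bool:
        # Two stores overlap if they share a buffer and their intervals intersect.
        return (self.buffer_id == other.buffer_id
                and self.lo < other.hi and other.lo < self.hi)

    def copy(self) -> "ToyStore":
        # A copy lives in a fresh buffer, so it cannot alias anything else.
        return ToyStore(next(_buffer_ids), 0, self.hi - self.lo)


def prepare_input(inp: ToyStore, out: ToyStore) -> ToyStore:
    """Mirror the pattern above: copy the input on any overlap, full or partial."""
    return inp.copy() if inp.overlaps(out) else inp


buf = next(_buffer_ids)
a = ToyStore(buf, 0, 8)
b = ToyStore(buf, 4, 12)   # partially overlaps a
c = ToyStore(buf, 8, 16)   # disjoint from a

copied_partial = prepare_input(a, b)   # copy is made
copied_full = prepare_input(a, a)      # copy is made even when input is the output
untouched = prepare_input(a, c)        # no copy needed
```

In this model the full-overlap case (`prepare_input(a, a)`) pays for a copy just like the partial-overlap case, which is the cost this issue proposes to avoid.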

However, if the following are all true:

then we can avoid making the copy, because we can expect the core to coalesce the read requirement for store1 and the write requirement for store2 into the same read-write requirement.

We must be certain that the core will apply the coalescing transformation; otherwise we will end up with tasks that have conflicting region requirements, which Legion will not catch in release mode.
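The risk can be sketched with a toy model of requirement coalescing. Everything below is an assumption made for illustration: `Priv`, `coalesce`, and `conflicts` are hypothetical helpers, not Legion or Legate APIs, and the only coalescing condition modeled is "identical buffer and extent", taken from the "fully overlapping" wording in the title rather than from the exact conditions the core checks.

```python
from enum import Flag, auto
from itertools import combinations


class Priv(Flag):
    READ = auto()
    WRITE = auto()


# A requirement is modeled as (buffer_id, lo, hi, privilege) over the half-open range [lo, hi).
def coalesce(reqs):
    """Merge requirements on the exact same range into one, OR-ing their privileges."""
    merged = {}
    for buf, lo, hi, priv in reqs:
        key = (buf, lo, hi)
        merged[key] = merged[key] | priv if key in merged else priv
    return [(*key, priv) for key, priv in merged.items()]


def conflicts(reqs):
    """Return pairs of distinct requirements that intersect while at least one writes."""
    found = []
    for r1, r2 in combinations(reqs, 2):
        same_buffer = r1[0] == r2[0]
        intersect = r1[1] < r2[2] and r2[1] < r1[2]
        any_write = Priv.WRITE in (r1[3] | r2[3])
        if same_buffer and intersect and any_write:
            found.append((r1, r2))
    return found


# Input and output are the same store: coalescing yields one read-write requirement.
full = [(0, 0, 8, Priv.READ), (0, 0, 8, Priv.WRITE)]
# Input and output only partially overlap: nothing to coalesce in this model.
partial = [(0, 0, 8, Priv.READ), (0, 4, 12, Priv.WRITE)]

full_after = coalesce(full)
partial_after = coalesce(partial)
```

In this sketch the fully overlapping pair is safe only because `coalesce` actually runs: `conflicts(full)` reports a conflict, while `conflicts(full_after)` does not. The partially overlapping pair still conflicts after coalescing, so it would still need the copy.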