Closed LonelyCat124 closed 3 years ago
It's possible that this also just happens generally, even without IO. It's difficult to tell, as the performance over a long run shouldn't be linear anyway, since stuff (particle interactions) eventually starts happening at some point.
Looks like the non-IO run is just the timestep reducing.
Fixed the bugs, so it's just an IO issue. Looking at the profile, it was not clear to me what could be causing the issue. The number of tasks is so high that it's hard to see, though it's very clear that the time between successive copy_task executions increases.
Ok - so this seems specific to the HDF5 implementation we have. Notably, just some simple output:
var filename = [rawstring](regentlib.c.malloc(1024))
format.snprint(filename, 1024, "file{}.txt", step);
var file = c.fopen(filename, "w+");
for part in [neighbour_init.padded_particle_array] do
  if [neighbour_init.padded_particle_array][part].neighbour_part_space._valid then
    format.fprintln(file, "{} {} {} {} ",
                    [neighbour_init.padded_particle_array][part].core_part_space.pos_x,
                    [neighbour_init.padded_particle_array][part].core_part_space.pos_y,
                    [neighbour_init.padded_particle_array][part].core_part_space.pos_z,
                    [neighbour_init.padded_particle_array][part].rho)
  end
end
c.fclose(file);
regentlib.c.free(filename)
has little to no cost associated with it (as I'd expect).
My guess is that it's to do with creating/deleting the regions used for the HDF5 transfers.
My previous guess appears to be wrong; however, this region creation is removed as of f3cef89c114d0546fae88b2ece6ec00f606729df
Ok, so the problem is that the attach/detach code is not being run in an inner task. If I create a standalone inner task and adjust the code to run in that, then it works fine. The difficulty is that at the moment I'm not sure how to generate tasks which have demands etc.
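For reference, a rough sketch of what a standalone inner task wrapping the attach/detach sequence might look like. This is not the code from the repository or the PR: the field space `part`, the task name `write_step`, the `staging` region and the field list are all hypothetical, and it assumes Regent was built with HDF5 support and that the file already exists with matching datasets.

```regent
import "regent"

-- Hypothetical field space standing in for the particle fields written out.
fspace part {
  pos_x : double,
  rho   : double,
}

-- Assumption: an inner task only launches operations (attach, acquire, copy,
-- release, detach) and never reads or writes region elements directly.
__demand(__inner)
task write_step(particles : region(ispace(int1d), part),
                staging   : region(ispace(int1d), part),
                filename  : rawstring)
where reads(particles), reads writes(staging) do
  -- Bind the staging region's fields to the datasets in the HDF5 file.
  attach(hdf5, staging.{pos_x, rho}, filename, regentlib.file_read_write)
  acquire(staging.{pos_x, rho})
  -- Region-to-region copy; the data lands in the attached file.
  copy(particles.{pos_x, rho}, staging.{pos_x, rho})
  release(staging.{pos_x, rho})
  detach(hdf5, staging.{pos_x, rho})
end
```

The parent task would then call something like `write_step(particles, staging, filename)` once per output step, rather than running the attach/detach statements inline in the main timestepping task.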
Fixed, PR coming imminently
Using the IO module seems to cause a significant loss of performance vs performing no IO on a single core. Notably, the longer the run, the more apparent the degradation.
Without file output:
Each 0.001s sim time takes ~4s real time.
With file output:
Of course there's overhead for creating and writing to the file, but this looks like some sort of leak as opposed to anything else.