stfc / RegentParticleDSL

A particle-method DSL based on the Regent programming language
MIT License

IO module (Simple HDF5) causes significant performance degradation #69

Closed LonelyCat124 closed 3 years ago

LonelyCat124 commented 3 years ago

Using the IO module seems to cause a significant loss of performance compared with performing no IO on a single core. Notably, the longer the run, the more apparent the degradation.

Without file output:

Time = 0.001057, runtime = 4
Time = 0.002054, runtime = 8
Time = 0.003052, runtime = 13
Time = 0.004050, runtime = 17
Time = 0.005048, runtime = 21
Time = 0.006046, runtime = 25
Time = 0.007044, runtime = 29
Time = 0.008042, runtime = 34
Time = 0.009040, runtime = 38
Time = 0.010037, runtime = 42
Time = 0.011035, runtime = 46
Time = 0.012033, runtime = 50
Time = 0.013031, runtime = 54
Time = 0.014029, runtime = 59
Time = 0.015027, runtime = 64
Time = 0.016025, runtime = 70
Time = 0.017023, runtime = 77
Time = 0.018021, runtime = 81
Time = 0.019018, runtime = 86
Time = 0.020016, runtime = 91
Time = 0.021014, runtime = 97

Each 0.001s sim time takes ~4s real time.

With file output:

Time = 0.001057, runtime = 14
Time = 0.002054, runtime = 42
Time = 0.003052, runtime = 90
Time = 0.004050, runtime = 152
Time = 0.005048, runtime = 244
Time = 0.006046, runtime = 340
Time = 0.007044, runtime = 466

Of course there's overhead for creating and writing to the file, but the cost per step keeps growing (the increments per 0.001s of sim time are roughly 28s, 48s, 62s, 92s, 96s, 126s), so this looks like some sort of leak or accumulating cost rather than a fixed per-step overhead.

LonelyCat124 commented 3 years ago

It's possible that this also happens generally, even without IO. It's difficult to tell, as the performance over a long run shouldn't be linear anyway: the real work (particle interactions) eventually kicks in at some point.

LonelyCat124 commented 3 years ago

Looks like the slowdown in the non-IO run is just the timestep reducing.

LonelyCat124 commented 3 years ago

Fixed the bugs, so it's just an IO issue. Looking at the profile, it was not clear to me what could be causing it; the number of tasks is so high that it's hard to see anything, though it is very clear that the time between copy_task executions increases.

LonelyCat124 commented 3 years ago

Ok - so this seems specific to the HDF5 implementation we have. Notably, just some simple output:

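  -- Plain-text output (no HDF5): write pos_x, pos_y, pos_z and rho for each valid particle to file<step>.txt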
  var filename = [rawstring](regentlib.c.malloc(1024))
  format.snprint(filename, 1024, "file{}.txt", step);
  var file = c.fopen(filename, "w+");
  for part in [neighbour_init.padded_particle_array] do
    if [neighbour_init.padded_particle_array][part].neighbour_part_space._valid then
      format.fprintln(file, "{} {} {} {} ", [neighbour_init.padded_particle_array][part].core_part_space.pos_x,
                                               [neighbour_init.padded_particle_array][part].core_part_space.pos_y,
                                               [neighbour_init.padded_particle_array][part].core_part_space.pos_z,
                                               [neighbour_init.padded_particle_array][part].rho)
    end
  end
  c.fclose(file);
  regentlib.c.free(filename)

This has little to no cost associated with it (as I'd expect).

LonelyCat124 commented 3 years ago

My guess is it's to do with creating/deleting the regions used for the HDF5 transfers.
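
To spell out what I mean: the simple HDF5 module effectively does something like the sketch below every time it writes a snapshot (this is a hand-written illustration rather than the actual module code; parts, io_part, the field names and the filename are placeholders, and the HDF5 file with matching datasets is assumed to already exist):

  -- Suspected per-snapshot pattern: create a scratch region backed by the
  -- HDF5 file, copy the particle data into it, then destroy the scratch region.
  var tmp = region(parts.ispace, io_part)
  attach(hdf5, tmp.{pos_x, rho}, "file.h5", regentlib.file_read_write)
  acquire(tmp)
  copy(parts.{pos_x, rho}, tmp.{pos_x, rho})
  release(tmp)
  detach(hdf5, tmp.{pos_x, rho})
  __delete(tmp)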

LonelyCat124 commented 3 years ago

My previous guess appears to be wrong; however, this region creation is removed as of f3cef89c114d0546fae88b2ece6ec00f606729df.

LonelyCat124 commented 3 years ago

Ok, so the problem is that the attach/detach code is not running in an inner task. If I create a standalone inner task and move that code into it, then it works fine. The difficulty is that, at the moment, I'm not sure how to generate tasks which have demands (annotations) etc.
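
Roughly, the shape of the fix is the sketch below (again a hand-written illustration rather than the real generated code; the io_part fspace, field names and task name are invented, and it assumes Regent is built with HDF5 support and that the target file/datasets already exist):

  import "regent"

  -- Placeholder field space standing in for the real particle type.
  fspace io_part {
    pos_x : double,
    rho   : double
  }

  -- Standalone inner task holding the attach/detach sequence. An __inner
  -- task only issues operations (attach, acquire, copy, release, detach)
  -- and never accesses region data directly.
  __demand(__inner)
  task hdf5_dump(parts : region(ispace(int1d), io_part), filename : rawstring)
  where reads(parts) do
    var tmp = region(parts.ispace, io_part)
    attach(hdf5, tmp.{pos_x, rho}, filename, regentlib.file_read_write)
    acquire(tmp)
    copy(parts.{pos_x, rho}, tmp.{pos_x, rho})
    release(tmp)
    detach(hdf5, tmp.{pos_x, rho})
    __delete(tmp)
  end

The open question is then how to emit this kind of __demand(...) annotation on tasks that the DSL generates programmatically.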

LonelyCat124 commented 3 years ago

Fixed, PR coming imminently