radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html
Other
54 stars 23 forks source link

Integration test for the profiler #2302

Closed mturilli closed 8 months ago

mturilli commented 3 years ago

We need to test that the profiler produced the expected trace by checking that:

We should initially focus on the minimal set of timestamps that are always created by any configuration. We should also run this test on localhost and our primary target resources.

andre-merzky commented 3 years ago

Can I suggest to move this issue to RA? It's easiest to confirm the above by using RA, but I am a bit hesitant to introduce RA as a test dependency to RP. @iparask , what do you think?

mturilli commented 3 years ago

I think the test is about RP so it should live with the RP code base. Why is RA dependency an issue? I would consider it as any other library dependency?

andre-merzky commented 3 years ago

It's not much of an issue, more a general point trying to keep the number of dependencies small. I certainly can live with either approach.

iparask commented 3 years ago

We need to test that the profiler produced the expected trace by checking that:

I am a bit confused. If we want to test the profiler, we do not need an integration test. We need to create unit tests to ensure that the profiler does what it is supposed to do.

  • All the expected timestamps are created
  • The expected number of timestamps are created

I agree we need a set of tests with prof enabled to make sure that RP makes the expected prof calls. I am not sure an integration test is necessary.

We should initially focus on the minimal set of timestamps that are always created by any configuration.

Do you have a set of profiles in mind?

We should also run this test on localhost and our primary target resources.

What is the difference between localhost and other resources? We care about recording timestamps, right?

@andre-merzky: Can I suggest to move this issue to RA? It's easiest to confirm the above by using RA, but I am a bit hesitant to introduce RA as a test dependency to RP. @iparask , what do you think? @mturulIi: I think the test is about RP so it should live with the RP code base. Why is RA dependency an issue? I would consider it as any other library dependency?

I agree with @mturilli that this is an RP test and not an RA test. I am not sure how RA is needed here, and I agree with @andre-merzky to keep the dependencies as low as possible.

andre-merzky commented 3 years ago

@iparask : to clarify the motivation: we want to ensure that an RP session has a complete and RQA-usable set of timestamps. We observe again and again that sessions are incomplete, even if we can't pin any problems to the RU profiler or to any specific RP component. The goal of the test would be to make sure that those problems are caught early (outside of production and paper experiments). We suggest an integration test as the profiles seem to be incomplete in that complex environment of a run, not in isolation.

iparask commented 3 years ago

Thank you! Now it is a bit more clear. At which scale do you observe incomplete profiles?

andre-merzky commented 3 years ago

I don't know really, and I don't know if it's env or resource dependent either. A test to run under GH actions can only establish a baseline, but we have to start somewhere... :-/

You both seem to agree the test should live here, so that's fine with me. But I would indeed suggest to use RA to count the events, it has the right methods and tools (see here). I can take a first stab at this.

andre-merzky commented 3 years ago

@mturilli : move to profiler discussion

mturilli commented 3 years ago

Done: Discussed in https://github.com/radical-cybertools/radical.pilot/discussions/2408

andre-merzky commented 8 months ago

We now test RP sessions with RA during CI.