@gefux The framework looks really good. I have a few questions and suggestions on particulars:
How are the tests to be invoked exactly? I was able to run e.g. `python tests/performance/analysis/pt_tebd_run.py`, but the paths don't quite seem right for a `tox` call.
The use of file paths seems quite fragile. For example, `pt_tebd_run.py` failed for me because the directory `tests/data/performance_results/` did not exist. I would try to always check that the target directory exists before running anything. For example, `run_all` could take an optional `out_f` argument which, if specified, checks that the directory exists before running the tests, and writes the results to a file afterwards:

```python
import os

# Create the output directory (not the file path itself) if needed.
out_dir = os.path.dirname(out_f)
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)
# ... run tests ...
# ... write out the results using dill ...
```
On a similar note, `REQUIRED_PTS` is constructed by hand, and there is no check that it actually contains exactly the names in the parameters. I've put in a construction using a set union, which should be more robust. Feel free to reverse this change if you think it is unnecessary.
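For concreteness, a minimal sketch of such a set-union construction (the parameter layout assumed here, a `"pts"` key per test, is an illustration rather than the actual structure in `pt_tebd.py`):

```python
# Sketch only: assumes each entry of ALL_TESTS carries a parameter
# dictionary with a "pts" key naming the process tensors it needs,
# so REQUIRED_PTS can never drift out of sync with the tests.
REQUIRED_PTS = sorted(set().union(
    *(set(params["pts"]) for params in ALL_TESTS)))
```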
In the results dictionary for each test, I would suggest saving some metadata: a timestamp (epoch), `platform.python_version()`, and `oqupy.__version__`. This could go in a sub-dictionary `results['X']`, where `'X'` is `'meta'`, `'version_info'`, `'platform'` or similar.
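A minimal sketch of what that sub-dictionary could look like (the key names are placeholders):

```python
import time
import platform

import oqupy

# Illustrative metadata sub-dictionary; key names are placeholders.
results["meta"] = {
    "timestamp": time.time(),                     # seconds since epoch
    "python_version": platform.python_version(),
    "oqupy_version": oqupy.__version__,
}
```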
I added a check for no match in `generate_pts.py` to give a clearer error if the name specified is invalid, and a more Pythonic way to check the cutoff name (the dictionary `cutoff_map` could be defined globally and used elsewhere). Feel free to reverse these.
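Roughly, the pattern is the following (the regex, names, and messages here are assumptions for illustration, not the actual `generate_pts.py` code):

```python
import re

# Assumed name convention, for illustration only.
NAME_PATTERN = re.compile(r"^(?P<model>\w+)-(?P<cutoff>\w+)$")
cutoff_map = {"hard": "hard", "gauss": "gaussian"}  # illustrative values

def parse_name(name: str):
    """Raise a clear error on an invalid name; look up the cutoff."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Invalid process tensor name: {name!r}")
    cutoff = match.group("cutoff")
    try:
        return match.group("model"), cutoff_map[cutoff]
    except KeyError:
        raise ValueError(
            f"Unknown cutoff {cutoff!r}; "
            f"expected one of {sorted(cutoff_map)}")
```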
It might be nice to add a progress indicator ("test number X of TOT") in `run_all`.
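Something along these lines, say (the loop structure of `run_all` is assumed here):

```python
# Sketch: print a progress line before each test in run_all.
def run_all(tests, verbose=True):
    results = {}
    for number, (name, test) in enumerate(tests.items(), start=1):
        if verbose:
            print(f"Running test {number} of {len(tests)}: {name}")
        results[name] = test()
    return results
```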
Having `pt_tebd_prepare.py` and `pt_tebd_run.py` separate seems excessive. I would have just `pt_tebd_run.py` prepare and run the test, unless there is a practical reason these should be kept separate.
The other thought I had is whether we should code in some timeouts. I'm not familiar with GitHub CI, but I understand free accounts get 2000 minutes per month. If we're using their runners, we don't want a test to hang (it looks like GitHub has a job timeout of 6 hours). Here are two ways timeouts can be imposed in Python: https://medium.com/@chaoren/how-to-timeout-in-python-726002bf2291
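For reference, one common pattern for a per-test timeout on Unix (a generic sketch, not taken from the linked article; `SIGALRM` only works in the main thread, and `run_with_timeout` is a hypothetical helper):

```python
import signal

class TestTimeout(Exception):
    """Raised when a test exceeds its time budget."""

def _on_timeout(signum, frame):
    raise TestTimeout()

def run_with_timeout(test, seconds):
    # Arrange for SIGALRM after `seconds`, then run the test; the
    # alarm is cancelled whether the test finishes or raises.
    signal.signal(signal.SIGALRM, _on_timeout)
    signal.alarm(seconds)
    try:
        return test()
    finally:
        signal.alarm(0)
```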
@JoelANB Note there were some changes to the test file arrangement. If you've already started on the mean-field test, I would probably suggest just getting that done; it shouldn't be too much work to reorganise things at the end as needed.
Thank you @piperfw for your comments and suggestions.
- `generate_pts.py`: Great. Thank you!
- `run_all`: I've added such an indication, which is on by default but can be silenced by setting `verbose=False`.
- `pt_tebd_prepare.py` and `pt_tebd_run.py`: I've combined these into a single script.

All the changes look great @gefux! The added verbosity is very helpful, and the combination of prepare and run scripts is a good simplification.
Thanks for responding to my comments in detail. Manually invoking tests for now (and so not requiring timeouts) makes sense.
The only point I would pick up on is the metadata: including even a basic amount now seems like a very low-cost operation that can be built upon if/when we decide exactly what metadata will be useful.
It's surprising how quickly one forgets exactly which test generated which results file, and short of deleting all test data and running every test again, there is no reliable way to ascertain this (e.g. perhaps parameters were changed between tests).
Version metadata may be helpful if a performance change over time is noticed. It may also be useful when investigating performance issues reported by other users (e.g. the `h5py` version problem we had a while ago).
The metadata could just be added by `run_all`, so it stays separate from the implementation of any particular test. I've added a basic function to do this in `run_all.py`. If you really don't think this is helpful, then we can remove it; otherwise, I at least don't see any harm in having the template there now.
Let me know and I can complete the PR!
I've just finished tidying up Joel's performance test for mean-field systems to be in line with the current testing structure. I suggest we merge his fork into yours before completing this PR.
Hi Piper. The meta-data function is great - you have convinced me. On the PR²: I don't think a PR on a PR is a good idea; in this case it messes up the rebase that I've done. So I think it would be better to first merge this PR into dev-v0.5. Joel's PR should then be a separate PR, ideally rebased on dev-v0.5 (as we advise in the CONTRIBUTION.md). I am happy to help with that rebase.
Great Gerald, are you happy for me to squash the commits (I normally just group by authorship) here and merge? Would do it tomorrow AM.
Rebasing Joel's after should be fine but I'll let you know if I need help.
Merged (closed #120, left #119 open for now). Attempting to rebase second-order PR...
Resolves #119 and #120.

Implements:

- `tests/data/generate_pts.py`: a module that generates process tensors for performance-testing purposes, based on a name convention similar to the one discussed in #119.
- Performance analysis for the PT-TEBD functionality (serving as a blueprint for other use cases). This entails:
  - `tests/performance/pt_tebd.py`: a module that defines a set of functions and parameters to be used for the performance analysis. Note here the variables `ALL_TESTS` and `REQUIRED_PTS`.
  - `tests/performance/analysis/pt_tebd_prepare.py`: a script which generates the required process tensors (given by `REQUIRED_PTS`) in case they have not been generated previously.
  - `tests/performance/analysis/pt_tebd_run.py`: a script that runs all PT-TEBD performance tests (given by `ALL_TESTS`) and stores the results.
  - `tests/performance/analysis/pt_tebd_plots.py`: a script that produces plots from the stored results.
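Put together, the intended flow is roughly the following (everything here except `ALL_TESTS` and `REQUIRED_PTS` is an assumed placeholder, not the actual script contents):

```python
# Hypothetical outline of pt_tebd_run.py after merging in the
# prepare step discussed above.
from tests.performance.pt_tebd import ALL_TESTS, REQUIRED_PTS

def prepare(required_pts):
    """Generate any process tensor in `required_pts` not yet on disk."""

def run(all_tests):
    """Run each performance test, returning a results dictionary."""
    return {}

if __name__ == "__main__":
    prepare(REQUIRED_PTS)
    results = run(ALL_TESTS)
    # ... store `results` (e.g. with dill) for pt_tebd_plots.py ...
```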