Fix inconsistent handling of figure names

wshanks commented 5 months ago

ExperimentData._clear_results() cleared ExperimentData._figures but not the related ExperimentData._db_data.figure_names exposed as ExperimentData.figure_names. BaseAnalysis.run() calls ExperimentData._clear_results() so when a second analysis run produced a different set of figure names from the first the ExperimentData instance would be left a figure_names property with entries that did not match the underlying figures data. Here, this issue is addressed by also clearing ExperimentData._db_data.figure_names.

This issue was first noticed due to occasional failures of the test.database_service.test_db_experiment_data.TestDbExperimentData.test_copy_figure_artifacts test which runs a fake experiment and then adds a figure and artifact to the experiment data. Then the tests copies the experiment data to test the behavior of copy(). Occasionally, the adding of the figure occurs before the background analysis thread runs. In this case, the figure name gets added to figure_names before the results get cleared by the analysis callback. Because of the bug described above, the old figure name stays around in the original experiment data object but does not copy to the new object because figure names are copied using the data in ExperimentData._figures and not ExperimentData._db_data.figure_names. A block_for_results() was added to the experiment because in principle the copy() could occur before the analysis clears results and then the two data objects would not match, though this case has not been observed (only adding the figure and then clearing it with analysis before copying so the copy is the one missing the figure name has been observed; not the copy having the name and not the original).

Addtionally, BaseAnalysis.run() handled the figure_names option incorrectly when passes a keyword argument. Keyword arguments to passed to run() cause the analysis class to copy itself and add the options to the copy before running it. However, within the run() method, self.options.figure_names was used for assigning figure names rather than referencing the copy of the analysis class, so figure names could not be set using figure_names as a keyword argument. This issues was fixed by replacing self references with references to the copy.

A regression test was added that fails for either of the above issues being present (figure names not matching after re-running analysis; figure names not settable from a run() keyword argument).

wshanks commented 5 months ago

Hmm, I am not sure I would go all the way to block_for_results with copy(). I am not sure what the intended uses are for copy() but I think one is that you want to keep the "input" metadata and run new analysis, so you don't need to wait for the previous analysis? Also, blocking in copy() would not help with the test failure we had. That issue was that running analysis starts by clearing the results of an experiment data object and it was indeterminate whether that clearing happened before or after the extra add_figures() was called. I don't understand why a user couldn't run multiple analysis classes on an experiment data and accumulate the results. In the git history, I couldn't find what issue clearing the results was avoiding versus just letting the results get overwritten if they already existed.

coruscating commented 5 months ago

That's fair. Specifically for the copy_results=True case, it calls _wait_for_futures() instead of block_for_results() which checks recursively that the futures are all finished, but that doesn't seem to cause any issues. Keeping multiple sets of results does sound potentially useful, so we should discuss changing the behavior for BaseAnalysis.run() or adding it as an option.

qiskit-community / qiskit-experiments

Fix inconsistent handling of figure names #1430