Experiment serialization

coruscating commented 9 months ago

Suggested feature

A frequently requested feature is the ability to reconstruct an entire experiment using syntax similar to ExperimentData, i.e. BaseExperiment.load(id) and BaseExperiment.save().

PRs

[ ] Save ExperimentConfig to artifacts, use from_config() to reconstruct the experiment container in the same environment
[ ] Add versioning data (QE + experiment class + Qiskit + service + random seeds used) to experiment metadata. This should allow QE to determine whether the current environment is the same as the one in the experiment and adjust accordingly
[ ] Add ability to run custom transpiled circuits (#1222 or new PR). This allows QE to run circuits saved in job data. Users have the ability to decide between running the transpiled circuits or generating new circuits.
[ ] Add experiment loading interface which loads circuits from job data and relevant metadata from artifacts
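
The versioning item in the list above could be sketched roughly as follows. This uses only the standard library. The helper names `collect_versions` and `diff_versions`, the flat dictionary layout, and the version numbers in the example are all assumptions made for illustration; none of them is an existing Qiskit Experiments API.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple


def collect_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Record the installed version of each package (None if not installed)."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_versions(
    saved: Dict[str, Optional[str]],
    current: Dict[str, Optional[str]],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {package: (saved_version, current_version)} for every mismatch."""
    return {
        name: (saved_version, current.get(name))
        for name, saved_version in saved.items()
        if current.get(name) != saved_version
    }


# Illustrative values only: "saved" stands for versions stored in experiment
# metadata, "current" for what collect_versions() reports in the new session.
saved = {"qiskit": "0.45.0", "qiskit-experiments": "0.6.0"}
current = {"qiskit": "1.0.0", "qiskit-experiments": "0.6.0"}
mismatches = diff_versions(saved, current)
```

An empty `mismatches` dictionary would mean the environments match for the tracked packages; a non-empty one is the point where the loader could warn or adjust its behavior.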

Open questions

Behavior when current environment is different from one in saved experiment
How to deal with backend/layout specific parameters
How to deal with calibration objects

ItamarGoldman commented 8 months ago

The intended user workflow is as follows:

Get the experiment running (jobs submitted and all)
Save the experiment (this gives me an experiment id)
Terminate the Python script
Start a new Python session and load the experiment from the database service using the previously obtained experiment ID
Run analysis

One option for the interface is to load the experiment from the experiment ID:

exp = BaseExperiment.load(exp_id)

Then, to get experiment data that is connected to the experiment object, the user can call:

exp_data = ExperimentData.load(exp)

The reason to separate loading the experiment from loading the experiment data is the case where a user wants to create an experiment with the same configuration and analysis options but doesn't need the data. Example: the user wants to run a T1 experiment daily with the same configuration. In this case the data of the previous experiment isn't relevant, and it saves the user from configuring the experiment twice.

Alternatively, the user can load experiment data without an experiment object, using the experiment ID:

exp_data = ExperimentData.load(exp_id)

For the expected usage of loading both experiment data and the experiment object, we could make use of a utility function:

exp, exp_data = load_experiment(exp_id)

This requires a place to store the experiment ID in the BaseExperiment class.
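
To make the three proposed patterns concrete, here is a toy sketch using stand-in classes and an in-memory dictionary in place of the database service. `ToyExperiment`, `ToyExperimentData`, `FAKE_DB`, and the record layout are hypothetical; only the call shapes (`.load(exp_id)` on each class, plus a `load_experiment` helper) come from the proposal above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for the experiment database service.
FAKE_DB: Dict[str, Dict[str, Any]] = {
    "exp-123": {"config": {"delays": [1e-6, 2e-6]}, "results": [0.9, 0.7]},
}


@dataclass
class ToyExperiment:
    config: Dict[str, Any]
    # The "place to store experiment ID" on the experiment object.
    experiment_id: Optional[str] = None

    @classmethod
    def load(cls, experiment_id: str) -> "ToyExperiment":
        """Rebuild only the experiment; no result data is fetched."""
        record = FAKE_DB[experiment_id]
        return cls(config=dict(record["config"]), experiment_id=experiment_id)


@dataclass
class ToyExperimentData:
    experiment_id: str
    results: List[float] = field(default_factory=list)

    @classmethod
    def load(cls, experiment_id: str) -> "ToyExperimentData":
        """Fetch only the data; no experiment object is rebuilt."""
        record = FAKE_DB[experiment_id]
        return cls(experiment_id=experiment_id, results=list(record["results"]))


def load_experiment(experiment_id: str) -> Tuple[ToyExperiment, ToyExperimentData]:
    """Convenience helper for the common case of wanting both objects."""
    return ToyExperiment.load(experiment_id), ToyExperimentData.load(experiment_id)


exp, exp_data = load_experiment("exp-123")
```

Keeping the two `load` methods independent means a user who only wants to rerun the same configuration never pays for downloading old results.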

coruscating commented 8 months ago

Thanks for the interface patterns @ItamarGoldman. I agree it's good to have the option to load either the experiment or experiment data or both, since these operations could be slow for large experiments. Since there's precedent for BaseExperiment methods returning ExperimentData objects, we can also consider something like exp, exp_data = BaseExperiment.load(exp_id, return_exp_data=True) as the pattern to load both objects instead of introducing another utility function.

As for ExperimentData.load(exp), I think this overloading makes the already confusing interface even more confusing, and the experiment object would have to provide not only experiment ID but also the service for this to work. I prefer to keep the ExperimentData.load(exp_id) pattern and improve the current service and provider parameters so that we're not passing in provider=QiskitRuntimeService(). What do you think?

eliarbel commented 8 months ago

As for ExperimentData.load(exp), I think this overloading makes the already confusing interface even more confusing

Good point, I tend to agree with keeping the function taking exp_id only

ItamarGoldman commented 8 months ago

After some tests I have some insights. ExperimentDecoder knows how to import the class from the module, so we can change the methods BaseExperiment.from_config(cls, config) and BaseAnalysis.from_config(cls, config) to initialize the class using:

ret = config.cls(*config.args, **config.kwargs)
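
A self-contained toy version of that round trip might look like this. `ToyConfig` and `ToyT1` are hypothetical stand-ins that mirror only the `cls` / `args` / `kwargs` fields used in the line above; the real config objects carry more than this.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple, Type


@dataclass(frozen=True)
class ToyConfig:
    """Minimal stand-in holding just the fields needed for reconstruction."""
    cls: Type
    args: Tuple[Any, ...] = ()
    kwargs: Dict[str, Any] = field(default_factory=dict)


class ToyT1:
    def __init__(self, physical_qubits, delays=None):
        self.physical_qubits = tuple(physical_qubits)
        self.delays = list(delays or [])

    def config(self) -> ToyConfig:
        """Capture the constructor arguments needed to rebuild this object."""
        return ToyConfig(
            cls=type(self),
            args=(self.physical_qubits,),
            kwargs={"delays": list(self.delays)},
        )

    @classmethod
    def from_config(cls, config: ToyConfig) -> "ToyT1":
        # Instantiate the class recorded in the config, not necessarily `cls`,
        # so subclasses and custom experiments are restored as themselves.
        return config.cls(*config.args, **config.kwargs)


original = ToyT1((0,), delays=[1e-6, 5e-6])
restored = ToyT1.from_config(original.config())
```

Because the class is taken from the config rather than from the caller, calling `from_config` on a base class still yields an instance of the saved subclass.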

In addition, I think we should match the load method of the ExperimentData class, so the method should have the following signature:

@classmethod
def load(
        cls,
        experiment_id: str,
        service: Optional[IBMExperimentService] = None,
        provider: Optional[Provider] = None,
    ) -> "BaseExperiment":
    # Add validity check here
    if service is None:
        if provider is None:
            raise ExperimentDataError(
                "Loading an experiment requires a valid Qiskit provider or experiment service."
            )
        service = cls.get_service_from_provider(provider)
    # getting experiment config and analysis config from db
    experiment_config, analysis_config = service.load_experiment_config(experiment_id)
    # reconstructing the experiment (here we can support custom experiment)
    reconstructed_experiment = cls.from_config(experiment_config)
    # creating analysis class (here we can support custom analysis)
    reconstructed_experiment.analysis = reconstructed_experiment.analysis.from_config(analysis_config)
    # returning experiment obj
    return reconstructed_experiment

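To load the experiment and the experiment data at the same time, we can extend the method with a flag (Python has no overloading, so this replaces the signature above):

@classmethod
def load(
        cls,
        experiment_id: str,
        service: Optional[IBMExperimentService] = None,
        provider: Optional[Provider] = None,
        return_exp_data: bool = False,
    ) -> Union["BaseExperiment", Tuple["BaseExperiment", ExperimentData]]:

    # Using previous implementation to reconstruct experiment.
    # _load_experiment is a hypothetical helper holding the body of the
    # previous load(); calling cls.load() here would recurse forever.
    reconstructed_experiment = cls._load_experiment(experiment_id, service, provider)
    if not return_exp_data:
        return reconstructed_experiment
    # getting experiment data
    reconstructed_experiment_data = ExperimentData.load(experiment_id, service)
    return reconstructed_experiment, reconstructed_experiment_data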

Another thing I thought of is loading an experiment with custom experiment and analysis classes. In this case the user would provide the classes and we would use them to load experiment_config and analysis_config. This is easily done by having the user pass in the classes.

What do you think?

wshanks commented 8 months ago

Some points from external discussion:

How should the backend be handled? One option is to ignore it and require the user to use reconstructed_experiment.run(backend=backend) with backend reconstructed separately. Since the load method in the above examples takes the provider and the experiment service stores the backend name, the method could call provider.get_backend(backend_name) and set reconstructed_experiment.backend for the user. For the use case of running on the same backend, this would be convenient for the user. On the other hand, some users may want to run an experiment on a different backend so backend handling may need to be optional. (Personally I have some worry about changing the backend -- we can strive to make that a supported pattern but some experiments currently might set options when the backend is set and I am not sure if they are robust to changing the backend).
There was a suggestion to have load methods for the experiment and experiment data and a separate helper function for loading both at once rather than building in the ability to load both from the class method of one of them.
There was a question about drawer and plotter options which are not currently saved with the analysis. The consensus was to support saving and loading these options but perhaps not in the first pass. Personally, I am a bit confused about why a serialized experiment does not save the serialized analysis and why the serialized analysis does not save the plotter and drawer. Maybe different use cases need to be supported with serialization, but one of them should be "save and restore everything."
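
The optional backend handling described in the first point could be sketched like this. The function `resolve_backend`, its parameters, and the fake backend lookup are hypothetical; the lookup callable only stands in for the `provider.get_backend(backend_name)` call mentioned above.

```python
from typing import Callable, Optional


def resolve_backend(
    saved_backend_name: Optional[str],
    get_backend: Callable[[str], object],
    override: Optional[object] = None,
    restore_backend: bool = True,
) -> Optional[object]:
    """Decide which backend a loaded experiment should be attached to."""
    if override is not None:
        # The user explicitly wants a different backend.
        return override
    if restore_backend and saved_backend_name is not None:
        # Convenience path: look up the backend the experiment was saved with.
        return get_backend(saved_backend_name)
    # Leave the backend unset; the user passes it later, e.g. at run time.
    return None


# Hypothetical stand-in for a provider's backend lookup.
fake_backends = {"backend_a": "<backend_a object>", "backend_b": "<backend_b object>"}

same = resolve_backend("backend_a", fake_backends.__getitem__)
other = resolve_backend(
    "backend_a", fake_backends.__getitem__, override=fake_backends["backend_b"]
)
deferred = resolve_backend("backend_a", fake_backends.__getitem__, restore_backend=False)
```

This keeps the same-backend case to a single call while leaving room for users who want to switch backends or supply one later.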

ItamarGoldman commented 8 months ago

Another idea that was mentioned is that the load function would load the transpiled circuits of the experiment. For this, we need to take the following into consideration:

We will need a place to store the transpiled circuits.
Downloading the data from the DB could take as long as transpiling the circuits again (I think @TsafrirA said he saw this behavior?)

wshanks commented 8 months ago

Another idea that was mentioned is that the load function will load the transpiled circuits of the experiment.

This idea is in @coruscating's original list above:

Add ability to run custom transpiled circuits (https://github.com/Qiskit-Extensions/qiskit-experiments/pull/1222 or new PR). This allows QE to run circuits saved in job data. Users have the ability to decide between running the transpiled circuits or generating new circuits.

In the most recent discussion we said this support is not required in the initial implementation. I think this use case should be supported, but I don't think it should be the default behavior. I see it as a more advanced use case because there is more that can go wrong. The old circuits might not be valid after an update to the backend or when trying to run on a different backend or if the experiment class has been updated since the previous run. Also, downloading the circuits could be slower than regenerating them, like you mention.
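
As a rough sketch of that opt-in behavior: regenerate circuits by default, and reuse saved ones only when the user asks for them and they pass a validity check. The name `select_circuits`, its parameters, and the use of plain strings in place of circuit objects are assumptions for illustration.

```python
from typing import Callable, List, Optional, Sequence


def select_circuits(
    generate: Callable[[], List[str]],
    saved: Optional[Sequence[str]] = None,
    use_saved: bool = False,
    is_valid: Callable[[str], bool] = lambda circuit: True,
) -> List[str]:
    """Return saved circuits only on request and only if all are still valid."""
    if use_saved and saved and all(is_valid(circuit) for circuit in saved):
        return list(saved)
    # Default and fallback path: build fresh circuits for the current setup.
    return generate()


# Strings stand in for circuit objects in this toy example.
fresh = select_circuits(lambda: ["new"], saved=["old"])
reused = select_circuits(lambda: ["new"], saved=["old"], use_saved=True)
fallback = select_circuits(
    lambda: ["new"], saved=["old"], use_saved=True, is_valid=lambda circuit: False
)
```

The `is_valid` hook is where checks against a changed backend or an updated experiment class would go. Whether an invalid saved circuit should fall back silently, as here, or raise an error is a design choice this sketch does not settle.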

coruscating commented 7 months ago

@wshanks discussed in the meeting today that we should support the use case of not running the analysis when running an experiment and then saving experiment config in a dummy ExperimentData container.

wshanks commented 7 months ago

In the meeting, I was thinking that experiment.analysis = False was the way to avoid running analysis immediately on an experiment. I forgot about experiment.run(analysis=False). So I am not sure if there is much to do for that last point (#1423 might already support it), but we should make sure that it does.

qiskit-community / qiskit-experiments

Experiment serialization #1392
