pasqal-io / Pulser

Library for pulse-level/analog control of neutral atom devices. Emulator with QuTiP.
Apache License 2.0

Add open batches to pulser-pasqal #701

Closed oliver-gordon closed 2 months ago

oliver-gordon commented 5 months ago

On the PCS we support open batches, which let users continuously add new jobs. The SDK already supports this, but it has not been translated over to pulser-pasqal yet.

Due to the abstraction of batches in this lib, I've tried to avoid the term "batch" and go with open_submission instead, since our known reference is a submission_id.

When creating a new submission, the user either passes a submission_id, in which case we append new jobs to the existing batch, or a boolean flag indicating whether the submission being created should be left open.

MatthieuMoreau0 commented 5 months ago

Due to the abstraction of batches in this lib, I've tried to avoid the term "batch" and go with open_submission instead, since our known reference is a submission_id.

Should we not introduce the notion of batch at all in this lib? If it becomes the main entrypoint to submit jobs, shouldn't we use the same language as the other components of our platform, where the notion of batch is central? I am worried it would create some confusion for users switching from pulser-pasqal to the user portal. Curious what you guys think on that front.

MatthieuMoreau0 commented 5 months ago

My main suggestion for the user interface, compared to what is done here, is to not ask the user for the submission_id explicitly but to handle it for them; we only need the user to tell us when they want to open the batch and when they want to close it.

We could provide an open_batch/start_session method which would create an open batch and store the batch ID as an attribute of the class. In the run function flow we would then check for that attribute to know whether we should create a new closed batch or add jobs to the current batch. We could also add a close_batch/stop_session method which makes an API call to close the batch and resets the batch ID stored in the backend class.

Example snippet

Open batch

backend = QPUBackend(seq, connection)
backend.start_session()
# these jobs are automatically added to the batch
results = backend.run(job_params, wait=True)
# some more jobs added to the same batch
results_2 = backend.run(job_params_2, wait=True)
backend.stop_session()

Closed batch (no change)

backend = QPUBackend(seq, connection)
# Create a closed batch with some jobs
results = backend.run(job_params, wait=True)
# Create another batch with some jobs
results_2 = backend.run(job_params_2, wait=True)
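As a hypothetical sketch of what this could look like internally (all names here, including the connection methods, are illustrative placeholders and not the actual pulser-pasqal or pasqal-cloud API), the backend would track the open batch ID as an attribute and dispatch in `run()`:

```python
import uuid


class FakeConnection:
    """Stand-in for the real cloud connection, for illustration only."""

    def create_batch(self, open_batch: bool) -> str:
        # The real implementation would call the cloud API.
        return str(uuid.uuid4())

    def add_jobs(self, batch_id: str, job_params) -> list:
        return list(job_params)

    def close_batch(self, batch_id: str) -> None:
        pass


class QPUBackendSketch:
    """Hypothetical backend keeping the open batch ID as an attribute."""

    def __init__(self, connection):
        self._connection = connection
        self._batch_id = None  # no open batch by default

    def start_session(self) -> None:
        # Create an open batch and remember its ID.
        self._batch_id = self._connection.create_batch(open_batch=True)

    def run(self, job_params):
        if self._batch_id is not None:
            # An open batch exists: append the jobs to it.
            return self._connection.add_jobs(self._batch_id, job_params)
        # No open batch: create a closed batch with just these jobs.
        batch_id = self._connection.create_batch(open_batch=False)
        return self._connection.add_jobs(batch_id, job_params)

    def stop_session(self) -> None:
        # Close the batch on the server side and reset the stored ID.
        self._connection.close_batch(self._batch_id)
        self._batch_id = None
```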

We could also provide a context manager to handle the batch lifecycle

Starting the context manager would create a batch and store the associated batch_id as an attribute of the backend instance. Exiting the context manager closes the batch automatically and resets the batch_id attribute.

If the run method is used outside the context manager, it creates a closed batch with the jobs passed as arguments.

e.g. the user interface would look something like:

Open batch example:

backend = QPUBackend(seq, connection)
# opening the context manager creates a batch. 
# Here I used the wording "start session" but it could be called "open_batch"
with backend.start_session():
    # these jobs are automatically added to the batch
    results = backend.run(job_params, wait=True)
    # some more jobs added to the same batch
    results_2 = backend.run(job_params_2, wait=True)
# we exited the context manager, so the batch is now closed

Closed batch (no change)

backend = QPUBackend(seq, connection)
# Create a closed batch with some jobs
results = backend.run(job_params, wait=True)
# Create another batch with some jobs
results_2 = backend.run(job_params_2, wait=True)
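One way to get this context-manager behaviour is a `contextlib.contextmanager` wrapper around the session methods. This is only a sketch under the assumptions above (the batch ID and backend class are placeholders, not the real implementation):

```python
from contextlib import contextmanager


class SessionBackendSketch:
    """Illustrative backend whose start_session is usable with `with`."""

    def __init__(self):
        self._batch_id = None

    @contextmanager
    def start_session(self):
        # Entering the context opens a batch and stores its ID
        # (a real implementation would make an API call here).
        self._batch_id = "fake-batch-id"
        try:
            yield
        finally:
            # Exiting the context always closes the batch and resets the
            # stored ID, even if a run() call inside the context raised.
            self._batch_id = None

    def run(self, job_params):
        # Jobs go to the open batch if one exists, else to a new closed batch.
        target = self._batch_id or "new-closed-batch"
        return (target, job_params)


backend = SessionBackendSketch()
with backend.start_session():
    # Inside the context, jobs are added to the open batch.
    result = backend.run(["job-1"])
# Outside the context, the batch ID has been reset.
closed_result = backend.run(["job-2"])
```

The `try/finally` around the `yield` is what makes the pattern foolproof: the batch is closed even on error, which a plain start/stop method pair cannot guarantee.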

Thoughts on this general interface? What about the context manager? It's simply syntactic sugar, but it can help prevent users from forgetting to close their batch.

oliver-gordon commented 5 months ago

@MatthieuMoreau0 "Should we not introduce the notion of batch at all in this lib?" That would not be for this MR, whose intention is to introduce new functionality. Introducing the concept of a batch would mean modifying all the existing interfaces.

HGSilveri commented 5 months ago

Due to the abstraction of batches in this lib, I've tried to avoid the term "batch" and go with open_submission instead, since our known reference is a submission_id.

Should we not introduce the notion of batch at all in this lib? If it becomes the main entrypoint to submit jobs, shouldn't we use the same language as the other components of our platform, where the notion of batch is central? I am worried it would create some confusion for users switching from pulser-pasqal to the user portal. Curious what you guys think on that front.

Indeed, I have avoided referring to "batches" so far, but in reality "submission" is still only an internal parameter too, so we could start calling it a "batch" from the get-go. However, I'm not sure about this, because it seems to me that the way the user will interact with the batch is different: they won't be taking the batch, adding jobs to it and closing it, but rather using the backend to do all these things. So on the one hand, calling a submission a "batch" provides a useful link to a concept from pasqal-cloud, but on the other hand it might cause confusion.

HGSilveri commented 5 months ago

Open batch example:

backend = QPUBackend(seq, connection)
# opening the context manager creates a batch. 
# Here I used the wording "start session" but it could be called "open_batch"
with backend.start_session():
    # these jobs are automatically added to the batch
    results = backend.run(job_params, wait=True)
    # some more jobs added to the same batch
    results_2 = backend.run(job_params_2, wait=True)
# we exited the context manager, so the batch is now closed

Closed batch (no change)

backend = QPUBackend(seq, connection)
# Create a closed batch with some jobs
results = backend.run(job_params, wait=True)
# Create another batch with some jobs
results_2 = backend.run(job_params_2, wait=True)

Thoughts on this general interface? What about the context manager? It's simply syntactic sugar, but it can help prevent users from forgetting to close their batch.

I quite like this; I find it elegant, relatively simple, and it meets the criteria of having the actions performed by the backend. One question though: when you call run() for the second time in an open batch, does the results_2 object give a view of only the results of the new jobs, or does it also include the results of previously submitted jobs? In other words, does it complement results or replace it? I suspect you were going for the first option, and it is also the one that makes most sense to me.

awennersteen commented 5 months ago

I personally would welcome having the concept of Batch in Pulser. However, it would have to be in a way that I can re-use in downstream libraries, and ideally we'd have it consistent with the cloud backend and emulation backends (even if that means usually wrapping the results of the emulator as a single job in a batch).

Due to that, I think I agree with Oliver that it might be out of scope here.

MatthieuMoreau0 commented 5 months ago

I was not suggesting introducing batches in this MR specifically either, and definitely not as an object the user should manipulate. I was mostly wondering about the wording because, as Henrique mentioned, submission_id is currently an internal implementation detail the user is not aware of, so we could rename it to "batch" freely if we want to share the cloud naming conventions.

MatthieuMoreau0 commented 5 months ago

when you call run() for the second time in an open batch, does the results_2 object give a view of only the results of the new jobs, or also include the results of previously submitted jobs?

I don't know, to be honest. Whatever makes most sense given the current implementation of backends in Pulser.

awennersteen commented 5 months ago

In Henrique's example with open batches, what if we do:

backend = QPUBackend(seq, connection)
# opening the context manager creates a batch. 
with backend.start_session() as batch:
    # we can (optionally?) pass the context to run
    results = backend.run(job_params, wait=True, batch=batch)
    # some more jobs added to the same batch, and the results
    # are just the results from that single run
    results_2 = backend.run(job_params_2, wait=True, batch=batch)

The actual batch implementation could come in a follow-up, so that we can think about that interface a bit and not scope-creep this MR.

HGSilveri commented 5 months ago

In Henrique's example with open batches, what if we do:

*Matthieu's example, I don't want to take away his credit :slightly_smiling_face:

backend = QPUBackend(seq, connection)
# opening the context manager creates a batch. 
with backend.start_session() as batch:
    # we can (optionally?) pass the context to run
    results = backend.run(job_params, wait=True, batch=batch)
    # some more jobs added to the same batch, and the results
    # are just the results from that single run
    results_2 = backend.run(job_params_2, wait=True, batch=batch)

The actual batch implementation could come in a follow-up, so that we can think about that interface a bit and not scope-creep this MR.

I'm not convinced by exposing a batch object here; I feel like it strays from the backend interface philosophy and defeats the purpose of trying to simplify things for the more "junior" Pulser users. Let's remember that pasqal-cloud is not going away, so the option to work with batches directly will remain available for whoever needs it.

For the purposes of this PR (i.e. just allowing an open batch/submission), I feel like we can get away with the backend storing the open batch ID internally and adding jobs to it on every run() call that happens within the context (instead of starting a new batch).

oliver-gordon commented 5 months ago

I suppose there is an assumption that the runtime doesn't end, for us to retain an ID as a class property? What if a user is running incredibly short-lived scripts, or executing something again to add another job after already creating and submitting a batch? We would be missing the submission_id property.

I can't speak for the regular use cases of pulser-pasqal users, but this one seems rather plausible?

HGSilveri commented 5 months ago

I suppose there is an assumption that the runtime doesn't end, for us to retain an ID as a class property? What if a user is running incredibly short-lived scripts, or executing something again to add another job after already creating and submitting a batch? We would be missing the submission_id property.

I can't speak for the regular use cases of pulser-pasqal users, but this one seems rather plausible?

I would go back to the argument that using pasqal-cloud directly is still an option. This implementation provides restricted access to the feature, and that's okay. With pulser-pasqal, if a user wants to use an open batch, they have to stay within the context; the moment they exit it, the batch is closed. In exchange for forcing the user to work within these bounds, this approach provides a simpler interface (e.g. not having to deal with batch IDs) and is foolproof (e.g. the user doesn't have to know/remember to close the batch). In my opinion, this is a fair tradeoff.

oliver-gordon commented 5 months ago

I can live with this. I'll push my change to the draft today to see if we're aligned on how it will look

MatthieuMoreau0 commented 5 months ago

I suppose there is an assumption that the runtime doesn't end, for us to retain an ID as a class property? What if a user is running incredibly short-lived scripts, or executing something again to add another job after already creating and submitting a batch? We would be missing the submission_id property.

To cover this, the start_session function could accept an optional batch_id argument. If set, it would not open a new batch but simply store that batch_id as an attribute, to be reused when calling the run function to add other jobs to that existing batch.

I think we can start without that optional argument in this MR and wait for user feedback.
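The optional argument Matthieu describes could look something like this sketch (the class, the placeholder batch ID, and the method signature are all hypothetical, not the merged interface):

```python
from typing import Optional


class ResumableBackendSketch:
    """Illustrative backend whose session can attach to an existing batch."""

    def __init__(self):
        self._batch_id: Optional[str] = None

    def start_session(self, batch_id: Optional[str] = None) -> str:
        if batch_id is not None:
            # Reuse an existing open batch, e.g. from a previous
            # short-lived script that already created one.
            self._batch_id = batch_id
        else:
            # Otherwise open a fresh batch
            # (placeholder for the real API call).
            self._batch_id = "newly-created-batch-id"
        return self._batch_id
```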

HGSilveri commented 2 months ago

Still needed:

HGSilveri commented 2 months ago

@oliver-gordon @MatthieuMoreau0 I managed to replace Submission -> Batch in a backwards-compatible way everywhere except in RemoteResults (basically, the notion of "submission" is contained in the RemoteResults class). There are still links between submission and batch in RemoteResults, though (e.g. the RemoteResults.batch_id property). I had to replicate SubmissionStatus as BatchStatus and keep both, because there are users who might have scripts where they call RemoteResults.get_status() and expect a SubmissionStatus instance as the return type. Is this acceptable to you?
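Keeping both enums side by side could be sketched like this (the member names are illustrative, not the actual pasqal-cloud statuses; mapping by member name is one possible approach, not necessarily the one used in the PR):

```python
from enum import Enum


class BatchStatus(Enum):
    """New public enum mirroring the cloud's batch statuses (illustrative members)."""

    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"


class SubmissionStatus(Enum):
    """Legacy enum kept so old scripts calling get_status() still get this type."""

    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"


def to_legacy_status(status: BatchStatus) -> SubmissionStatus:
    # Map the new enum back to the legacy one by member name, so
    # existing `get_status()` callers keep receiving SubmissionStatus.
    return SubmissionStatus[status.name]
```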
