`MeasurementOutcomes` samples return type

alecandido commented 3 months ago

> Yes, it is due to torch. Without the cast, MeasurementOutcomes.samples breaks for torch because it returns a list of np.ndarrays, and that breaks further torch functions. torch needs it to be a list of torch.Tensors. I do believe that @stavros11's patch is the easiest, most painless way to solve this.

Originally posted by @renatomello in https://github.com/qiboteam/qibo/issues/1414#issuecomment-2275006312

The fewer casts the better. So, a cast to NumPy from another framework should happen as late as possible, and explicitly by the user (not internally within the backend/Qibo)

renatomello commented 3 months ago

> The fewer casts the better. So, a cast to NumPy from another framework should happen as late as possible, and explicitly by the user (not internally within the backend/Qibo)

Not to numpy, but out of numpy.

alecandido commented 3 months ago

> Not to numpy, but out of numpy.

Uhm, I imagine samples to be computed by the backend. If the backend is a GPU one, they should be computed on the GPU, and the result should be a GPU array. To become a NumPy array, it has to be downloaded and cast. Then, if you want to reuse it on the GPU, you have to upload it back.

Replace GPU with PyTorch, and it should be the same.

marekgluza commented 3 months ago

Sorry if this is off-topic, but is there a standard way to dump shot data to files?

If not, then one could imagine a standard from qibo that each backend needs to implement, so that there would be a standard way of loading from and storing to files. Backends could pass data to each other in the same way, and the samples method could return the user-facing data array, e.g. np.array, while __samples could hold the backend-specific type, so samples would be casting to numpy?

alecandido commented 3 months ago

> Sorry if this is off-topic, but is there a standard way to dump shot data to files?

np.save("shots.npy", backend.to_numpy(shots))
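
A slightly fuller round trip, as a minimal sketch. It assumes the shots are already a NumPy array (in Qibo you would first call backend.to_numpy(shots), as in the one-liner above); the array contents and the temporary path are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical shot data: 4 shots on 3 qubits, one row per shot.
shots = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]], dtype=np.int8)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "shots.npy")
    np.save(path, shots)      # binary .npy file, keeps dtype and shape
    restored = np.load(path)  # read back into a plain np.ndarray

# The round trip should preserve both the values and the dtype.
same = np.array_equal(shots, restored) and shots.dtype == restored.dtype
```

The loaded array is plain NumPy, so it would still need a cast into whichever backend consumes it next.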

alecandido commented 3 months ago

> If not, then one could imagine a standard from qibo that each backend needs to implement, so that there would be a standard way of loading from and storing to files. Backends could pass data to each other in the same way, and the samples method could return the user-facing data array, e.g. np.array, while __samples could hold the backend-specific type, so samples would be casting to numpy?

In any case, backends should not communicate with each other through disk, but through memory. And the common lingo among all backends is NumPy. If you want to extract shots from one backend and consume them in another one, you first download with .to_numpy(), and then cast in the other backend with .cast().

Passing through files (which most likely are on disk, but in any case require a file system to process them) would be much more expensive.
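
The two-step hand-over described above can be sketched as follows. The two classes are hypothetical stand-ins written for this example, not Qibo backends; they only mimic the .to_numpy() and .cast() method names mentioned in the thread, and the real signatures may differ.

```python
import numpy as np


class ToyListBackend:
    """Stand-in for a non-NumPy backend; its native type is a nested list."""

    def cast(self, x):
        return np.asarray(x).tolist()

    def to_numpy(self, x):
        return np.asarray(x)


class ToyNumpyBackend:
    """Stand-in for a backend whose native array type is np.ndarray."""

    def cast(self, x):
        return np.asarray(x)

    def to_numpy(self, x):
        return np.asarray(x)


source, target = ToyListBackend(), ToyNumpyBackend()

shots = source.cast([[0, 1], [1, 1], [0, 0]])  # native to `source`
as_numpy = source.to_numpy(shots)              # step 1: download to NumPy
moved = target.cast(as_numpy)                  # step 2: cast into `target`
```

Everything stays in memory; NumPy is only the intermediate format that both sides understand.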

marekgluza commented 3 months ago

Thank you Ale, to_numpy answers my question. (I didn't suggest interfacing via disk; rather, there could be an internal method, and then samples could cast to numpy before returning, as you showed. Based on what you said - why not have samples always return the ndarray in the shared lingo?)

On Thu, 8 Aug 2024, 17:33 Alessandro Candido, @.***> wrote:

> If not, then one could imagine a standard from qibo that each backend needs to implement, so that there would be a standard way of loading from and storing to files. Backends could pass data to each other in the same way, and the samples method could return the user-facing data array, e.g. np.array, while __samples could hold the backend-specific type, so samples would be casting to numpy?
>
> In any case, backends should not communicate with each other through disk, but through memory. And the common lingo among all backends is NumPy. If you want to extract shots from one backend and consume them in another one, you first download with .to_numpy(), and then cast in the other backend with .cast().
>
> Passing through files (which most likely are on disk, but in any case require a file system to process them) would be much more expensive.


alecandido commented 3 months ago

> I didn't suggest interfacing via disk; rather, there could be an internal method, and then samples could cast to numpy before returning, as you showed.

Sorry, I didn't fully understand, and I wanted to make sure of that point 😅

> Based on what you said - why not have samples always return the ndarray in the shared lingo?

I see two reasons:

1. You may want to keep manipulating your results, which is better done in the same framework you used for execution. E.g., if you computed a giant array on the GPU but are only interested in certain sums of the shots, it is better to complete the operation on the GPU and download only the part you're interested in.
2. You're not losing anything: if you want the NumPy array, you can always get it with .to_numpy(). If instead samples immediately returned NumPy and you wanted the original object, you would have to convert it back. I.e., breaking the operation into two steps is more modular and composable (since internally you have to go through the two steps anyhow).
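
To make the first point concrete, here is a rough sketch with a hypothetical ToyDeviceBackend (not a Qibo class) that simply counts how many array elements pass through to_numpy(). The "device" array is really a NumPy array here; the counter only stands in for the transfer cost a real GPU or PyTorch backend would pay.

```python
import numpy as np


class ToyDeviceBackend:
    """Hypothetical backend that counts how many elements get downloaded."""

    def __init__(self):
        self.downloaded = 0

    def samples(self, nshots, nqubits, seed=0):
        # Pretend this array lives on the device.
        rng = np.random.default_rng(seed)
        return rng.integers(0, 2, size=(nshots, nqubits))

    def to_numpy(self, x):
        x = np.asarray(x)
        self.downloaded += x.size  # stand-in for the transfer cost
        return x


backend = ToyDeviceBackend()
shots = backend.samples(nshots=1000, nqubits=10)

# Reduce first (on the "device"), then download only the 10 per-qubit counts.
ones_per_qubit = backend.to_numpy(shots.sum(axis=0))
reduced_cost = backend.downloaded

# Downloading everything first moves all 1000 * 10 entries instead.
backend.downloaded = 0
full = backend.to_numpy(shots)
full_cost = backend.downloaded
```

Both routes give the same per-qubit counts, but the first one only transfers the reduced result, which is the argument for samples returning the backend-native type.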

qiboteam / qibo

`MeasurementOutcomes` samples return type #1416