ymd-h opened 1 year ago
Yes, it's expected because path finding overhead can be large. It's best that you compute/cache the contraction path if you know in advance that a certain tensor network topology would be reused in your simulation (it can be nontrivial depending on your workload).
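To illustrate the caching idea, here is a minimal sketch using NumPy's `einsum_path` as a stand-in for cuTensorNet's path finder (the cuQuantum API differs, but the principle is the same: pay the path-finding cost once, then reuse the path while only the tensor values change):

```python
import numpy as np

# Toy tensor network: the topology (the einsum expression) is fixed,
# while the tensor values change between contractions.
expr = "ab,bc,cd->ad"
ops = [np.random.rand(8, 8) for _ in range(3)]

# Pay the path-finding cost once...
path, info = np.einsum_path(expr, *ops, optimize="optimal")

# ...then reuse the cached path for every new set of parameters.
for _ in range(5):
    ops = [np.random.rand(8, 8) for _ in range(3)]
    result = np.einsum(expr, *ops, optimize=path)
```

In cuQuantum's Python API the analogous pattern is to keep a network object alive, find the path once, and swap in new operands for each contraction.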
Thank you @leofang
I'm still a beginner, so your experienced comment is very helpful. I will try to investigate further.
What I am interested in is variational quantum algorithms (VQA) using parameterized quantum circuits (PQC), especially QCL[1]. (Ref: work in my other repo.)
In this setting, the parameters of the circuit change at every execution.
Do you think the caching strategy still works?
I'd say so. For parametrized circuits (like the one you had, with fixed arguments), the circuit topology is fixed, so the same path can be reused even though the tensors inside the circuit change.
If you provide more details, we could try to give a more accurate answer.
Performance depends on the number of tensors and the size of each tensor. If your tensors are not tiny, the path-finding overhead is minimal even for a small network/circuit.
What would be helpful is an idea of the size of your circuit (number of tensors), the approximate size (extent) of the tensors, how many instances of the same circuit you would like to contract, the average contraction time of your example, and which cuTensorNet version you are using.
@haidarazzam Thank you.
The tensor size is not clear yet. I want to find a "good" circuit, and my work is still at an early stage. (A smaller circuit is preferable for computation, as long as it achieves sufficient results.)
Observed Issue

With a relatively small circuit, I observed it took about 10 times longer than `default.qubit`. According to `%prun` profiling on Google Colab, the bottleneck is `cuquantum.cutensornet.cutensornet.contraction_optimize`.

Assumption

The main target of cuTensorNet is large circuits, so for relatively small circuits its overhead is probably more significant than its speed-up.
Future Work

In the default implementation, `batch_execute()` calls `execute()` serially, one circuit at a time. If we could pass a batch of circuits to cuTensorNet at once, they might run in parallel on the GPU.
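The batching idea can be sketched with plain NumPy (a conceptual stand-in, not the cuTensorNet API): stacking the operands of several same-topology circuits and adding a shared batch index lets one contraction call process all of them, instead of looping circuit by circuit.

```python
import numpy as np

# Serial baseline: contract each circuit's network one at a time.
expr = "ab,bc->ac"
batch = [(np.random.rand(8, 8), np.random.rand(8, 8)) for _ in range(4)]
serial = [np.einsum(expr, a, b) for a, b in batch]

# Batched alternative: stack operands along a new leading axis and add
# a shared batch index "x", so one call contracts all circuits at once.
A = np.stack([a for a, _ in batch])
B = np.stack([b for _, b in batch])
batched = np.einsum("xab,xbc->xac", A, B)

# Both approaches produce the same results.
assert np.allclose(batched, np.stack(serial))
```

On a GPU backend, the batched form gives the library one large, regular workload to parallelize, which is the intuition behind the proposal above.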