cnktysz closed this issue 1 year ago
With the batching system: 1500 circuits → 400+400+400+300 = 7 jobs
7? or 4? I guess I do not fully follow your example
The way things were done, the algorithm was, in general, independent of any backend, with the quantum instance responsible for any breakdown of the circuits into jobs based on backend limitations.
My recollection is that the problem in this instance was that, with larger kernels, the sheer number of circuits generated could cause out-of-memory errors. Hence a local limit (batch size) was chosen to avoid this. But this introduced an interplay between the chunks generated by the kernel and the breakdown needed by the backend, so there may be inefficiencies if the user simply stays with the defaults. The default value was 1000 in the past but was dropped to 900, since for the backend limits as known at the time it was otherwise inefficient: each batch of 1000 would produce 2 jobs, one of 900 circuits and one of 100!
It is 7: (300+100)*3 + 300. The IBMQ backend creates jobs that fit within max_circuits, which is why 2 jobs are created when you pass 400 circuits.
I think I didn't choose the best numbers for the example, but it is possible to get even more jobs with certain settings. Unwanted jobs occur whenever max_circuits is not a divisor of the batch_size value.
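The splitting arithmetic can be sketched as follows. This is an illustrative model only, not the actual QuantumInstance implementation, and `job_count` is a hypothetical helper name:

```python
import math

def job_count(n_circuits: int, batch_size: int, max_circuits: int) -> int:
    """Count jobs when circuits are first chunked into batches of
    `batch_size`, and each batch is then split by the backend into
    jobs of at most `max_circuits` circuits each."""
    jobs = 0
    remaining = n_circuits
    while remaining > 0:
        batch = min(batch_size, remaining)
        jobs += math.ceil(batch / max_circuits)
        remaining -= batch
    return jobs

# batch_size=400, max_circuits=300: each 400-batch splits into 300+100
print(job_count(1500, 400, 300))   # 7 jobs
# batch_size that is a multiple of max_circuits avoids the extra splits
print(job_count(1500, 900, 300))   # 5 jobs
```

The extra jobs appear exactly when a batch leaves a small remainder (here 100 circuits) that the backend must put into its own job.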
We (@adekusar-drl and I) thought turning it off would be the easiest fix.
Ok, it did not state that 400 was the batch size. 1500 circuits with a batch size of 400 and 300 max circuits on the backend would indeed create an inefficiency.
As to turning it off by default, as the easiest fix, I think you need to investigate behavior with larger kernels and ensure we don't run out of memory. Maybe the default could be informed via the quantum instance, though of course for local simulators we still need to take care, as there is no limit there if I recall correctly.
Yes, you are right. The plan is to create a quantum instance that can test if we don't run out of memory and get the expected behaviour.
I found the original issue from before we had any limit on the kernel: Qiskit/qiskit-aqua#208 (of course things have changed a bit since back then, but I think the memory consumption is still an issue that needs care taken).
The plan is to create a quantum instance that can test if we don't run out of memory
I am not sure how doable that is - the Python process will often simply abend if it's out of memory.
FYI the change from 1000, which was set back then, to 900 for a better match/efficiency with backends is relatively recent: #150
@woodsp-ibm The idea here is to align batches in QK with batches on hardware as much as possible. In some cases this may lead to a reduced number of jobs submitted, thus less time to train a model. There's no intention to remove the currently available batching features of QuantumInstance, but rather to add a new one. For instance, if we pass a batch size of zero, then we fully rely on batching at the hardware level and don't optimize batches at all. This behavior should lead to the minimal number of jobs required to execute all circuits.
To align the batch size with the remote entity, based on some remote circuit-size limit per job, seems fine. Some auto mode (that's what you want some magic value to do, it seems) is better than having the user know about these things, I agree. Just saying that when auto-configured with a local simulator, since there will be no job limit, the batch size still needs to ensure that too many circuits are not built, which would run out of memory. That was the only reason a limit was added there in the past: we ran out of memory before the QI could fragment the circuits into limit-sized chunks.
Maybe I was late to the discussion, but I can add something to it. I believe the optimal batch size for a quantum kernel can be anything, just like in classical kernels. It may also depend on the complexity of the quantum circuit, the size of the dataset, the number of qubits, and the available computational resources. However, as you have mentioned, as a general rule of thumb a batch size of 300 or 400 is considered large for most quantum machine learning problems, especially if the quantum circuit is complex or the dataset is large. For some simpler quantum machine learning problems or smaller datasets, a batch size of 300 or 400 may be appropriate. I have tried adjusting hyperparameters, which actually improved accuracy for my machine learning models.
Batching here was all about optimal job creation with respect to the remote backend, which had a limit on the number of circuits per job. Each job went into the queue for the device, which is shared. This was primarily about optimizing jobs, since the QuantumInstance, via which circuit execution took place, would split any request into multiple jobs to ensure the backend limit was met (if not, it raised an error about exceeding the limit, so it had to be enforced). If you re-read the discussion you will see it's all in this context. Algorithms like the newer quantum kernel here are all based on primitives, and things are quite different since this was created. I see this was put on hold; judging by the date, this was at the time of introducing the new primitive-based kernels and deprecating the kernel that used QuantumInstance. Probably this issue should simply be closed now, as while the kernel it applied to still exists, it is deprecated and will be removed in a future release in favor of the new kernels.
Closing the issue as QuantumKernel has been deprecated and removed. Batching is implemented at the primitives level.
What is the expected enhancement?
The default batch size value of the Quantum Kernel (currently 900) might create unnecessary jobs when the user is not careful. The main reason for this is that IBMQ backends have different maximum circuit values (e.g. 100, 300). To avoid this issue, setting the default value to 0 and disabling batching would be better in the long run. This way advanced users can still have access to this feature.
To recreate the problem: let's say we use the values of batch_size=400 and max_circuits=300 (defined by an IBMQ backend).
With the batching system: 1500 circuits → batches of 400+400+400+300 → (2+2+2+1) = 7 jobs
Without the batching system: 1500 circuits → 300+300+300+300+300 = 5 jobs
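The two job counts above can be reproduced with a small sketch. This is an illustrative model of the splitting only, and `jobs_with_batching`/`jobs_without_batching` are hypothetical helper names, not Qiskit API:

```python
import math

def jobs_with_batching(n: int, batch_size: int, max_circuits: int) -> int:
    # circuits are chunked into batches first; the backend then splits
    # each batch separately into jobs of at most max_circuits circuits
    full, rem = divmod(n, batch_size)
    jobs = full * math.ceil(batch_size / max_circuits)
    if rem:
        jobs += math.ceil(rem / max_circuits)
    return jobs

def jobs_without_batching(n: int, max_circuits: int) -> int:
    # the backend alone splits the full set into limit-sized jobs
    return math.ceil(n / max_circuits)

print(jobs_with_batching(1500, 400, 300))   # 7
print(jobs_without_batching(1500, 300))     # 5
```

When batch_size is a multiple of max_circuits (or batching is disabled), the two counts coincide; otherwise each batch leaves a remainder that costs one extra job.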
The fix is currently in progress within the QAMP project (Analyze and improve performance of Qiskit Machine Learning #14)