microsoft / qsharp-runtime

Runtime components for Q#
https://docs.microsoft.com/quantum
MIT License

Can we speed up ResourcesEstimator / QCTraceSimulator? #1028

Open tanujkhattar opened 2 years ago

tanujkhattar commented 2 years ago

Please describe what you would like the feature to accomplish.
Q# is probably one of the most promising tools for resource estimation of fault-tolerant (FT) algorithms. However, from an initial assessment, it looks like the built-in resource estimator does not scale to counting T gates beyond O(millions).

For example, running the GetGateCount sample on my machine takes ~2.2 hours to produce a resource estimate for a single Qubitization step containing ~34 million T gates:

$> git clone https://github.com/microsoft/Quantum.git && cd Quantum/samples/chemistry/GetGateCount
$> dotnet run -- --path=../IntegralData/Liquid/fe2s2_sto3g.dat --format=Liquid --skip-trotter-suzuki --skip-opt-qubitization
Gate Count results on fe2s2_sto3g.dat
by Qubitization with 112 spin-orbitals. It took 7943955 ms (~2.2 Hours).
Gate count statistics: 
# T:34_608_828, 
# Rotations:37_883_236, 
# CNOT:59_700_9557, 
Norm Multipler:4070.453884808356
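
For reference, the kind of host-program pattern whose performance this is about is roughly the following. This is a minimal sketch of using the built-in ResourcesEstimator, not the actual GetGateCount driver; `EstimateQubitizationStep` is a placeholder name for the Q# entry point being estimated.

```csharp
// Minimal sketch of the usual QDK resource-estimation host pattern
// (not the actual GetGateCount driver). EstimateQubitizationStep is a
// hypothetical placeholder for the Q# operation being estimated.
using System;
using System.Threading.Tasks;
using Microsoft.Quantum.Simulation.Simulators;

class Driver
{
    static async Task Main()
    {
        var estimator = new ResourcesEstimator();
        await EstimateQubitizationStep.Run(estimator);  // simulates the whole program once
        Console.WriteLine(estimator.ToTSV());           // T, CNOT, rotation counts, etc.
    }
}
```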

IIUC, the QCTraceSimulator does not do any caching on the edges of the call graph; i.e. each edge of the graph is traversed every time the target operation is invoked with a new set of parameters. Traversing every call is important for precise resource estimates, because the parameters can determine the gate complexity of the operation, but it is probably also why the estimator is so slow?
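
To make the "edges of the call graph" point concrete, per-edge metrics can be queried as in the sketch below (assuming the documented QCTraceSimulator API; `RunProgram` and `ApplyQubitizationStep` are placeholder operation names). The number reported for an edge is accumulated by actually simulating every invocation along that edge.

```csharp
// Sketch of querying one call-graph edge (caller -> callee) for its T count,
// assuming the documented QCTraceSimulator API. Operation names are placeholders.
using System;
using System.Threading.Tasks;
using Microsoft.Quantum.Simulation.Simulators.QCTraceSimulators;

class EdgeMetrics
{
    static async Task Main()
    {
        var config = new QCTraceSimulatorConfiguration { UsePrimitiveOperationsCounter = true };
        var sim = new QCTraceSimulator(config);
        await RunProgram.Run(sim);  // every call is traced; nothing is memoized

        // T count accumulated on the RunProgram -> ApplyQubitizationStep edge.
        double tOnEdge = sim.GetMetric<ApplyQubitizationStep, RunProgram>(
            PrimitiveOperationsGroupsNames.T);
        Console.WriteLine($"T count on edge: {tOnEdge}");
    }
}
```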

Describe the solution you'd like
It would be great if we could find a way for the resource estimator to scale to O(billions) / O(trillions) of T gates.

Some concrete questions I have are as follows:

  1. Is my understanding of how QCTraceSimulator works correct?
  2. Have people in this community encountered / thought about this issue? What are some known nuances that make this problem challenging?
  3. Have we considered doing some kind of caching on operations, to avoid re-traversing every edge each time an operation is called? Maybe we could give users a way to specify which arguments of an operation should be cached (e.g. the exact input qubit register) vs. which arguments should not be cached (e.g. the number of qubits the operation is applied to)? A rough sketch of this idea follows this list.
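
To make (3) concrete, here is an abstract sketch of the kind of caching I have in mind (all names here are hypothetical, not part of any QDK API): the first traversal of an operation records its gate tally keyed by the arguments that can actually change the count (e.g. the register width), and later calls with the same key just add the cached tally instead of re-walking the operation's sub-call-graph.

```csharp
// Abstract sketch of the proposed memoization; none of these types exist in the QDK.
// The cache key keeps only arguments that can change the gate count (here, the
// register width) and deliberately ignores the concrete qubit register.
using System;
using System.Collections.Generic;

record GateTally(long T, long Rotations, long CNOT)
{
    public static GateTally operator +(GateTally a, GateTally b) =>
        new(a.T + b.T, a.Rotations + b.Rotations, a.CNOT + b.CNOT);
}

class CachingCallGraphCounter
{
    private readonly Dictionary<(string Op, int Width), GateTally> cache = new();

    public GateTally Total { get; private set; } = new(0, 0, 0);

    // `traceOnce` stands for the expensive walk over the operation's sub-call-graph.
    public void Visit(string op, int width, Func<GateTally> traceOnce)
    {
        if (!cache.TryGetValue((op, width), out var tally))
        {
            tally = traceOnce();        // pay the traversal cost once per key
            cache[(op, width)] = tally;
        }
        Total += tally;                 // repeated calls only add the cached tally
    }
}
```

A real implementation would of course need a way to bypass or invalidate the cache for operations whose gate counts depend on arguments (or measurement outcomes) that are not part of the key.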

cc @NoureldinYosri @mpharrigan

bettinaheim commented 2 years ago

@msoeken I believe this is relevant for planned work.

msoeken commented 2 years ago

@tanujkhattar, thanks for your detailed issue. The resources estimator in the QDK can be extended to implement custom caching strategies. Here is a sample that shows how one could implement the caching you describe in the issue.

Sample: https://gist.github.com/msoeken/589a54fd683c76fbe4dcfc457ee9e135

The sample only caches CNOT gates, but it can easily be extended to cache other gates as well. More fine-grained scenarios are also possible, in which the input arguments are taken into consideration.

In the sample I call an operation containing 3 CNOTs 1,000 times; the operation also creates 1,000,000 random values to imitate some computation. These are the runtime differences on my laptop with and without caching:

Without caching: 10.53 secs
With caching: 0.97 secs

Here are some resources that might be helpful to understand how the resources estimator can be customized: