qiboteam / qibojit-benchmarks

Benchmark code for qibojit performance assessment
Apache License 2.0

Scripts to generate paper plots #37

Closed stavros11 closed 2 years ago

stavros11 commented 2 years ago

I updated the plot-generating scripts so that we have everything used in the latest version of the paper. I wrote the plotting methods in Python scripts instead of notebooks and removed the existing notebooks. As we discussed, scripts have the disadvantage of being slightly harder to customize for different logs and benchmark configurations. Let me know what you think.

I also added a notebook which can be used to generate all plots used in the paper using the logs that I uploaded in a gist.

List of plots in the paper:

scarrazza commented 2 years ago

@stavros11 did you update the gist with the 36 qubits run for EPYC?

stavros11 commented 2 years ago

> @stavros11 did you update the gist with the 36 qubits run for EPYC?

I just updated qibo_scaling_cpu.dat to include the points up to 36 qubits on the EPYC. I will also update this PR to make the plotting script a bit more flexible, for example allow passing different device names and data for Fig. 4 via the data dictionary keys, without having to modify the Python script every time.
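A minimal sketch of the data-dictionary idea mentioned above, assuming each log entry is a dict with an `nqubits` field and a timing field. The helper name `build_series`, the device names and the timing values are all hypothetical placeholders, not the actual script or real benchmark results:

```python
def build_series(data, quantity):
    """Map {device name: [log entries]} to {device name: (nqubits, values)}.

    The dictionary keys are used directly as line labels, so adding a new
    device only requires a new key in `data`, not a change to this function.
    """
    series = {}
    for device, entries in data.items():
        # Sort by number of qubits so each line is drawn left to right.
        points = sorted((entry["nqubits"], entry[quantity]) for entry in entries)
        xs = [nqubits for nqubits, _ in points]
        ys = [value for _, value in points]
        series[device] = (xs, ys)
    return series


# Placeholder input: names and numbers are made up for illustration only.
data = {
    "device A": [
        {"nqubits": 4, "total_dry_time": 0.2},
        {"nqubits": 3, "total_dry_time": 0.1},
    ],
    "device B": [
        {"nqubits": 3, "total_dry_time": 0.3},
    ],
}

series = build_series(data, "total_dry_time")
```

Each `(xs, ys)` pair can then be passed to a plotting call with the dictionary key as the legend label.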

stavros11 commented 2 years ago

I implemented the following changes since the last time we discussed:

This is complete in terms of the plots we have in the current version of the paper. We are only missing GPU bars for qsim but I would postpone this until we find how to disable fusion. As discussed, we could also add an additional plot where we do fusion and we compare numba/cupy/cuquantum but this can be done in a different PR.

@andrea-pasquale when you have time check if you can generate all plots from the latest version of the paper using the paper-plots.ipynb notebook implemented here. Remember to download the latest version of the logs from the gist.

@andrea-pasquale, @mlazzarin let me know if you have any comments regarding plot scripts and the plots (especially colors, sizes, etc.).

scarrazza commented 2 years ago

@stavros11 thank you very much. Could you please include the QLM numbers from here. You should get this:

image

I think we should include some ref. about the engine numba/cupy and the available memory/threads.

stavros11 commented 2 years ago

> @stavros11 thank you very much. Could you please include the QLM numbers from here.

Thanks for checking and the numbers, I added these to the gist and the notebook. I added the memory, threads and engine information in the legend. I also did some multigpu benchmarks with the RTX+EPYC from 32 to 35 qubits and added this line as well. The final plot is:

image

You can add or remove lines in this plot directly from the paper-plots notebook, so we can decide which lines to show in the final paper, as well as the colors, markers, etc.

scarrazza commented 2 years ago

Great, looks good. The multi-gpu is interesting to show, however we should probably include a real multi-gpu setup using the dgx.

scarrazza commented 2 years ago

Here are some points from the ATOS simulator:

image

I have 2 observations:

stavros11 commented 2 years ago

Thanks for checking and the comments.

> I like all the plots, even the colors. For the sizes I feel like Figs. 5 and 6, mainly Fig. 6, are a bit too crowded compared to Fig. 2. Do you think that it will look better if we plot separately the 20 and the 30 qubits plots?

I guess that by separately you mean to use two columns for each plot (like Fig. 2) instead of putting two plots on the same line? In terms of how the plots are generated in the notebook, 20 and 30 qubits are already separate plots saved in different pdfs. I am not sure which are the preferred sizes for plots, but I have the impression that for journals smaller is better (for space reasons), so I would instead make Fig. 2 (and Fig. 7) single column so that their size is more consistent with the rest of the plots. As for the plots being crowded, I am not sure if there is a way around this if we want to compare all the libraries. I believe as it is now it is still readable though.

> Another minor comment: in Fig. 3 and 4 the CPUs are plotted as straight lines in order to distinguish them from the GPUs, correct?

Yes, and also because this plot started getting too crowded in terms of lines, so removing some markers makes it a bit easier to read. I was thinking of removing all markers (also for GPU), since the colors are sufficient to distinguish all lines, but on the other hand I like the markers as they show that these are not really continuous lines: the x-axis (nqubits) is discrete and we are "interpolating" the line. I am not sure what is the best way to go here.

stavros11 commented 2 years ago

> the performance seems better than qibojit however I am pretty sure they are doing fusion and (maybe) limiting the number of threads

I am not sure if it's fusion because the QFT circuit we are benchmarking here is one of the examples where fusion does not help. Some libraries are even slower if fusion is enabled for QFT. Another example is the latest Fig. 6 (single precision - 30 qubits) in the paper. qsim is using fusion because we don't know how to disable it, so it's much faster than everything else for all circuits except QFT.

Otherwise, we can always enable fusion for other libraries and check how it compares to ATOS (even on the same machine). This can be done with our existing script:

python compare.py --nqubits 30 --library qibo --library-options max_qubits=2

where max_qubits is the maximum number of qubits for fused gates. Qibo supports up to 2, qiskit supports more.
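To illustrate what a `max_qubits` limit means for fused gates, here is a toy sketch of greedy fusion. It is not the algorithm used by Qibo or any other library mentioned here; the function name `fuse_greedy` and the representation of a gate as a tuple of qubit indices are assumptions made only for this example:

```python
def fuse_greedy(gates, max_qubits=2):
    """Group consecutive gates while each group acts on at most max_qubits qubits.

    `gates` is a list of tuples of qubit indices, in circuit order.
    Returns a list of groups; each group would become one fused gate.
    A gate acting on more than max_qubits qubits ends up in a group of its own
    or starts a new group, since it cannot be merged with anything before it.
    """
    groups = []
    current_gates, current_qubits = [], set()
    for qubits in gates:
        merged = current_qubits | set(qubits)
        if current_gates and len(merged) > max_qubits:
            # Adding this gate would exceed the limit: close the current group.
            groups.append(current_gates)
            current_gates, current_qubits = [], set()
            merged = set(qubits)
        current_gates.append(tuple(qubits))
        current_qubits = merged
    if current_gates:
        groups.append(current_gates)
    return groups


# Five gates on three qubits, listed in circuit order.
circuit = [(0,), (0, 1), (1,), (1, 2), (2,)]
fused = fuse_greedy(circuit, max_qubits=2)
```

With `max_qubits=2` the five gates collapse into two groups, so a simulator would apply two two-qubit matrices instead of five separate gates. Only consecutive gates are merged, which keeps the gate order intact.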

Can we install the ATOS software on other machines to check too, or is it only shipped with their machine?

andrea-pasquale commented 2 years ago

> I guess that by separately you mean to use two columns for each plot (like Fig. 2) instead of putting two plots on the same line?

Yes, that's what I meant.

> In terms of how the plots are generated in the notebook, 20 and 30 qubits are already separate plots saved in different pdfs. I am not sure which are the preferred sizes for plots, but I have the impression that for journals smaller is better (for space reasons), so I would instead make Fig. 2 (and Fig. 7) single column so that their size is more consistent with the rest of the plots.

Ok, I agree with making Fig. 2 and 7 single column. If they look worse we can keep them as two columns.

> Yes, and also because this plot started getting too crowded in terms of lines, so removing some markers makes it a bit easier to read. I was thinking of removing all markers (also for GPU), since the colors are sufficient to distinguish all lines, but on the other hand I like the markers as they show that these are not really continuous lines: the x-axis (nqubits) is discrete and we are "interpolating" the line. I am not sure what is the best way to go here.

If we remove all the markers the plot will look like this

devices_qft_total_dry_time_double_page-0001

which personally I don't like too much. I agree with keeping the markers. Another possibility would be to add error bars instead of the markers, but so far in the paper we haven't used any error bars. If you would like to explore this possibility I can give it a try.
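For the error bar option, a minimal sketch of how a mean and a spread could be derived from repeated timings of one configuration. The helper name `timing_summary` and the input values are hypothetical; whether the logs keep the individual repetitions is an assumption:

```python
import statistics


def timing_summary(times):
    """Return (mean, sample standard deviation) of repeated timings in seconds."""
    mean = statistics.fmean(times)
    # The sample standard deviation needs at least two repetitions.
    std = statistics.stdev(times) if len(times) > 1 else 0.0
    return mean, std


# Placeholder timings, not measured values.
mean, std = timing_summary([1.0, 2.0, 3.0])
```

The resulting `std` values could be passed as `yerr` to matplotlib's `errorbar`, which would show the spread while still marking the discrete nqubits points.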

scarrazza commented 2 years ago

Yeah, fusion doesn't help here, and unfortunately there is no way to install their proprietary code.

image

stavros11 commented 2 years ago

I just realized that there is a bug in our plotting script for Fig. 2: we are plotting the total_dry_time/total_simulation_time on top of the import time bar. This way the import time is plotted twice, once alone in grey and once as part of the orange/purple/green bars (@mlazzarin, @andrea-pasquale correct me if I'm wrong). To fix this we should only plot the dry_run_time/simulation_times_mean in the colored bars on top of the grey import time. If I do this, the plot looks like:

image

which is not very pretty, but I believe it makes sense, since we know from Fig. 3 too that for <25 qubits the import time is actually the dominant part. Now it is also clearer that import time is the main difference between cupy and cuquantum. If we don't want to present this plot, we can either change the nqubits range to higher numbers (e.g. 22 to 30) or fix the nqubits to one or two different numbers and plot different circuits (similar to the plot where we compare libraries).

From a practical perspective, it's funny that all optimizations we're doing to improve dry run and simulation are relevant only for >25 qubits, while most qibo users will probably use it for smaller circuits.
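The Fig. 2 fix described above can be sketched as follows. The field names `dry_run_time` and `simulation_times_mean` come from this thread, while `import_time`, the helper name `stacked_segments` and the numbers are assumptions used only for illustration:

```python
def stacked_segments(log, run_key):
    """Return [(label, bottom, height)] for one stacked bar.

    The grey segment is the import time. The colored segment starts where the
    grey one ends and has the height of the run time alone, so the import time
    is counted exactly once. The bug was using a total time (which already
    contains the import time) as the height of the colored segment.
    """
    import_time = log["import_time"]
    run_time = log[run_key]
    return [
        ("import", 0.0, import_time),
        (run_key, import_time, run_time),
    ]


# Placeholder log entry, not a measured result.
log = {"import_time": 2.0, "dry_run_time": 0.5}
segments = stacked_segments(log, "dry_run_time")
# Top of the bar is bottom + height of the last segment.
bar_top = segments[-1][1] + segments[-1][2]
```

Each tuple maps onto a matplotlib `bar` call with `bottom=` set to the second element, and the same helper works for `simulation_times_mean` by changing `run_key`.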

stavros11 commented 2 years ago

I also did some fusion benchmarks using the qibojit backend and added the corresponding plotting scripts. Here are some plots:

Supremacy circuit for different numbers of qubits:

![image](https://user-images.githubusercontent.com/35475381/156370737-ed5f9894-2b8a-4f37-ba94-689586e21d34.png)
![image](https://user-images.githubusercontent.com/35475381/156370752-c6de5be1-ce18-4af2-ab8d-c3a5bfd9100c.png)

All circuits for 30 qubits:

![image](https://user-images.githubusercontent.com/35475381/156370829-50ec26f2-3a6a-4056-8fc4-894c5a2b01ce.png)
![image](https://user-images.githubusercontent.com/35475381/156370841-2524d922-b570-4878-b17c-0df2cd37d5bd.png)

It seems that fusion helps much more on CPU than on GPU. It also seems strange to me that cuquantum is still worse than cupy. Here I am plotting total times (import + creation + dry run/simulation). I was planning to use the second kind of plots (for different circuits) in the paper. Let me know what you think.

andrea-pasquale commented 2 years ago

Thanks for spotting the bug @stavros11. I tried the fix and it works. Now the plot makes more sense.

> Now it is also clearer that import time is the main difference between cupy and cuquantum. If we don't want to present this plot, we can either change the nqubits range to higher numbers (e.g. 22 to 30) or fix the nqubits to one or two different numbers and plot different circuits (similar to the plot where we compare libraries).

I think that changing the range from 22 to 30 qubits is a good solution.

andrea-pasquale commented 2 years ago

> It seems that fusion helps much more on CPU than on GPU. It also seems strange to me that cuquantum is still worse than cupy. Here I am plotting total times (import + creation + dry run/simulation). I was planning to use the second kind of plots (for different circuits) in the paper. Let me know what you think.

Indeed, it's quite strange that cuquantum is still slower. We need to keep in mind the difference in the import time, but I think that cupy should still be faster. The second plot is nice; maybe we should change the legend to a horizontal one. I think you can do that by changing the number of columns, although I am not sure whether it will look worse.

scarrazza commented 2 years ago

Thanks for spotting the bug and preparing the fusion plot.

> I was planning to use the second kind of plots (for different circuits) in the paper. Let me know what you think.

Yes, I like it.

mlazzarin commented 2 years ago

Thanks @stavros11 for spotting the bug.

Concerning gate fusion, we have a reply from the qsim developers https://github.com/quantumlib/qsim/issues/510

stavros11 commented 2 years ago

> Concerning gate fusion, we have a reply from the qsim developers quantumlib/qsim#510

Ok, so there is no way to disable fusion for qsim. We cannot use it in the library comparison plots then, as it would be an unfair comparison with the others (qibo, qiskit, qulacs), which support fusion but have it disabled. We could include it in the fusion section though. For example, in addition to the qibojit platform comparison I posted above, we could have another bar plot which compares qibo, qiskit and qsim for all circuits with 30 qubits on CPU and GPU.

stavros11 commented 2 years ago

Following my previous comment, here is a plot comparing fusion up to two qubits with qibo, qiskit and qsim.

image

I believe we could remove qsim from the existing plots and add a section about fusion in the paper, where we use this plot and the one with the qibojit numba, cupy, cuquantum comparison. Let me know what you think.

scarrazza commented 2 years ago

I fully agree, thank you for the great plot.

andrea-pasquale commented 2 years ago

> I believe we could remove qsim from the existing plots and add a section about fusion in the paper, where we use this plot and the one with the qibojit numba, cupy, cuquantum comparison. Let me know what you think.

I also agree, in the latest plot it could be interesting to add the cuquantum version both for qibo and qsim.

mlazzarin commented 2 years ago

Thanks for the plot. Notice that GPU execution times are higher than CPU ones for qiskit and qsim, at least for some circuits. @stavros11 do you have an idea about why it happens?

stavros11 commented 2 years ago

Following our discussion, I updated the scripts to:

image

I also updated the logs in the gist. I will update these plots in the paper too. This PR should be ready now and we could consider merging it. If we need to add or modify plots for the paper, we can open new PRs.

scarrazza commented 2 years ago

@stavros11 thank you, let's start merging.