qiskit-community / qiskit-machine-learning

Quantum Machine Learning
https://qiskit-community.github.io/qiskit-machine-learning/
Apache License 2.0
670 stars 325 forks

Clustering Algorithms using Quantum Computing #232

Closed dquero closed 2 years ago

dquero commented 3 years ago

With regard to ML, Qiskit 0.30 provides Quantum Neural Networks, Neural Network Classifier, Quantum Kernel Machine Learning, qGANs for Loading Random Distributions, and Torch Connector and Hybrid QNNs. I'm working on generating clustering algorithms, at least k-means. Would you find this useful? Any suggestions?

adekusar-drl commented 3 years ago

Hi, certainly, this repository is a good place for quantum machine learning algorithms. If you can implement a new algorithm using Qiskit machinery, it would be a good addition to this package. Let me know if you have questions about Qiskit!

dquero commented 2 years ago

I have my first version of the k-means classifier. It does not use the Euclidean distance but the cosine of the angle, which is directly proportional to the distance, as can be seen in the related literature. I'm attaching the program here and am at your disposal for next actions. k-means.txt

adekusar-drl commented 2 years ago

Thanks a lot. I have a few questions:

  1. Is the algorithm from a paper? If so, could you please share the paper? If not, please provide a description of the algorithm.
  2. What is the advantage of using quantum computers in this algorithm?
  3. Are there any comparisons versus classical algorithms?
  4. The code is in a text file. If you would like to eventually contribute it to Qiskit Machine Learning, please take a look at the contributing guidelines: https://github.com/Qiskit/qiskit-machine-learning/blob/main/CONTRIBUTING.md

dquero commented 2 years ago

The algorithm is one of a well-known family of attempts to make clustering / unsupervised machine learning more efficient by estimating distances with quantum computing. There are many articles written about it; I even learned about it from sources in Spanish. Some examples:

  1. Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. "Quantum speed-up for unsupervised learning". In: Machine Learning 90.2 (2013), pp. 261–287.
  2. Nathan Wiebe, Ashish Kapoor, and Krysta M. Svore. "Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning". In: Quantum Information & Computation 15.3 (2015), pp. 316–356.

In that case, the metric used in place of the distance is P(1) of the auxiliary qubit in a cswap gate (a swap test). P(1) is proportional to the Euclidean distance between the two points on the Bloch sphere, as demonstrated here: https://sitiobigdata.com/2019/12/24/aprendizaje-automatico-cuantico-agrupamiento-k-means/#
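As an illustration of the metric, here is a minimal NumPy sketch of the standard swap-test statistic (my own sketch, not the attached program): for normalized state vectors a and b, the ancilla of the controlled-SWAP circuit measures 1 with probability P(1) = (1 - |⟨a|b⟩|²)/2, which grows with the separation between the points.

```python
import numpy as np

def swap_test_p1(a, b):
    """Probability of measuring 1 on the swap-test ancilla for
    normalized state vectors a and b: P(1) = (1 - |<a|b>|^2) / 2."""
    overlap = abs(np.vdot(a, b))
    return (1.0 - overlap**2) / 2.0

def normalize(v):
    """Scale a vector to unit norm (amplitude encoding on one qubit)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Two far-apart points and one close pair, each encoded on one qubit.
a = normalize([1.0, 0.2])
b = normalize([0.3, 1.0])
c = normalize([1.0, 0.25])  # close to a

# P(1) ranks the pairs by distance: larger for (a, b) than for (a, c).
print(swap_test_p1(a, b))
print(swap_test_p1(a, c))
```

Note that P(1) is bounded by 1/2 and vanishes for identical states, so it is a similarity-derived metric rather than the distance itself.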

My contributions relative to the information in that link are:

  1. I provide an algorithm that estimates all the distances in parallel (that is the real advantage) instead of using a loop to estimate them one by one.
  2. Maybe it has no consequences, but the link above contains a flaw in its demonstration. It is supposed to show that the algorithm yields a value that grows as the distance grows in the usual Cartesian representation, but it relies on the fact that the vectors have length one, so it really demonstrates only that the value is proportional to the distance on the Bloch sphere. If it is of interest, I can work on showing that, with the chosen representation, distances on the Bloch sphere grow as the Cartesian distances do. Intuition indicates that this is the case, and it must be true at least when the distances are not very small.
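To spell out the monotonicity claim, here is the standard derivation for real, normalized amplitude vectors (my own working, not taken from the linked post):

```latex
% Swap-test statistic vs. Euclidean distance, for real normalized
% vectors a, b with d = ||a - b||:
%   d^2 = <a|a> + <b|b> - 2<a|b> = 2(1 - <a|b>)  =>  <a|b> = 1 - d^2/2.
\[
  P(1) = \frac{1 - \langle a|b\rangle^{2}}{2}
       = \frac{1 - \bigl(1 - d^{2}/2\bigr)^{2}}{2}
       = \frac{d^{2}}{2} - \frac{d^{4}}{8}
\]
% dP(1)/dd = d(1 - d^2/2) > 0 for 0 < d < sqrt(2), i.e. while
% <a|b> > 0, so on that range P(1) is a monotonically increasing
% function of the Euclidean distance between the encoded points.
```

So, under the assumption that overlaps stay non-negative, ranking centroids by P(1) gives the same nearest centroid as ranking by distance.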


The advantage: instead of looping over every point-centroid pair to compute the square root of the sum of the squared coordinate differences, everything is calculated in a single step.


I will work on the style.

adekusar-drl commented 2 years ago

Sorry for the delay. The idea, in general, is clear. There is nothing on unsupervised learning in QML so far, so the implementation could be a good contribution. Meanwhile, could you please point to an article in English? Unfortunately, I don't read Spanish.

prakharb10 commented 2 years ago

Hi, I recently worked on an implementation of k-means clustering as part of my college lab mini-project. I followed the implementation outlined here: https://arxiv.org/abs/1804.10068, pp. 32-35. I'm sharing the code with you; if it seems like a good fit, I would be willing to contribute it to the module, following the guidelines. qml.zip It's a Jupyter notebook; I had to upload a zip file due to format constraints.

adekusar-drl commented 2 years ago

@prakharb10, Yes, a clustering algorithm can be a good contribution to Qiskit Machine Learning as we don't have anything on that implemented.

dquero commented 2 years ago

Hi, @prakharb10 and @adekusar-drl. It seems that we are both following the same principle: P(1) of the control qubit in a cswap gate is a monotonically increasing function of the Euclidean distance between the swapped points. As indicated above, what I do differently from what I have found is to measure all the distances to the clusters at the same time, in parallel. This is possible because the number of qubits is expected to keep growing and each point can be represented with a single qubit. This parallel calculation, based on a single gate instead of all the operations involved in computing the Euclidean distances one by one, is the advantage that the quantum implementation provides.
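To illustrate the "all distances at once" idea, here is a classical NumPy sketch of the principle (my own illustration, not the quantum circuit or the notebook itself; the name `assign_clusters` is hypothetical): the swap-test metric for every point-centroid pair comes out of a single matrix product, and each point is assigned to its nearest centroid from that one result.

```python
import numpy as np

def assign_clusters(points, centroids):
    """Assign each point to the centroid minimizing the swap-test
    metric P(1) = (1 - <p|c>^2) / 2, computed for all pairs at once.

    points:    (n, d) array of row vectors
    centroids: (k, d) array of row vectors
    Rows are normalized, mirroring amplitude encoding on one qubit.
    """
    p = points / np.linalg.norm(points, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    overlaps = p @ c.T              # all <p_i|c_j> in one matrix product
    p1 = (1.0 - overlaps**2) / 2.0  # swap-test metric for every pair
    return np.argmin(p1, axis=1)    # nearest centroid per point

points = np.array([[1.0, 0.1], [0.1, 1.0], [0.9, 0.2]])
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
print(assign_clusters(points, centroids))  # → [0 1 0]
```

The matrix product stands in for the quantum parallelism: the circuit evaluates every pairwise metric in one execution rather than in a loop.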

Here is my code, formatted with Pylint. (The only Pylint warnings were a few variables wrongly flagged as constants.) I have added the extension .txt in order to be able to upload it; it can be removed, and the resulting .ipynb file is what I executed.

k_means_formatted.ipynb.txt

@prakharb10, maybe you and I can collaborate to adapt code and provide the best of the two versions.

Thanks. Regards.

prakharb10 commented 2 years ago

Hi @dquero. I'm unable to open the notebook you shared. Maybe, as @adekusar-drl mentioned, you could share the source paper for your implementation in English so we can peruse it.

dquero commented 2 years ago

Hi, I'm working on finding the information in English or translating it on my own. With regard to the code, I always work online, using the IBM Quantum Lab in IBM Quantum. The file is exported directly from there (formatted, with the extension .txt added to be allowed to upload it here). Maybe the easiest thing is to open it as text and copy it.

(I have tried to copy it here, but comments are rendered as titles by GitHub, so you would need to copy the parts and reassemble them.)

dquero commented 2 years ago

Hi again. After carefully reading the article mentioned above (https://arxiv.org/abs/1804.10068), I see that the principle of the code is the same: using cswap gates to obtain a metric that is a monotonically increasing function of the Euclidean distance, and which is therefore valid for finding the nearest centroid of each point. What I do differently is:

I have adapted the code to PEP 8 style using Pylint and wrapped parts of the program in documented functions; I would say it is now reusable. See the link; as before, I have added the extension .txt to be allowed to upload it here.

k_means_formatted_v2.ipynb.txt

Thanks.

adekusar-drl commented 2 years ago

@dquero Thanks a lot for the update. Let me take a look at the code and at the paper.

dquero commented 2 years ago

In this thread I identified an empty space in the quantum machine learning area of Qiskit: clustering algorithms. I suggested the implementation of one of them (k-means) based on the use of cswap gates; I provided some of the papers demonstrating that the algorithm is equivalent in results to the classical one; and I also suggested parallelizing the distance calculation in one shot (based on the roadmap, the number of qubits is expected to increase above 1M in a few years, which will keep my version scalable).

After that, the last step is the implementation, started by @prakharb10 with a pull request. This thread is therefore complete; implementation, improvements, documentation, etc. are to be followed up in other threads.