qiskit-community / qiskit-machine-learning

Quantum Machine Learning
https://qiskit-community.github.io/qiskit-machine-learning/
Apache License 2.0

ad_hoc_data(…) resulting in the test dataset as an array instead of a dictionary #177

Closed Shashank-Ravi-0 closed 3 years ago

Shashank-Ravi-0 commented 3 years ago

Information

What is the current behavior?

The function ad_hoc_data(...) returns the test dataset as a NumPy array rather than a dictionary.

type(test_input)
> numpy.ndarray

This is also evident from the following AttributeError message:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-f11e4f763266> in <module>
     10                                                                n=feature_dim,
     11                                                                plot_data=True)
---> 12 datapoints,class_to_label=split_dataset_to_data_and_labels(test_input)

~\anaconda3\lib\site-packages\qiskit\aqua\utils\dataset_helper.py in split_dataset_to_data_and_labels(dataset, class_names)
     83     labels = []
     84     if class_names is None:
---> 85         sorted_classes_name = sorted(list(dataset.keys()))
     86         class_to_label = {k: idx for idx, k in enumerate(sorted_classes_name)}
     87     else:

AttributeError: 'numpy.ndarray' object has no attribute 'keys'
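
The failure can be reproduced without Qiskit. The sketch below is only an illustration: the helper name describe_dataset and the stand-in data are hypothetical, and the "A"/"B" class names are assumed. It shows that a plain array has no keys() method, which is what the traceback above reports.

```python
import numpy as np


def describe_dataset(dataset):
    """Report whether a dataset uses the dict layout or the array layout."""
    if isinstance(dataset, dict):
        return "dict with classes " + ", ".join(sorted(dataset))
    if isinstance(dataset, np.ndarray):
        return f"ndarray with shape {dataset.shape}"
    return type(dataset).__name__


# Stand-in data: a class-name -> samples dict versus a plain feature array.
old_style = {"A": np.zeros((2, 2)), "B": np.ones((2, 2))}
new_style = np.zeros((4, 2))

print(describe_dataset(old_style))
print(describe_dataset(new_style))

# Calling .keys() on the array fails the same way the Aqua helper did.
try:
    new_style.keys()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

Checking the type first like this makes it obvious which layout a given return value has before it is passed to older helper code.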

Steps to reproduce the problem

Reproduced exactly from Qiskit's tutorials on YouTube.

feature_dim=2
training_dataset_size=20
testing_dataset_size=10
random_seed=10598
shots=10000

sample_Total,training_input,test_input,class_labels = ad_hoc_data(training_size=training_dataset_size,
                                                               test_size=testing_dataset_size,
                                                               gap=0.3,
                                                               n=feature_dim,
                                                               plot_data=True)
datapoints,class_to_label=split_dataset_to_data_and_labels(test_input)

What is the expected behavior?

ad_hoc_data should return a dictionary for the test dataset. As the code shows, split_dataset_to_data_and_labels(test_input) expects test_input to be a dict.

Suggested solutions

woodsp-ibm commented 3 years ago

This was originally posted here https://stackoverflow.com/questions/68598726/ad-hoc-data-resulting-in-test-input-as-an-array-instead-of-a-dictionary

As I mentioned there, the above code sample works fine if you use these imports (i.e. using Aqua throughout, although it is now deprecated):

from qiskit.ml.datasets import ad_hoc_data
from qiskit.aqua.utils import split_dataset_to_data_and_labels 

The error message shows that you are using Aqua, i.e. the error is raised in

anaconda3\lib\site-packages\qiskit\aqua\utils\dataset_helper.py, which no longer exists in this repo.

The datasets have changed. Which tutorial are you following? I imagine it was written for Aqua, since you are using a function from Aqua.

Maybe this helps: this Machine Learning tutorial uses ad_hoc_data for classification, see https://qiskit.org/documentation/machine-learning/tutorials/03_quantum_kernel.html?highlight=ad_hoc#Classification

Shashank-Ravi-0 commented 3 years ago

Yes, I completely understand that. The following code works and makes sense.

from qiskit.ml.datasets import ad_hoc_data
from qiskit.aqua.utils import split_dataset_to_data_and_labels 

But my question is: what import should I use now that Aqua is deprecated? That is, just as ad_hoc_data is now imported with from qiskit_machine_learning.datasets import ad_hoc_data, what is the new, valid import for split_dataset_to_data_and_labels (i.e. one that raises no DeprecationWarning)?

woodsp-ibm commented 3 years ago

Above I linked a new tutorial that shows how to use the ad_hoc dataset as it now is. When the code was moved from Aqua, parts of it may also have been refactored or removed. While your code "works" in accessing the dataset, the values you get back now differ from what you got before. Hence the error when you take data from the dataset here and pass it to the old Aqua routine. If I change your code to this

from qiskit_machine_learning.datasets import ad_hoc_data

feature_dim=2
training_dataset_size=20
testing_dataset_size=10
random_seed=10598
shots=10000

train_features, train_labels, test_features, test_labels, adhoc_total = ad_hoc_data(
    training_size=training_dataset_size,
    test_size=testing_dataset_size,
    n=feature_dim,
    gap=0.3,
    plot_data=True, one_hot=False, include_sample_total=True
)

and name the variables as in the tutorial I linked above, which uses the ad hoc data, so that the names show clearly what they contain, then you will see that what you get back is features and labels. It is not a dictionary with labels as keys and data as values that you can split apart, which is why you get the error when you pass return values from the new dataset into the old code that tries to split them. That method no longer exists here.
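
For code that still needs the old dictionary layout, the features and labels can be regrouped by hand. This is a minimal sketch and not part of qiskit-machine-learning: the helper to_class_dict is hypothetical, the class names "A" and "B" are assumed, and the arrays are stand-ins that assume integer class labels as returned with one_hot=False.

```python
import numpy as np


def to_class_dict(features, labels, class_names):
    """Group feature rows by integer label into {class_name: rows}."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    # Boolean masking picks out the rows belonging to each class index.
    return {name: features[labels == idx] for idx, name in enumerate(class_names)}


# Stand-in arrays in place of test_features and test_labels.
test_features = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
test_labels = np.array([0, 1, 0, 1])

test_input = to_class_dict(test_features, test_labels, ["A", "B"])
print({name: rows.shape for name, rows in test_input.items()})
```

In most cases it is simpler to pass the feature and label arrays directly to the newer APIs, as the linked tutorial does, than to rebuild the dictionary.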

Is that clearer?

Shashank-Ravi-0 commented 3 years ago

Yes. @woodsp-ibm Thank you so much for the detailed explanation.

hussainshaikresearch commented 2 months ago

import numpy as np
from dataset import breast_cancer
from sklearn.datasets import load_iris
from qiskit_machine_learning.datasets import ad_hoc_data
from sklearn import svm
from utils import svm_utils
from matplotlib import pyplot as plt
%matplotlib inline
%load_ext autoreload
%autoreload 2

ImportError: cannot import name 'svm_utils' from 'utils'

How do I resolve this error?

woodsp-ibm commented 2 months ago

I know I referenced this issue for your question. This issue was closed years ago, and your question has little to do with the machine learning code here. As I answered when you asked the same question in Slack: https://qiskit.slack.com/archives/CB6C24TPB/p1725452923977019?thread_ts=1725439502.037949&cid=CB6C24TPB

What is the utils module that you are trying to import it from? I see one on PyPI, but I do not see anything SVM-related in it, just core utilities.
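
One way to find out which utils is being picked up is to ask Python where it would load the module from. This is a minimal sketch using only the standard library; the helper name describe_module is hypothetical, and "json" is used purely as a stand-in for a module that exists.

```python
import importlib.util


def describe_module(name):
    """Return where Python would load a top-level module from, if anywhere."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not importable"
    return f"{name}: {spec.origin}"


# Replace "json" with "utils" in your own environment to see which file
# (a local utils.py, a package from PyPI, ...) the import resolves to.
print(describe_module("json"))
print(describe_module("definitely_not_a_module_xyz"))
```

If the reported path is not the project that defines svm_utils, the import is resolving to a different utils module than the tutorial intended.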

Any further discussion should happen in Slack not in this closed issue, thanks.