qiskit-community / qiskit-machine-learning

Quantum Machine Learning
https://qiskit-community.github.io/qiskit-machine-learning/
Apache License 2.0
648 stars 316 forks source link

Fix randomization in the tutorials #194

Closed adekusar-drl closed 3 years ago

adekusar-drl commented 3 years ago

What is the expected enhancement?

Currently, Qiskit Machine Learning tutorials use random number generator for various purposes, usually for generating datasets. This request aims at reviewing the tutorials and fixing a seed in the generator for reproducibility purposes.

R-Yassh commented 3 years ago

Interested to contribute. This is my very first issue, so will need a bit guidance to begin😅. I have read the contribution guidelines and joined the slack community. Let me know how to proceed. Thanks in advance.

adekusar-drl commented 3 years ago

Welcome! An idea here is to review the tutorials and check where we have randomization. In such places a fixed seed should be added to the random generator, e.g. have something like this:

algorithm_globals.random_seed = 12345
x = algorithm_globals.random.random(5)
R-Yassh commented 3 years ago

I have modified the first tutorial notebook (_docs/tutorials/01_neuralnetworks.ipynb). Let me know if this is what was expected and if I should proceed with other tutorials similarly.

adekusar-drl commented 3 years ago

Thanks, at the very first glance looks good. Altough I have to check notebooks in jupyter as well. I don't think there's a need for SEED = 42 if the constant is not used any more, the value can be assigned directly.

R-Yassh commented 3 years ago

Yes, it is not required here. I will remove that. The seed variable was being used in other tutorials at multiple places, so I thought of using this constant across all tutorial notebooks for the sake of consistency.

R-Yassh commented 3 years ago

I have updated files _01_neuralnetworks.ipynb and _02_neural_network_classifier_andregressor.ipynb to seed the random generators. Other tutorials already have the seeds in place. Creating a PR for review.

woodsp-ibm commented 3 years ago

@adekusar-drl In unit testing, other potentially random influences come from the simulator, e.g. random sampling for the shots. Also some transpilers have random aspect to their optimization pass - as such seed_simulator and seed_transpiler are also possible. Clearly though these are supposed to be a bit more like a real-world application than a unit test, but having them reproduce the same output should a user a choose to run them gives the user confidence that things are right. Maybe neither of these seeds are needed and simply fixing randomness of any input data, initial points etc suffices, after all I guess based on shot noise users might expect to see differences which would not then be the case if the simulator seed we set.

adekusar-drl commented 3 years ago

@woodsp-ibm Main idea here is to have same datasets across multiple runs and since we plot them, then, I think, it is nice to have same plots/numbers every time we run the tutorials. Of course, we can extend the issue to cover other things and other seeds, like you said in transpiler, but, I guess, it can be done later when we fix the tests.

woodsp-ibm commented 3 years ago

Ok, since the simulator will have shot noise when not using the statevector, I saw the qasm sim being used, but did not check which output/plots exactly came from it. Just noting that things may look a little because of that.