Closed hoelzerC closed 4 days ago
Same result for CLI usage: mindlessgen --num-molecules 2
OK, I resolved the issue. It seems to be a cross-issue with other packages installed in my virtual environment. Having a plain mindlessgen environment allows for generation of multiple molecules. 🎉️
As a remark, maybe include a sanity check that the same hash is not used for multiple runs, e.g. check the molecule ID written to mindless.molecules.
Great to hear! 🎉 Do you know the exact reason why np.random always gave the same result? Then we could include it in a Known Issues section.
Hard to say, especially because the numpy.random seed was different at each run. I guess it was an exotic cross-effect. Hopefully, this will not happen to too many users. As mentioned above, checking for identical output might be a good sanity check and worth raising a notification.
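A minimal sketch of such a sanity check, assuming each generated molecule is available as a pair of element symbols and Cartesian coordinates. The names `fingerprint` and `find_duplicates` and the tuple layout are hypothetical and not part of mindlessgen:

```python
import hashlib


def fingerprint(symbols, coords, ndigits=6):
    """Hash element symbols and rounded coordinates into a hex digest."""
    h = hashlib.sha256()
    for sym, (x, y, z) in zip(symbols, coords):
        # Fixed-precision formatting keeps the hash stable against float noise.
        h.update(f"{sym}:{x:.{ndigits}f},{y:.{ndigits}f},{z:.{ndigits}f};".encode())
    return h.hexdigest()


def find_duplicates(molecules):
    """Return (first_index, duplicate_index) pairs for identical molecules."""
    seen = {}
    duplicates = []
    for idx, (symbols, coords) in enumerate(molecules):
        fp = fingerprint(symbols, coords)
        if fp in seen:
            duplicates.append((seen[fp], idx))
        else:
            seen[fp] = idx
    return duplicates


# Hypothetical data: the third molecule repeats the first one exactly.
molecules = [
    (["H", "O"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.96)]),
    (["H", "F"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.92)]),
    (["H", "O"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.96)]),
]
duplicates = find_duplicates(molecules)
if duplicates:
    print(f"Warning: identical molecules generated: {duplicates}")
```

A non-empty result would be the trigger for the notification suggested above.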
My solution is to add a time-based random number seed like this. It's not an elegant way, but it works.
In /src/mindlessgen/molecules/generate_molecule.py:
def generate_random_coordinates(at: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """
    Generate random coordinates for a molecule.
    """
    # import the time module
    import time

    # get the current time in seconds since the epoch
    seconds = time.time()

    atilist: list[int] = []
    xyz = np.zeros((sum(at), 3))
    numatoms = 0
    np.random.seed(int(seconds)+1929)
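One caveat with this workaround, sketched under the assumption that two calls land within the same wall-clock second: `int(seconds)` truncates to whole seconds, so both calls get the same seed and draw the same numbers. The helper `time_based_seed` and the timestamps are made up for illustration:

```python
import numpy as np


def time_based_seed(seconds, offset=1929):
    """Mimic the workaround: truncate the timestamp and add a constant."""
    return int(seconds) + offset


# Two hypothetical timestamps 0.8 s apart, within the same second.
t0 = 1_700_000_000.10
t1 = 1_700_000_000.90

np.random.seed(time_based_seed(t0))
a = np.random.rand(3)

np.random.seed(time_based_seed(t1))
b = np.random.rand(3)

# Both seeds are equal, so the "random" draws coincide.
identical = np.array_equal(a, b)
```

This matters mostly for fast runs or parallel workers that start at nearly the same time.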
Thanks for your feedback!
But shouldn't np.random.seed be unfixed in standard applications and clean environments? Do you have an idea how to check for this issue before starting a calculation, or how to get rid of it?
EDIT: The np.random.seed(<value>) assignment acts as a global variable and can therefore be modified by other pieces of code or by previous numpy executions. This might be covered in this blog post, in which the current practice in mindlessgen is considered legacy best practice: https://builtin.com/data-science/numpy-random-seed
The correct way to do it would probably be the following (setting up a separate random number generator instance):
https://numpy.org/doc/stable/reference/random/index.html
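A minimal sketch of that approach, using `np.random.default_rng` from the linked NumPy documentation. The function name `random_coordinates_sketch` and its signature are hypothetical and not the actual mindlessgen code:

```python
import numpy as np


def random_coordinates_sketch(n_atoms, rng=None):
    """Draw n_atoms random 3D points from a local Generator instance."""
    if rng is None:
        # Fresh OS entropy; independent of the legacy global np.random state.
        rng = np.random.default_rng()
    return rng.random((n_atoms, 3))


# Other code fixing the legacy global seed no longer affects the result.
np.random.seed(0)
a = random_coordinates_sketch(5)
np.random.seed(0)
b = random_coordinates_sketch(5)
differs = not np.array_equal(a, b)

# Reproducibility stays available by passing an explicitly seeded Generator.
c = random_coordinates_sketch(5, np.random.default_rng(42))
d = random_coordinates_sketch(5, np.random.default_rng(42))
reproducible = np.array_equal(c, d)
```

Passing the generator as an argument keeps the random state local to the caller, so a seed set elsewhere in the environment cannot silently pin the output.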
Thanks for your reply.
My guess is that the random number generator in the multiprocessing pool generates the same seeds for each subprocess, just like in the discussion in this question:
https://stackoverflow.com/questions/29854398/seeding-random-number-generators-in-parallel-programs
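If identical per-process seeds are indeed the cause, one common remedy is to spawn an independent child seed for each worker with `np.random.SeedSequence`. The sketch below is an assumption about how this could look, with hypothetical helper names, and runs the workers sequentially to stay short:

```python
import numpy as np


def spawn_worker_seeds(n_workers, root_seed=None):
    """Create one statistically independent child seed per worker."""
    return np.random.SeedSequence(root_seed).spawn(n_workers)


def draw_coordinates(child_seed, n_atoms=4):
    """Worker body: build a private Generator from the child seed."""
    rng = np.random.default_rng(child_seed)
    return rng.random((n_atoms, 3))


# The root seed is fixed here only to make the example reproducible.
seeds = spawn_worker_seeds(3, root_seed=1234)
results = [draw_coordinates(s) for s in seeds]

# Every worker gets a different stream, so no two results coincide.
all_distinct = all(
    not np.array_equal(results[i], results[j])
    for i in range(len(results))
    for j in range(i + 1, len(results))
)
```

The child seeds are picklable, so the same list can be handed to `multiprocessing.Pool.map(draw_coordinates, seeds)` instead of the list comprehension.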
It seems that the same molecule is always found when choosing num_molecules > 1. This can easily be reproduced with the default config.