Open yurivict opened 5 years ago
We use a random offset because we don't want to plant all base atoms at exactly the same spot during coordinate generation: https://github.com/openbabel/openbabel/blob/master/src/builder.cpp#L1141
Maybe we should introduce a new option to fix the random seed.
But xyz coordinates after gen3D are supposed to be all different, no?
Yes. Random numbers are used in the process of 3D coordinate generation.
But this isn't needed because coordinates should be all different. No need to add random numbers.
The SMILES format does not include 3D coordinates, so we must predict them. The coordinates should all be different at the end of coordinate generation, but they are unknown at the beginning. A random offset is used in this prediction, and we had problems without random numbers.
The gen3D plugin should erase the randomness and overwrite the coordinates with the new values it generates. Why does it keep random coordinates?
Because the newly generated values are themselves random: the coordinate generation algorithm uses random numbers internally.
We will update the coordinate generation algorithm in the near future, and I want to make it reproducible. In my opinion, introducing a new option to set random seed is a good option.
I agree that the same coordinates should be generated each time. Also, we should not use random numbers. The original problems for which they were introduced can be solved without random numbers.
@baoilleach @ghutchis
This is still a problem in OpenBabel-3.1.1
Is there a workaround to get rid of randomness?
I totally agree that 3D structure generation should be repeatable, but randomness is sometimes useful in order to generate multiple conformations. Especially, distance geometry, a new 3D structure generation algorithm introduced in Open Babel 3.1.0, needs to use random numbers to generate multiple conformations.
I believe introducing a new argument to specify the random seed is the best solution. However, I'm wondering how to fix the seed, since OBRandom is removed from the public API in Open Babel 3.0.0 (#1954).
It's a very bad idea to merge operations that don't belong together. In this case there seem to be two operations: 1. SMILES->COORDS convertor, 2. coordinate randomizer, that are merged into one.
Some people only need the convertor, some people only need the randomizer, and some people need both, but OpenBabel merged these two into one and forces everybody to have both together.
Just split them into two separate functions, and this would solve problems like this one.
I'm thinking of adding seed parameters, defaulting to e.g. the epoch time, to functions that use OBRandom (will be added to #2241).
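To illustrate the idea of a seed parameter that defaults to the epoch, here is a minimal Python sketch. It is not Open Babel code: `random_offsets` and its arguments are hypothetical names made up for this example, and the real change would live in the C++ functions that use OBRandom.

```python
import random
import time


def random_offsets(n_atoms, scale=0.1, seed=None):
    """Return n_atoms small (x, y, z) offsets.

    Hypothetical helper, not an Open Babel function. If seed is None the
    current epoch time is used, so existing callers keep getting varying
    output; passing an explicit seed makes the result reproducible.
    """
    if seed is None:
        seed = int(time.time())
    rng = random.Random(seed)  # private generator, global state untouched
    return [
        tuple(rng.uniform(-scale, scale) for _ in range(3))
        for _ in range(n_atoms)
    ]


# The same seed gives the same offsets on every call.
assert random_offsets(3, seed=42) == random_offsets(3, seed=42)
```

Keeping the generator local to the call means two callers with different seeds cannot disturb each other, which a shared global generator would not guarantee.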
I added a Seed method to the classes that have a PRNG, which lets a user manually feed a seed before using random numbers.
At present I have made no modifications to functions and executables which indirectly use random numbers, e.g., obabel.
Regarding the original question, we shouldn't be using random number generators in 3D coordinate generation (outside the distance geometry). There is no need for them, and they should be removed. In 3.0 I removed OBRandom from the public API so that this change could be made more easily, and to avoid future uses of random numbers. PRs accepted to fix this problem.
@e-kwsm Thank you for your work. I will take a look.
@baoilleach I understand that randomness should be removed. I would like to work on it, but I don't have a good idea about how to avoid using random numbers. For example, this procedure adds random offset to avoid planting atoms at the same place. How can I give a good offset which is deterministic but (almost) always different? https://github.com/openbabel/openbabel/blob/4709b0752109db593f1bee6298ef3bfa718d260d/src/builder.cpp#L1290
Another problem is how to realize trial and error without random numbers like this: https://github.com/openbabel/openbabel/blob/4709b0752109db593f1bee6298ef3bfa718d260d/src/builder.cpp#L315
I would like to know your idea about how to avoid using random numbers in these situations.
@baoilleach - while it's good to minimize randomness through the use of designed seeds, generating 3D geometries requires stochastic methods. Beyond the basic builder, there's the conformer searching, distance geometry, …
I asked @e-kwsm to allow the seed to be set from the command-line, which seems like it would solve this bug.
I think we agree that when we need random numbers we should use them. My opinion is that we shouldn't use them otherwise.
@n-yoshikawa In the first case, if you look at the original bug where this was the fix, I think shifting the salt along the x-axis past all of the existing atoms will fix this. In the second case, iterating over a fixed list of 9 vectors that cover the sphere is sufficient - we just need to find one that is not aligned or orthogonal to the query vector (or something like this).
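Both suggestions can be sketched without any random numbers. The Python below is only an illustration under assumptions, not the builder.cpp code: the function names, the particular set of 9 candidate vectors, the 2.0 gap, and the 0.1 tolerance are all made up for this example.

```python
import math

# Fixed candidate directions: three axes plus six diagonals (9 in total).
# This particular set is an assumption chosen for illustration.
CANDIDATES = [
    (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0),
    (1.0, -1.0, 0.0), (1.0, 0.0, -1.0), (1.0, 1.0, 1.0),
]


def pick_helper_vector(query, tol=0.1):
    """Return the first candidate neither aligned nor orthogonal to query."""
    q_norm = math.sqrt(sum(c * c for c in query))
    if q_norm == 0.0:
        raise ValueError("query vector must be non-zero")
    for cand in CANDIDATES:
        c_norm = math.sqrt(sum(c * c for c in cand))
        dot = sum(a * b for a, b in zip(query, cand))
        cos = abs(dot) / (q_norm * c_norm)
        # cos near 1 means aligned, cos near 0 means orthogonal.
        if tol < cos < 1.0 - tol:
            return cand
    raise ValueError("no suitable candidate for this query vector")


def shift_past_existing(existing, fragment, gap=2.0):
    """Translate fragment along x so it starts gap past the existing atoms."""
    if not existing or not fragment:
        return list(fragment)
    max_x = max(p[0] for p in existing)
    min_x = min(p[0] for p in fragment)
    dx = max_x + gap - min_x
    return [(x + dx, y, z) for (x, y, z) in fragment]
```

Because the candidate list is scanned in a fixed order, the same query vector always yields the same helper vector, and the shifted fragment depends only on the input coordinates.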
It is hard to change all APIs related to PRNG, so I added the OB_RANDOM_SEED environment variable (#2241).
$ export OB_RANDOM_SEED=42
$ diff <(obabel -:CCC --gen3D -oxyz) <(obabel -:CCC --gen3D -oxyz)
1 molecule converted
1 molecule converted
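For readers unfamiliar with the pattern, seeding from an environment variable looks roughly like the Python sketch below. This only mirrors the idea; it is not the implementation in #2241, and `make_rng` and its `environ` argument are hypothetical names used for illustration.

```python
import os
import random
import time


def make_rng(env_var="OB_RANDOM_SEED", environ=None):
    """Build a generator seeded from env_var, or from the clock if unset.

    Hypothetical helper for illustration; environ can be passed in so the
    behavior is testable without changing the real process environment.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var)
    if raw is None:
        seed = int(time.time())  # unset: output varies from run to run
    else:
        seed = int(raw)  # raises ValueError on a non-integer value
    return random.Random(seed)


# With the variable set, two generators produce the same sequence.
env = {"OB_RANDOM_SEED": "42"}
assert make_rng(environ=env).random() == make_rng(environ=env).random()
```

The advantage of this approach is the one stated above: no function signature has to change, since the seed is picked up wherever the generator is created.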
@ghutchis and @e-kwsm what is the status of this? It would be great to be able to pass in an option to obabel giving a seed so that the --gen3d option always returns the same geometry. The last commit here gives a workable solution I think, where you would have to set the seed as an environment variable before calling obabel, but it hasn't been merged.
Hi, I would also be very interested in the suggested solution to provide the env var OB_RANDOM_SEED. Is this change usable at all at this point? Thanks
@byte-for-byte it is indeed usable. I have built from the branch in that pull request and everything works as expected.
With this x.smi file:
O=C(O)CCCCC
I run this command:
The produced outputs are a little bit different with every run:
The actual molecules are the same; typical RMSD differences are 0.00190, 0.00102, 0.00067.
OpenBabel should never use random numbers to randomize molecules, because this is bad for testing frameworks that expect repeatability.
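One way to quantify run-to-run differences like these is to compute the RMSD between the XYZ outputs of two runs. The sketch below is an assumption about how such a comparison could be done, not the method used for the numbers above. It compares coordinates directly, without superimposing the structures, so it overstates the difference if one output is rotated or translated relative to the other. `parse_xyz` and `rmsd` are hypothetical helper names.

```python
import math


def parse_xyz(text):
    """Parse XYZ-format text into a list of (element, x, y, z)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])  # first line: atom count
    atoms = []
    # The second line is a comment; atom records start on the third.
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


def rmsd(atoms_a, atoms_b):
    """Plain RMSD over matching atoms, with no alignment step."""
    if len(atoms_a) != len(atoms_b) or not atoms_a:
        raise ValueError("need two non-empty structures of equal size")
    total = 0.0
    for (el_a, *xyz_a), (el_b, *xyz_b) in zip(atoms_a, atoms_b):
        if el_a != el_b:
            raise ValueError("atom order differs between structures")
        total += sum((a - b) ** 2 for a, b in zip(xyz_a, xyz_b))
    return math.sqrt(total / len(atoms_a))
```

A test harness that tolerates small numerical noise could compare this value against a threshold instead of diffing the files byte for byte.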