Open IliasChair opened 2 months ago
This is a weird problem. it doesn't seem to be KinBot's fault, but to be 100 % sure maybe you can send me the files necessary to replicate the problem. Something I just realized is that numpy guys just released version 2.0 less than a month ago and this brakes a lot of things which might include ASE. Do you know which version of numpy are you using? If it's >=2.0 you might wanna try with 1.x. For reference I have 1.25 in my machine.
Hi, I am using ASE 3.23.0 and Numpy 1.26.4. So this is probably not the source of the error.
I've modified nn_pes.py
s.t. the Nn_surr
class wraps the OCPCalculator
class (an ASE calculator). See attached files.
However, reproducing the error would require installing the fair-chem/OCP package and running a pretrained model.
Since this issue seems pretty niche, the work needed to dig into it might not be worth it. My temporary fix has solved the problem, so I think we can call this issue closed. If anyone else runs into it, (and ASE has not fixed it yet) they can use my fix or re-open this issue.
Best Ilias
Hello,
I'm continuing to work on integrating Kinbot
with my NN PES
and have run into two new problems that I can't resolve on my own. The first involves an issue with the Sella
optimizer, specifically with subtracting an ApproximateHessian
object containing NaNs. The second problem concerns the ASE
database interface to SQLite
, similar to what I encountered earlier.
Problem 1: Sella Optimizer and ApproximateHessian NaN Values
When running a well exploration using Kinbot
for some species, I encounter the following error:
Traceback (most recent call last):
File "home/BA/kinbot_dir/rundir_system/output/130130000000000000002_well.py", line 81, in <module>
converged = opt.run(fmax=fmax, steps=steps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/ase/optimize/optimize.py", line 269, in run
return Dynamics.run(self)
^^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/ase/optimize/optimize.py", line 156, in run
for converged in Dynamics.irun(self):
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/ase/optimize/optimize.py", line 135, in irun
self.step()
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/sella/optimize/optimize.py", line 240, in step
s, smag = self._predict_step()
^^^^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/sella/optimize/optimize.py", line 229, in _predict_step
s, smag = self.rs(
^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/sella/optimize/restricted_step.py", line 161, in __init__
BaseRestrictedStep.__init__(self, pes, *args, **kwargs)
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/sella/optimize/restricted_step.py", line 54, in __init__
self.pes.get_HL().project(self.P.T),
^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/sella/peswrapper.py", line 133, in get_HL
return self.get_H() - self.get_Hc()
~~~~~~~~~~~~~^~~~~~~~~~~~~~~
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/scipy/sparse/linalg/_interface.py", line 522, in __sub__
return self.__add__(-x)
^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/sella/linalg.py", line 236, in __add__
return ApproximateHessian(
^^^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/sella/linalg.py", line 155, in __init__
self.set_B(B0)
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/sella/linalg.py", line 170, in set_B
self.evals, self.evecs = eigh(self.B)
^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/scipy/linalg/_decomp.py", line 458, in eigh
a1 = _asarray_validated(a, check_finite=check_finite)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/scipy/_lib/_util.py", line 321, in _asarray_validated
a = toarray(a)
^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/numpy/lib/function_base.py", line 627, in asarray_chkfinite
raise ValueError(
ValueError: array must not contain infs or NaNs
After some investigation, I found that in the Sella
peswrapper
module, the PES.get_Hc()
method returns a 6x6 array of NaNs for the Hessian of the constraints. This method calls self.cons.hessian().ldot(self.curr['L'])
. (FYI) Calling self.cons.hessian().asarray()
instead returns an array of six 2D arrays consisting entirely of zeros or NaNs. The error occurs when PES.get_Hc()
retrieves the Hessian of the constraints, which contains NaNs, and then attempts to subtract this from the Hessian returned by PES.get_h()
in PES.get_HL()
.
I’ve already tested my NN PES thoroughly in its ability to predict accurate forces and energies, so I doubt the errors stem from there. Is this expected behavior in the Sella
optimizer? If not, do you have any ideas on what might be causing it?
I have also notices that this error often occurs for systems of size 1 or 2. For systems with only one atom, my NN_PES looks up the energies in a lookup table and sets the forces to zero. Could the zero forces cause this issue?
This issue might also be related to two other issues on the Sella
git: Minimal example of N2 with EMT or LJ leads to a crash, Constrained optimization diverging during hessian update
Traceback (most recent call last):
File "home/BA/kinbot_dir/rundir_system/output/130130000000000000002_well.py", line 114, in <module>
db.write(mol, name='130130000000000000002_well', data=data)
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/ase/parallel.py", line 244, in new_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/ase/db/core.py", line 203, in new_method
return method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/ase/db/core.py", line 349, in write
id = self._write(atoms, kvp, data, id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "home/micromamba/envs/Kinbot_env/lib/python3.11/site-packages/ase/db/sqlite.py", line 339, in _write
cur.execute('INSERT INTO systems VALUES ({})'.format(q),
sqlite3.IntegrityError: UNIQUE constraint failed: systems.unique_id
It occurs because there's already an entry in the database with the same unique_id
. As far as I can tell, the unique_id
is generated from data in the Atoms
object. I suspect two possible causes: either two different Atoms
objects are incorrectly mapping to the same unique_id
, or Kinbot
is attempting to re-run a failed simulation with an incomplete entry already present in the DB. Upon inspecting the database, I noticed some entries are missing key data, like energies and forces.
Although the error appears in the ASE DB code, I believe it may stem from either Kinbot
or Sella
. I've spent a significant amount of time troubleshooting this and now believe it's beyond my ability to resolve alone.
I’ve attached a well exploration script generated by Kinbot
that triggers these errors, as well as subset of the database in csv format for reference.
As a first step, could you check if this error occurs for you as well?
Any help or guidance would be greatly appreciated.
Hello again,
Here’s a little update. I believe I managed to fix the first issue regarding the NaN values by merging the pull request on the Sella
GitHub page into my local branch, which removes the deprecated force_consistent parameter
. I have not encountered the crash since then.
I’ve discovered that the problem lies in how the unique_id
is generated in the ASE database code. In the ase.db.row.py
module, the unique_id
is created using:
'unique_id': '%x' % randint(16**31, 16**32 - 1)
This generates a random integer within a specific range using random.randint and then converts it to a hexadecimal string.
However, when I run multiple well explorations in Kinbot
— for example, for 390870650390040000002_well
and 330490170000000000002_well
— the random integer generator returns the same number for the unique_id
. I’ve also confirmed that the generated unique_id
remains the same across different executions of the same code.
I still believe this issue is somehow tied to Kinbot
, because when I save the ASE
Atoms
objects from the well exploration scripts and then create the unique_id
separately in another script, the random number generator produces different numbers. This suggests that the random number generation algorithm may be influenced by something in Kinbot
, causing the same unique_id
to be generated for different well explorations and runs.
However, this behavior may be intentional, as we want Kinbot
to access the output from previous computations stored in the database using the unique_id
, allowing it to skip re-running calculations. That said, this does not explain why two different Atoms objects are generating the same unique_id
.
Any assistance or insights regarding the cause of this behavior, as well as suggestions on how to best tackle the issue, would be greatly appreciated.
Best Ilias
Hello all,
I'm encountering an error when trying to run Kinbot using my own NN PES, as mentioned in issue #79 . The error seems to occur within the code related to the databases in the ASE library.
Since the problem appears to be linked to ASE, I'm unsure whether it would be appropriate to also report this in the ASE forums . But I believe it's worth raising this issue here as well.
The traceback looks as follows:
The error is triggered in the
deblob()
function of thesqlite.py
script from the ASE package, when attempting to use thenp.frombuffer()
function to decode a binary buffer string to a data type, which by default is float64. It appears that the issue is related to the data type of the buffer stored in the SQLite database. ASE seems to be decoding this binary buffer string asfloat64
by default, while the actual data is infloat32
format. For now, I have managed to resolve this issue by editing thedeblob()
function in thesqlite.py
script. Here’s the change I made:This approach allows the code to proceed normally in most cases, using
dtype=float64
. If the error is triggered, the code attempts to decode the buffer asfloat32
, which resolves the issue for my NN PES.I am unsure whether this error originates from the Kinbot code, where it might be assumed that calculators always return values as
float64
. Or from the ASE sqlite library, which might be presuming that the database only containsfloat64
values.The the above tempoary solution fixed this issue for me for now, but I think it still should be investigated where this error originally stems from.
Thank you for the quick responses and helpful attitude.
Best Ilias