materialsproject / maggma

MongoDB aggregation machine
https://materialsproject.github.io/maggma/
Other
38 stars 32 forks source link

[Bug]: Handling of deprecated `np.bool_` is not ideal #1006

Open Andrew-S-Rosen opened 3 hours ago

Andrew-S-Rosen commented 3 hours ago

Code snippet

# pip install maggma ase==3.23.0 pymatgen==2024.10.25 numpy==1.26.4
from ase.build import bulk
from pymatgen.io.ase import AseAtomsAdaptor
from maggma.stores import MemoryStore

atoms = bulk("Cu")
structure = AseAtomsAdaptor.get_structure(atoms)
print(type(structure[0].lattice.pbc[0]))  # np.bool_

store = MemoryStore()
with store as s:
    s.update(structure, key="charge")

What happened?

When trying to store a document with a numpy 1.x np.bool_ entry, the underlying bson dependency raises an InvalidDocument error and maggma can't store the document in the connected store. Some sanitization should likely be done to address this.

Hmm, but the following works okay, so there is something in-between here:

with store as s:
    s.update({"test": 1, "val": (np.bool_)}, key="test")

Version

0.70.0

Which OS?

Log output

Cell In[1], line 11
      9 store = MemoryStore()
     10 with store as s:
---> 11     s.update(structure, key="charge")

File ~\miniconda\envs\test\Lib\site-packages\maggma\stores\mongolike.py:395, in MongoStore.update(self, docs, key)
    393 if len(requests) > 0:
    394     try:
--> 395         self._collection.bulk_write(requests, ordered=False)
    396     except (OperationFailure, DocumentTooLarge) as e:
    397         if self.safe_update:

File ~\miniconda\envs\test\Lib\site-packages\mongomock\collection.py:1852, in Collection.bulk_write(self, requests, ordered, bypass_document_validation, session)
   1850 for operation in requests:
   1851     operation._add_to_bulk(bulk)
-> 1852 return BulkWriteResult(bulk.execute(), True)

File ~\miniconda\envs\test\Lib\site-packages\mongomock\collection.py:331, in BulkOperationBuilder.execute(self, write_concern)
    329 exec_name = execute_func.__name__
    330 try:
--> 331     op_result = execute_func()
    332 except WriteError as error:
    333     result['writeErrors'].append({
    334         'index': index,
    335         'code': error.code,
    336         'errmsg': str(error),
    337     })

File ~\miniconda\envs\test\Lib\site-packages\mongomock\collection.py:157, in BulkWriteOperation.register_update_op.<locals>.exec_update()
    156 def exec_update():
--> 157     result = collection._update(spec=selector, document=document,
    158                                 multi=multi, upsert=self.is_upsert,
    159                                 **extra_args)
    160     ret_val = {}
    161     if result.get('upserted'):

File ~\miniconda\envs\test\Lib\site-packages\mongomock\collection.py:896, in Collection._update(self, spec, document, upsert, manipulate, multi, check_keys, hint, session, collation, let, array_filters, **kwargs)
    894     if not check_keys:
    895         _validate_data_fields(document)
--> 896     BSON.encode(document, check_keys=check_keys)
    897 existing_document.update(self._internalize_dict(document))
    898 if existing_document['_id'] != _id:

File ~\miniconda\envs\test\Lib\site-packages\bson\__init__.py:1429, in BSON.encode(cls, document, check_keys, codec_options)
   1403 @classmethod
   1404 def encode(
   1405     cls: Type[BSON],
   (...)
   1408     codec_options: CodecOptions[Any] = DEFAULT_CODEC_OPTIONS,
   1409 ) -> BSON:
   1410     """Encode a document to a new :class:`BSON` instance.
   1411
   1412     A document can be any mapping type (like :class:`dict`).
   (...)
   1427        Replaced `uuid_subtype` option with `codec_options`.
   1428     """
-> 1429     return cls(encode(document, check_keys, codec_options))

File ~\miniconda\envs\test\Lib\site-packages\bson\__init__.py:1050, in encode(document, check_keys, codec_options)
   1047 if not isinstance(codec_options, CodecOptions):
   1048     raise _CODEC_OPTIONS_TYPE_ERROR
-> 1050 return _dict_to_bson(document, check_keys, codec_options)

InvalidDocument: cannot encode object: True, of type: <class 'numpy.bool_'>
Andrew-S-Rosen commented 3 hours ago

Looks like a similar error occurred here: https://github.com/materialsproject/fireworks/issues/522.