rkingsbury opened this issue 3 years ago
FYI @htz1992213 @orioncohen
Per a side discussion with @mkhorton, we might want to consider adding `DataFrame` support to `monty` so that we don't have to implement custom `as_dict` / `from_dict`. If we do that, just inheriting from `MSONable` and making sure that all kwargs are stored as class attributes should be enough. But currently `MSONable` doesn't support serializing `DataFrame`. We can discuss. See https://github.com/materialsproject/pymatgen/issues/2138
Thanks for bringing this up @rkingsbury. If I am reading this correctly, the method is throwing an error when you call `data = loadfn('some_path.json')`, right?

What would adding `DataFrame` support to `monty` involve? Would that be as easy as importing a specialized package?

I support using monty to serialize without using custom `as_dict` and `from_dict` methods. My one concern is that serialization is not very persistent if we change the underlying object code. If we want to be able to store permanently in MongoDB, it might be wise to create a more flexible schema for storage.
@rkingsbury Will make a patch overriding the superclass method. I might set some methods, e.g. `from_ff_and_topologies`, to be unsupported.
> Thanks for bringing this up @rkingsbury. If I am reading this correctly, the method is throwing an error when you call `data = loadfn('some_path.json')`, right? What would adding `DataFrame` support to `monty` involve? Would that be as easy as importing a specialized package?
I think what would be required is to add a type check to `MontyEncoder`, similar to what is currently done for datetime objects (here).

I'm a little out of my depth here, but from #2138 I know that this works:

```python
df.to_json(default_handler=MontyEncoder().encode)
```

and `loadfn` uses `MontyEncoder` by default (as I understand it), so I think we would just need to add something to `loadfn` and `dumpfn` like:

```python
if isinstance(o, pd.DataFrame):
    return o.to_json(default_handler=MontyEncoder().encode)
```

@mkhorton does that look right to you?
Hah, yes! I actually typed out a reply to this saying exactly that:

> It would mean adding `pandas` to this block and then encoding the data frame using `df.to_json(default_handler=MontyEncoder().encode)`. In this way, any data frame can be serialized automatically provided it contains only native or MSON data types, could be stored in Mongo, etc.

but forgot to submit it 🤦🏻‍♂️

Note that this is slightly different to your proposal just in that the code should be in `MontyEncoder` / `MontyDecoder`, rather than needing to touch `loadfn` or `dumpfn`.
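A sketch of what the encoder side could look like. `SketchEncoder` and the exact dict layout are assumptions, not monty's actual code; the check lives in `default()`, the same hook where `json.JSONEncoder` subclasses handle types like datetimes.

```python
import json

import pandas as pd

class SketchEncoder(json.JSONEncoder):
    """Hypothetical stand-in for MontyEncoder: a DataFrame branch in
    default(), alongside the existing special-case types."""

    def default(self, o):
        if isinstance(o, pd.DataFrame):
            return {
                "@module": "pandas",
                "@class": "DataFrame",
                # The frame's own JSON text is nested under a "data" key.
                "data": o.to_json(default_handler=super().encode),
            }
        return super().default(o)

encoded = json.dumps({"df": pd.DataFrame({"x": [1, 2]})}, cls=SketchEncoder)
```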
Thanks @mkhorton ! So something like

```python
if modname and modname not in ["bson.objectid", "numpy"]:
    if modname == "pandas" and classname == "DataFrame":
        return d.to_json(default_handler=MontyEncoder().encode)
    if modname == "pandas" and classname == "Series":
        return d.to_json(default_handler=MontyEncoder().encode)
```

?
Yes, I believe so, but I haven't tried -- see here too. Note that including the `@module` and `@class` is important, and then the pandas json could be under a `data` key.

It's not clear to me if `Series` is necessary, or if additional pandas data types would also be useful, but sounds reasonable to me.
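For the decoding direction, here is a hypothetical counterpart to the branch above. The function name is invented; monty's real hook is `MontyDecoder.process_decoded`, and the `{"@module", "@class", "data"}` layout follows the convention just described.

```python
import io

import pandas as pd

def sketch_process_decoded(d):
    """Hypothetical mirror of the encoder branch: rebuild pandas objects
    from dicts tagged with @module / @class, and pass everything else
    through unchanged."""
    if isinstance(d, dict) and d.get("@module") == "pandas":
        if d.get("@class") == "DataFrame":
            return pd.read_json(io.StringIO(d["data"]))
        if d.get("@class") == "Series":
            return pd.read_json(io.StringIO(d["data"]), typ="series")
    return d

series = pd.Series([1, 2, 3])
payload = {"@module": "pandas", "@class": "Series", "data": series.to_json()}
restored = sketch_process_decoded(payload)
assert restored.equals(series)
```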
@rkingsbury @mkhorton @orioncohen I have made a PR (https://github.com/materialsproject/pymatgen/pull/2191) for serialization using conventional encoding, but please let me know if the newer way works!
@rkingsbury this issue can be closed, right?
I think so? It looks like #2191 should have taken care of it, although I have not personally tried serializing the returned dict
**Describe the bug**
I am unable to serialize and deserialize a `CombinedData` object using `loadfn` / `dumpfn`. I believe this occurs because `CombinedData` inherits `from_dict` from `LammpsData`, but `CombinedData.__init__()` takes different kwargs.

**To Reproduce**
Create a `CombinedData` object called `data`

will fail with

**Expected behavior**
The `CombinedData` object should be created from the file.

**Desktop (please complete the following information):**
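The failure mode described in this report can be illustrated with a minimal, pymatgen-free sketch (all class names here are invented): a parent's `from_dict` calls `cls(**d)`, which breaks as soon as a subclass `__init__` accepts different keyword arguments.

```python
class ParentData:
    """Stands in for LammpsData (illustrative only)."""

    def __init__(self, atoms=None):
        self.atoms = atoms

    def as_dict(self):
        return {"atoms": self.atoms}

    @classmethod
    def from_dict(cls, d):
        # Assumes every subclass accepts the same kwargs as ParentData.
        return cls(**d)

class CombinedLike(ParentData):
    """Stands in for CombinedData: a different __init__ signature."""

    def __init__(self, contents=None):
        super().__init__()
        self.contents = contents

# Round-tripping through the inherited from_dict raises TypeError,
# because CombinedLike.__init__ has no "atoms" keyword argument.
try:
    CombinedLike.from_dict(CombinedLike(contents=[1]).as_dict())
    failed = False
except TypeError:
    failed = True
```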