jmschrei closed this issue 9 years ago.
Not sure JSON makes sense as the internal format (although proper Python objects aren't all that much smaller), but it could be a better serialization format than what we have now.
I meant as a serialization format, my mistake.
The Python objects we use already, probably. If we restrict ourselves to the types that the Python json module already knows how to serialize/deserialize, everything will have to be dicts and lists and other basic types, and we'll have to do silly things like monkey-patch HMM methods onto them.
At first glance it seems like we would want to implement the deserialization hooks described in <https://docs.python.org/2/library/json.html#json-to-py-table> and the serialization dispatch described in <https://docs.python.org/2/library/json.html#json.JSONEncoder.default> so we can save and load States and so on in JSON through the json module.
There are problems with the extensibility of this approach to new user-defined types, though, because they would need their own serialization/deserialization hooks.
How do you see a json in-memory representation working?
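A minimal sketch of the serialization dispatch and deserialization hook mentioned above, assuming hypothetical stand-in State and NormalDistribution classes (not yahmm's actual classes): a JSONEncoder subclass whose default() turns registered objects into tagged dicts, and an object_hook passed to json.loads that rebuilds them. The REGISTRY dict is also an assumption; it makes the extensibility concern concrete, since every new user-defined type would have to be added to it.

```python
import json

# Hypothetical stand-ins for yahmm's State and distribution classes;
# the real classes and their attributes may differ.
class NormalDistribution:
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

class State:
    def __init__(self, distribution, name):
        self.distribution = distribution
        self.name = name

# Registry mapping a type tag to the class that can rebuild it. Every new
# (including user-defined) type must be added here by hand.
REGISTRY = {"NormalDistribution": NormalDistribution, "State": State}

class ModelEncoder(json.JSONEncoder):
    def default(self, obj):
        # Only called for objects the json module cannot encode natively.
        cls_name = type(obj).__name__
        if cls_name in REGISTRY:
            payload = dict(vars(obj))
            payload["__type__"] = cls_name
            return payload  # nested objects get passed to default() too
        return super().default(obj)  # raises TypeError

def model_object_hook(d):
    # Called for every decoded JSON object, innermost objects first.
    cls_name = d.pop("__type__", None)
    if cls_name is None:
        return d
    # Assumes attribute names match the constructor's parameter names.
    return REGISTRY[cls_name](**d)

state = State(NormalDistribution(0.0, 1.0), "s1")
text = json.dumps(state, cls=ModelEncoder)
restored = json.loads(text, object_hook=model_object_hook)
print(restored.name, restored.distribution.mean)  # s1 0.0
```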
On Mon, Sep 15, 2014 at 4:58 PM, Jacob Schreiber notifications@github.com wrote:
What do you think would be a good internal format?
— Reply to this email directly or view it on GitHub https://github.com/jmschrei/yahmm/issues/33#issuecomment-55677455.
Consider YAML, since JSON is a subset of YAML now.
With best regards, Y.Y.
I didn't mean we'd have a JSON in-memory representation; we'd use Python objects. I meant that when we write a model out to a file, it would be written as JSON rather than in the format we use now. This could allow us to easily add or remove attributes without changing the reading and writing functions, if they were written correctly.
What advantages would YAML give that JSON would not? I don't know that much about it.
That sounds a lot better to me. We still have to solve the problem of deserializing user-created distributions that may or may not be in currently loaded modules. I think we should poke around inside pickle and see how it manages to load the right module for things even when people use features like "import numpy as np".
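As far as I can tell, pickle handles this by storing classes by reference: it records the class's defining module (its `__module__`) and its qualified name, and on loading it imports that module and looks the name up. Aliases such as "import numpy as np" never enter into it, because the alias only exists in the caller's namespace. A sketch of the same idea for a JSON format; class_ref and resolve_class are hypothetical helper names, and fractions.Fraction just stands in for a user-defined distribution class.

```python
import importlib
import pickle
import fractions as fr  # the alias has no effect on what gets recorded

def class_ref(cls):
    # The same information pickle records for a class: the module it was
    # defined in, plus its qualified name inside that module.
    return {"module": cls.__module__, "qualname": cls.__qualname__}

def resolve_class(ref):
    # Import the module (a cheap lookup if it is already in sys.modules),
    # then walk the dotted qualified name down to the class object.
    obj = importlib.import_module(ref["module"])
    for part in ref["qualname"].split("."):
        obj = getattr(obj, part)
    return obj

ref = class_ref(fr.Fraction)
print(ref)  # {'module': 'fractions', 'qualname': 'Fraction'}
print(resolve_class(ref) is fr.Fraction)  # True

# pickle stores a class the same way, by module and name.
print(b"fractions" in pickle.dumps(fr.Fraction))  # True
```

One limitation this shares with pickle: a class defined in the script being run has `__module__ == "__main__"`, so a different script could not resolve it; the class needs to live in an importable module.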
Are you suggesting we make it so that people can define custom distributions, write their models out, and have people without the code for that distribution still be able to use the model? I'm not sure that's possible with just a JSON format. Maybe we could provide two options: one that is pickle-like but only machine-readable, and one that is human-readable for the distributions that are already supported.
No, I don't think we'll ever be able to really save the code for the distributions. But the use case I'm thinking of is more like this:
In this case, loading the HMM ought to import the module; in fact, I don't really think we can see what the caller has imported, so we might need to import the module ourselves even if the script that wants to load the HMM has already done it.
It gets a little trickier if, instead of living in a system-installed module, the code lives somewhere else in the filesystem, like in MyDistributions.py next to the scripts. I don't know how well pickle's logic handles that case.
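Both pickle and an importlib-based loader find modules by searching sys.path. When a script is run directly, its own directory is placed at the front of sys.path, so a MyDistributions.py next to it imports fine from that script, but a loader run from somewhere else will not find it unless that directory is on sys.path (for example via PYTHONPATH). A small sketch using a temporary directory; the module name is hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical user module into a directory that is not on sys.path.
module_name = "my_distributions_demo"
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, module_name + ".py"), "w") as f:
    f.write("class MyDistribution:\n    pass\n")

importlib.invalidate_caches()  # let the import system notice the new file
try:
    importlib.import_module(module_name)
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Once the directory is on sys.path, the same by-name lookup succeeds.
sys.path.append(tmpdir)
importlib.invalidate_caches()
mod = importlib.import_module(module_name)
print(found_before, mod.MyDistribution.__module__)  # False my_distributions_demo
```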
I'm not sure if it's worth the extra effort when the user can simply write "from MyDistributions import *" to solve the problem. I'll take a look at what pickle does and see how difficult it would be.
That's the thing; I'm not sure if "from MyDistributions import *" is going to help if it's only in the caller's namespace, and not in yahmm. I don't really know much about serialization though.
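Some background on the namespace question: a star import binds the imported names only in the caller's namespace, so code inside yahmm cannot see them by name. The import does, however, register the module object in sys.modules, which is shared by the whole process, so a loader that looks classes up by module name can still reach it, and importlib.import_module returns the already-loaded module instead of executing it again. A small demonstration, with fractions standing in for MyDistributions and exec simulating the caller's script:

```python
import importlib
import sys

# Simulate a caller script doing a star import into its own namespace.
caller_namespace = {}
exec("from fractions import *", caller_namespace)

# The imported names are bound only in the caller's namespace...
print("Fraction" in caller_namespace)  # True
# ...but the module object is registered process-wide in sys.modules,
# so a loader living in another module can reach it by module name.
print("fractions" in sys.modules)  # True
# import_module returns the cached module rather than re-running it.
mod = importlib.import_module("fractions")
print(mod.Fraction is caller_namespace["Fraction"])  # True
```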
It'd depend on how we implemented it. If the name attribute of the object were the same as the class name, as it is now, you could just eval it, or use one of its safe brethren. That would have to be specified, though.
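A sketch of a safer alternative to eval, assuming a hypothetical Distribution base class: look the stored name up among the loaded subclasses of that base class, so only known class names are accepted. (ast.literal_eval is safe, but it only evaluates literals, so it cannot turn a class name into a class.) A subclass is only found if the module defining it has already been imported.

```python
# Hypothetical base class and subclass; yahmm's real hierarchy may differ.
class Distribution:
    pass

class UniformDistribution(Distribution):
    def __init__(self, start, end):
        self.start, self.end = start, end

def all_subclasses(cls):
    # Walk the whole subclass tree so indirect subclasses are found too.
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)

def distribution_class(name):
    # Unlike eval(), this never executes text read from the file; it only
    # matches the name against Distribution subclasses already loaded.
    for sub in all_subclasses(Distribution):
        if sub.__name__ == name:
            return sub
    raise ValueError(f"unknown distribution: {name!r}")

cls = distribution_class("UniformDistribution")
dist = cls(0, 10)
```

If two loaded modules define subclasses with the same name, this returns whichever is found first, which is one reason to store the module name as well.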
This has been merged into pomegranate, as will all future changes.
I think it would be better if we changed the underlying representation of all the objects to plain JSON. For example, a distribution might be something like this:
and a state might look something like this:
If we use the default JSON parser, then we can add more parameters at will without changing the read or write functions at all. The only drawback is that the text file would be less readable.
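A minimal sketch of this, with a hypothetical NormalDistribution whose attribute names are only illustrative: the writer dumps whatever instance attributes exist, so adding a parameter needs no change to write_json or read_json (both hypothetical helpers). Passing indent and sort_keys to json.dumps puts each attribute on its own line, which may offset some of the readability concern.

```python
import json

# Hypothetical distribution class; the attribute names are illustrative.
class NormalDistribution:
    def __init__(self, mean, std, frozen=False):
        self.mean = mean
        self.std = std
        self.frozen = frozen

def write_json(obj):
    # Dump every instance attribute generically, plus the class name,
    # so new attributes are picked up without touching this function.
    data = {"class": type(obj).__name__, "parameters": vars(obj)}
    return json.dumps(data, indent=4, sort_keys=True)

def read_json(text, cls):
    data = json.loads(text)
    if data["class"] != cls.__name__:
        raise ValueError(f"expected {cls.__name__}, got {data['class']}")
    obj = cls.__new__(cls)  # skip __init__ and restore attributes directly
    obj.__dict__.update(data["parameters"])
    return obj

text = write_json(NormalDistribution(0.5, 2.0))
print(text)
d = read_json(text, NormalDistribution)
```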