sakrejda / protostan

Thin protobuf interface wrapper for Stan
BSD 3-Clause "New" or "Revised" License
2 stars 1 forks source link

Replace StanMessage with more specific StanOutput #52

Closed ariddell closed 8 years ago

ariddell commented 8 years ago

StanOutput (I'm open to a different name) is a union type which wraps messages which might be delivered from a Stan callback interface writer.

If there does arise a case where one needs a truly generic StanMessage, I propose we create such a thing at that point. Until then, I think the maxim of "do the simplest thing that could possibly work" (DTSTTCPW) is a good one.

The code tests aren't passing, I think. For now this is just a marker (with code) for an agenda item.

sakrejda commented 8 years ago

I agree with the principle, but StanMessage only exists for serialization. For serialization it makes sense to be able to serialize any kind of message. As you read from a file you need to be able to have a tag (in this case the "type" field) to check for what the internals of the message are like so that you can query only fields which exist and are set. This is not relevant for RPC or any other context where messages are dealt with one at a time with a known type.

Ideally you can serialize all the IO from a Stan run and use that to make it possible to reproduce the run, restart a run from the end state of the chain to continue it, etc... with that goal in mind it makes sense to be able to serialize all IO to a single file type. I could drop that layer in serialization and keep an external type tag the way that I keep an external size tag for each message, but at that point the serialization is incompatible with how protobuf messages are generally serialized in, e.g., Java, or the C++ pull request.

ariddell commented 8 years ago

Serialization is the responsibility of the interface, not protostan. PyStan, CmdStan, and RStan have no use for serialization but could benefit from protostan.

I don't think RPC would need serialization either -- except in the context of specific RPC calls.

ariddell commented 8 years ago

If you feel strongly about this can I convince you to split this repository into protostan and protostan-lite where protostan-lite really only has the bare minimum needed to use protocol buffers to talk to the Stan C++ API?

sakrejda commented 8 years ago

I'm on board with splitting out serialization when there is a place for it, for example when I get the cmdstan fork going. I can do that next if you are working on incorporating this as a library into pystan and it helps to have this repo be lighter. You're right that's the right way to go.

I still don't want to change the message type in the writer because the goal of this message type is to allow arbitrary serialization, and the writer also allows for generic serialization.

It would help me with not getting in your way if I understood your use case better. Could you just go ahead and write code for some of the pystan use cases that you need protostan for? I need to see the vision here or I'll continue designing something that gets in your way...

If you want a different message type for something could you do an issue and (if you're at that point) add a pull request to match it? On Mar 9, 2016 3:10 PM, "Allen Riddell" notifications@github.com wrote:

If you feel strongly about this can I convince you to split this repository into protostan and protostan-lite where protostan-lite really only has the bare minimum needed to use protocol buffers to talk to the Stan C++ API?

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/pull/52#issuecomment-194485658.

ariddell commented 8 years ago

What about stan-proto as a name for the thin library?

My use case is purely PyStan 3. I want to use protocol buffers to talk to the Stan C++ API in a rewrite of PyStan.

My sense is that you want protostan to be a cmdstan replacement, which is fine and I'm eager to help. I just need a way of referring to the code that PyStan 3 and protostan(-cmdstan) share without having to create a fork.

ariddell commented 8 years ago

Maybe I'm mistaken here. Does Stan C++ ever use the writer for other forms of serialization? As far as I can tell there's an extremely limited set of things it ever communicates to the interfaces with the writer. If you have an example of Stan doing something really generic with the writer and this being consumed by RStan or PyStan then I'll shut up about this.

Could we compromise and figure out empirically what things Stan sends to writers and make that the message format? Something like StanWriterOutput or StanWriterMessage?

sakrejda commented 8 years ago

I'm not sure what you mean by this, could you be more explicit? On Mar 9, 2016 5:59 PM, "Allen Riddell" notifications@github.com wrote:

Maybe I'm mistaken here. Does Stan C++ ever use the writer for other forms of serialization? As far as I can tell there's an extremely limited set of things it ever communicates to the interfaces with the writer. If you have an example of Stan doing something really generic with the writer and this being consumed by RStan or PyStan then I'll shut up about this.

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/pull/52#issuecomment-194556195.

ariddell commented 8 years ago

Sure, have a look at how

#include <stan/interface_callbacks/writer/base_writer.hpp>
#include <stan/interface_callbacks/writer/stream_writer.hpp>

are used in RStan and PyStan (the file is stan_fit.hpp in both repositories).

There are only three things that get written via the writers: doubles, integers, and strings. The proto buf message type should reflect this. Anything else (i.e., potentially also wrapping StanCompileRequest) is a departure from a minimal wrapping of the Stan C++ API.

I think the .proto files are 95% right; but the remaining details are important to get right, I think.

On 03/09, Krzysztof Sakrejda wrote:

I'm not sure what you mean by this, could you be more explicit? On Mar 9, 2016 5:59 PM, "Allen Riddell" notifications@github.com wrote:

Maybe I'm mistaken here. Does Stan C++ ever use the writer for other forms of serialization? As far as I can tell there's an extremely limited set of things it ever communicates to the interfaces with the writer. If you have an example of Stan doing something really generic with the writer and this being consumed by RStan or PyStan then I'll shut up about this.

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/pull/52#issuecomment-194556195.


Reply to this email directly or view it on GitHub: https://github.com/sakrejda/protostan/pull/52#issuecomment-194596472

sakrejda commented 8 years ago

So how would pystan read what a writer writes like this? On Mar 10, 2016 8:05 AM, "Allen Riddell" notifications@github.com wrote:

Sure, have a look at how

#include <stan/interface_callbacks/writer/base_writer.hpp>
#include <stan/interface_callbacks/writer/stream_writer.hpp>

are used in RStan and PyStan (the file is stan_fit.hpp in both repositories).

There are only three things that get written via the writers: doubles, integers, and strings. The proto buf message type should reflect this. Anything else (i.e., potentially also wrapping StanCompileRequest) is a departure from a minimal wrapping of the Stan C++ API.

I think the .proto files are 95% right; but the remaining details are important to get right, I think.

On 03/09, Krzysztof Sakrejda wrote:

I'm not sure what you mean by this, could you be more explicit? On Mar 9, 2016 5:59 PM, "Allen Riddell" notifications@github.com wrote:

Maybe I'm mistaken here. Does Stan C++ ever use the writer for other forms of serialization? As far as I can tell there's an extremely limited set of things it ever communicates to the interfaces with the writer. If you have an example of Stan doing something really generic with the writer and this being consumed by RStan or PyStan then I'll shut up about this.

— Reply to this email directly or view it on GitHub <https://github.com/sakrejda/protostan/pull/52#issuecomment-194556195 .


Reply to this email directly or view it on GitHub: https://github.com/sakrejda/protostan/pull/52#issuecomment-194596472

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/pull/52#issuecomment-194833722.

sakrejda commented 8 years ago

Sorry being accidentally obtuse here. I wrote the writer with the intent of dealing with serialization to file/stream of general Stan messages. The nature of the writer design on this stab-dev/stan side is that a writer is going to be purpose-specific. So for example if I limit the current write to one or a few message types, I will no longer be able to use it for what I'm interested in---generic serialization. If there's a different writer that would be important for PyStan/etc... then I think we should add/test it here, and split it out when the project matures more. In practice I think that PyStan won't need a delimited protobuf writer at all, you should be able to handle individual messages with a type tag. I'm happy to help you make that happen so that a protobuf-based PyStan can happen faster, but I think it's a separate goal than the current writer. The writers are very modular and should be very easy to split out in the future so I don't think they're really going to become a bloat problem.

ariddell commented 8 years ago

I think you're right. There's no problem here with having separate libraries. I'm just hoping that we can have the .proto files be as closely aligned as possible.

I'll fork the current repository and start a different library which is better aligned with PyStan's needs. I was thinking 'mcproto' wouldn't be a bad (provisional) name.

Hopefully at some point we can factor out code that mcproto and protostan share into a common library.

On 03/10, Krzysztof Sakrejda wrote:

Sorry being accidentally obtuse here. I wrote the writer with the intent of dealing with serialization to file/stream of general Stan messages. The nature of the writer design on this stab-dev/stan side is that a writer is going to be purpose-specific. So for example if I limit the current write to one or a few message types, I will no longer be able to use it for what I'm interested in---generic serialization. If there's a different writer that would be important for PyStan/etc... then I think we should add/test it here, and split it out when the project matures more. In practice I think that PyStan won't need a delimited protobuf writer at all, you should be able to handle individual messages with a type tag. I'm happy to help you make that happen so that a protobuf-based PyStan can happen faster, but I think it's a separate goal than the current writer. The writers are very modular and should be very easy to split out in the future so I don't think they're really going to b ecome a bloat problem.


Reply to this email directly or view it on GitHub: https://github.com/sakrejda/protostan/pull/52#issuecomment-194864387

sakrejda commented 8 years ago

I'm arguing for tolerating the bloat temporarily rather than splitting efforts into two repos. I really don't understand wanting to split it at this point---I agree that longer-term we want a lean repo that is shared.

ariddell commented 8 years ago

I think a zero-tolerance policy on code/feature bloat is the right way to go based on past Stan-specific experience. Let's see how things go with two separate repositories for a bit. Hopefully it will be trivial to bring them back together or split out common code at some point. Naturally we can just go git cherry-pick any commits which are of mutual interest.

On 03/10, Krzysztof Sakrejda wrote:

I'm arguing for tolerating the bloat temporarily rather than splitting efforts into two repos. I really don't understand wanting to split it at this point---I agree that longer-term we want a lean repo that is shared.


Reply to this email directly or view it on GitHub: https://github.com/sakrejda/protostan/pull/52#issuecomment-194872232

sakrejda commented 8 years ago

OK.

ariddell commented 8 years ago

The repository is here https://github.com/ariddell/mcproto

I'll rename stuff so the namespace doesn't conflict but otherwise the only significant difference will be a variant of this patch.