sakrejda / protostan

Thin protobuf interface wrapper for Stan
BSD 3-Clause "New" or "Revised" License
2 stars 1 forks source link

Add a binaryProtubufFileWriter #43

Closed sakrejda closed 8 years ago

sakrejda commented 8 years ago
ariddell commented 8 years ago

Who's the target audience for this (i.e., the file writer)? Is it just for testing?

sakrejda commented 8 years ago

It's an easy split do do, matches the available implementations from protobuf (OstreamOutputStream and FileOutputStream), and it will make it dead simple to integrate this into a full CmdStan, so I guess the target audience is a fork/hack on CmdStan to demonstrate/tune it as an easy binary output format.

ariddell commented 8 years ago

Is there an argument in favor of this being cmdstan's responsibility? I mean, would servestan use this?

(You can see I'm zealot for avoiding any code bloat.)

sakrejda commented 8 years ago

Sure there could be, but what I want is a demo to show that a) having protobuf as a dependency is not going to cause a big headache for CmdStan; and b) we can produce a good binary format through this route.

If there's going to be a binary format I'd like to have a go at it being protobuf-based so that it's uniform across interfaces so I'm willing to put in a little extra work to have a demo for it. Ultimately I guess ServeStan (Andrew/Daniel have been calling this idea CloudStan, which is fun) could use this if there's going to be any caching of results/models/etc.

I agree about avoiding code bloat but I don't think this is it. Also, until we make a commitment to support sub-projects (interfaces) we have free reign to lop off things that don't work out!

On Fri, Mar 4, 2016 at 9:24 AM, Allen Riddell notifications@github.com wrote:

Is there an argument in favor of this being cmdstan's responsibility? I mean, would servestan use this?

(You can see I'm zealot for avoiding any code bloat.)

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/issues/43#issuecomment-192300098.

ariddell commented 8 years ago

What about putting that in something else, like cmdstan-protostan?

I'm just thinking here about eventually getting PyStan 3 running on this and I'd love for it to be as clean and focused as possible.

ariddell commented 8 years ago

I'll note that https://github.com/ariddell/cmdstan-protostan is basically working. It would be easy to add the file writer there and would be easier, I think, for others to grasp the demo.

I feel like our next step should be to have some data that's worth writing! i.e., get a model up and running and producing samples, streamed over the deliminted pb writer.

sakrejda commented 8 years ago

On Fri, Mar 4, 2016 at 9:47 AM, Allen Riddell notifications@github.com wrote:

I'll note that https://github.com/ariddell/cmdstan-protostan is basically working. It would be easy to add the file writer there and would be easier, I think, for others to grasp the demo.

I feel like our next step should be to have some data that's worth writing! i.e., get a model up and running and producing samples, streamed over the deliminted pb writer.

I was going to take a very different approach to this---I would like to take CmdStan wholesale and surgically replace the relevant pieces. At first I would just replace the writer, and expose a complementary reader in R or Python using their protobuf library and my read_delimited_pb function. What do you think?

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/issues/43#issuecomment-192307546.

ariddell commented 8 years ago

This sounds fine but I think it would be great to separate out these things into separate projects. I always thought there would be a library (called "protostan" but perhaps something else) which would just be the set of .proto files and a tiny set of cpp files which would be usable (in theory) by RStan, PyStan, ServeStan, and CmdStan.

For example, some features that are useful to cmdstan may not be useful to RStan, PyStan, and ServeStan. Parsing command-line arguments, for example. Or file input/output. Perhaps these could go in a separate project that has Stan and protostan (or "protostan-lib", if you want to rename it) as dependencies?

sakrejda commented 8 years ago

On Fri, Mar 4, 2016 at 10:15 AM, Allen Riddell notifications@github.com wrote:

This sounds fine but I think it would be great to separate out these things into separate projects. I always thought there would be a library (called "protostan" but perhaps something else) which would just be the set of .proto files and a tiny set of cpp files which would be usable (in theory) by RStan, PyStan, ServeStan, and CmdStan.

I see, so maybe the read/write_delimited_pb and associated tests but not the writer.

For example, some features that are useful to cmdstan may not be useful to RStan, PyStan, and ServeStan. Parsing command-line arguments, for example. Or file input/output. Perhaps these could go in a separate project that has Stan and protostan (or "protostan-lib", if you want to rename it) as dependencies?

I see, I agree with this. I guess the base class and the stream writer implementation can go into the current repo and I can see putting the file-specific sub-class into a CmdStan fork.

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/issues/43#issuecomment-192317482.

ariddell commented 8 years ago

It sounds like we're on the same page.

re: the writer -- I think everyone would need the pb_delimited writer --- or something very much like it, right? I could have this wrong. I'll know more as I read through the new code.

sakrejda commented 8 years ago

On Fri, Mar 4, 2016 at 10:45 AM, Allen Riddell notifications@github.com wrote:

It sounds like we're on the same page.

re: the writer -- I think everyone would need the pb_delimited writer --- or something very much like it, right? I could have this wrong. I'll know more as I read through the new code.

I think this is right---everyone would use a class like stan::interface_callbacks::writer::binary_proto_stream_writer, and everyone will want the underlying read/write_delimited_pb utility functions.

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/issues/43#issuecomment-192328536.

sakrejda commented 8 years ago

I ripped out all the explicit calls to a write_delimited_pb and replaced them with a function pointer template argument so the same class can now do multiple stream types (anything subclassed from ZeroCopyStream) and it'll be easy to write a write_jsonlines_pb stand-alone function to pass in as a template argument when protobuf comes around to implementing that feature. Feels a little over-engineered but it's much less code this way, I wrote new test for OstreamOutputStream and FileOutputStream and it all works nicely... open to thoughts about the data format/overengineered aspect but I'm calling this issue done.

ariddell commented 8 years ago

OK. I think this looks good. Looking forward to working through it (and any docs #42).

sakrejda commented 8 years ago

Added the docs this morning, moved cleanup to a separate issue because [soccer analogy]. K

On Mon, Mar 7, 2016 at 8:48 AM, Allen Riddell notifications@github.com wrote:

OK. I think this looks good. Looking forward to working through it (and any docs #42 https://github.com/sakrejda/protostan/issues/42).

— Reply to this email directly or view it on GitHub https://github.com/sakrejda/protostan/issues/43#issuecomment-193253820.