Closed masonk closed 3 years ago
I like the idea, and I appreciate asking for direction before submitting a pull request. I have a few questions; please don't be scared off. :)
In response to your questions:
src/io.rs
sounds good to me.Would this introduce additional dependencies? If this requires a protobuf implementation, we would want to be careful to choose the right crate. Likewise if there are any file format dependencies.
Does this include something like an Example
struct? What would that look like? We currently can't express something like Map<String, Tensor>
because Tensor
is generic in the element type, so you couldn't have a map containing both int and float tensors, for example. We could expose something similar to the internal AnyTensor
trait, and then store a Map<String, Box<AnyTensor>>
.
@jhseu Do you know if record/example IO will ever be provided via the C API?
Yes, RecordWriter is pure Rust. Its deps are crc32c
and byteorder
. I have no qualms with replacing the code with externs to C, of course. I'm pretty sure the binary format of a TFRecord is stable at this point, so I think the best reason to do it would be perf.
As for creating Examples, mine are serialized using the protobuf
crate. I haven't spent any time wrapping the raw protobuf gencode API . I will put some put thought into an Example API, as there are a few layers of boilerplate which kind of clutter things. And, yes, this way would entail a protobuf
dep. I think it would make sense to make this an optional dep.
Regardless of whether we wrap the gencode API, I think that example.proto
and feature.proto
are so stable that our best option is to check in the gencode to avoid requiring a build step (and, by the way, a dependency on protoc
). We'd only have to update the repo with new gencode whenever the main repository updates those protobufs which, I wager, will happen rarely. There's a lot of data sitting on disks that is in this format, so I wouldn't be surprised if it never changes.
We won't have a C API for RecordWriter. The (uncompressed) implementation is simple enough that there isn't really a need. We already have a separate Java implementation in the ecosystem repository.
I would suggest that you design it so that it's orthogonal to tf.Examples. From a TensorFlow perspective, we support reading any arbitrary strings out of RecordWriter, so the interface should just be about reading and writing bytes.
That sounds good.
@masonk Would you might sending an initial pull request just for the RecordWriter with support for writing strings, leaving out Example support? Once that's sorted we can hash out the details for Examples.
Sounds good. I think I'll get to it this weekend.
Would you like ExampleWriter, too?
Yes, but I'd like a little more detail first so we can make sure we're on the same page. Which crate(s) would you add? What exactly would the public API be?
I haven't thought it through yet, but before I do any coding, I'll write up a small design doc and post it --- here? Or another issue?
On Sun, May 20, 2018 at 7:54 PM, Adam Crume notifications@github.com wrote:
Yes, but I'd like a little more detail first so we can make sure we're on the same page. Which crate(s) would you add? What exactly would the public API be?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/rust/issues/162#issuecomment-390539923, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbdznHhQ0APcyCLuYFtT1QpgeXmyW9-ks5t0iyCgaJpZM4T8pd4 .
Sounds good. Here's fine.
One question, do we have APIs in Rust to read TFRecords? 🤔
I never submitted a record reader, and I don't see one in the repo right now.
It would be good to have one, though.
I revisited this today to remember why I never submitted a reader. Here's what I remember:
Pull request for a basic reader
Interesting, thanks for the fast response, that is a lot of info to take in.
Perhaps just another question so far. The first case I am using Rust to read TFRecord is probably on the serving side. As we will have some features ready in places like BigTable and when providing a user id on the backend we will get a single feature row for that user, in TRRecord formmat.
I am thinking about using Rust to build a backend to host the model and take in the TFRecord (one row) and return a model prediction to the caller.
I think with your PR #240 this can also be achieved? :)
Yes, it should be plenty for that.
In my own project, I am using Rust to write TFRecords full of TFExamples to disk.
The code for doing this is something that might be useful to many projects that import the
tensorflow
crate, and I'd like to submit a PR for these utils.However, it would be nice to have an agreement from package maintainers on the specifics of the design before I code up the CL. To give an example of outstanding questions:
RecordWriter
and/orExampleWriter
in this crate?tensorflow::io
?src/io.rs
?