Open jobergum opened 3 years ago
Should we have a sample docproc (document processor) that transforms a binary field into a tensor field?
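To make the idea concrete, here is a minimal sketch of the transform itself, written in Python for illustration only. A real docproc would be implemented against Vespa's document processing API, which this does not use. It assumes the binary field arrives as a base64 string of little-endian float32 values; that layout and the function name `binary_to_tensor_values` are hypothetical. The `{"values": [...]}` output is meant to resemble the short form for indexed tensors in the JSON feed format, which should be checked against the documentation.

```python
import base64
import struct


def binary_to_tensor_values(b64_payload: str, dim_size: int) -> dict:
    """Decode a base64 blob of little-endian float32 values into a dict
    shaped like an indexed tensor (assumed layout, see lead-in)."""
    raw = base64.b64decode(b64_payload)
    if len(raw) != 4 * dim_size:
        raise ValueError(
            f"expected {4 * dim_size} bytes for {dim_size} floats, got {len(raw)}"
        )
    # "<" = little-endian, "f" = 4-byte IEEE 754 float
    values = list(struct.unpack(f"<{dim_size}f", raw))
    return {"values": values}


# Round-trip a small vector; these values are exactly representable in float32.
vector = [0.5, -1.25, 2.0, 0.0]
payload = base64.b64encode(struct.pack("<4f", *vector)).decode("ascii")
tensor_field = binary_to_tensor_values(payload, 4)
print(tensor_field)
```

The point of doing this server-side is that the feed only carries 4 bytes per cell (plus base64 overhead) instead of a decimal string per cell.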
We do have an undocumented tool, 'vespa-feed-perf', for simple file-based usage. It takes a .json or .xml file and generates serialized binary documents in our undocumented binary format. You can compress that file, transfer it, and then use the same vespa-feed-perf tool to feed it to Vespa. Some of the performance tests do this to reduce the amount of data. If you are using the HTTP client, I guess it can use gzip compression to reduce network cost.
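As a rough illustration of the gzip point, the sketch below compresses a JSON feed operation with Python's standard library and compares sizes. The document id and field name are made up, and it says nothing about how the Vespa HTTP client actually handles compression; it only shows that the decimal text compresses, at the cost of extra CPU on both ends.

```python
import gzip
import json
import random

random.seed(0)  # fixed seed so the sizes are reproducible

# Hypothetical feed operation with a 1280-cell tensor field.
doc = {
    "put": "id:ns:doc::1",
    "fields": {"embedding": {"values": [random.random() for _ in range(1280)]}},
}

raw = json.dumps(doc).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw JSON: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```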
I think the main pain points are storage and the cost of serialization and deserialization, including compression. To feed from the grid, I need to convert to JSON and transfer it over the wire through the Vespa HTTP client; it is then deserialized and converted to the Vespa binary protocol.
Feeding documents with large tensor fields (e.g. tensor(p{},dt{},x[128])) using JSON or XML (deprecated) serialization is cumbersome, because the string representation of float/double values costs a lot of network bandwidth, storage, and processing (serialization and deserialization).
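A quick way to see the overhead is to compare the JSON text size of one 128-cell dense vector with the same values packed as 4-byte floats. This is only a sketch: the values are random, and Python floats are doubles that print with up to 17 significant digits, so a float32-precision string would be somewhat shorter, though still well above 4 bytes per cell.

```python
import json
import random
import struct

random.seed(0)  # fixed seed so the comparison is reproducible

DIM = 128  # matches the x[128] dimension in the example tensor type
values = [random.uniform(-1.0, 1.0) for _ in range(DIM)]

# Size of the values as a JSON array of decimal strings.
json_bytes = len(json.dumps(values).encode("utf-8"))

# Size of the same values packed as little-endian 4-byte floats.
float32_bytes = len(struct.pack(f"<{DIM}f", *values))

print(f"JSON text: {json_bytes} bytes, packed float32: {float32_bytes} bytes")
```

With mapped dimensions such as p{} and dt{}, this per-vector overhead is repeated for every label combination in the tensor.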