Allow to write "lines" directly, not just Point objects.

aadrian commented 8 years ago

Please allow to write lines (strings) (as in the line protocol) directly, not just Point objects.

Especially when some of the data is coming from other systems already respecting the line protocol, it makes no sense to parse it, make Point objects so that the API makes them back strings and sends to the DB.

Thank you.

majst01 commented 8 years ago

Hi @aadrian

Im happy to accept PR Requests for that. You can start adding a new method here:

https://github.com/influxdb/influxdb-java/blob/master/src/main/java/org/influxdb/InfluxDB.java#L134

for example public void write(final String lineprotocol);

and then add the implementation into: https://github.com/influxdb/influxdb-java/blob/master/src/main/java/org/influxdb/impl/InfluxDBImpl.java#L171

Also add some Tests to verify your code works correct.

Greetings Stefan

andrewdodd commented 8 years ago

I question the value of this feature request.

@aadrian or @antjori or @slomek, could you supply a good example of where this functionality is really needed?

I just don't "get" it. In particular,

If you already have a system that creates data in the line protocol, why are you using this library?
Why not send the data directly from that system? OR
If you really need this library for some reason, why not change the existing system to NOT make line protocol strings and defer that responsibility to this library?

This library provides a pretty narrow set of features. It is pretty tightly bound to the influx DB and the features it supports; and currently I think the abstractions in the library are pretty good at doing their job.

Also, by letting the influx-java library do the translation to the line protocol you get an additional benefit: You can pick the version of the influx-java library that implements the same version of the line protocol that your influxDB is using.

As an aside, this request is basically allowing a side-door, effectively an "unsafe" mode. Before it is really considered I would like some thinking in the areas of:

Should this function validate the provided lines?
- Yes -> Well then you've just moved the 'need to parse' into the library. Also, do you have to supply the version of the line protocol you are using, so that the library knows how to do the parsing?
- No -> Then the library is just a pipe.
Should the lines provided in this way be batched? (if the batching is enabled)
- If not, then should they have priority over records that are being batched?
- If yes, how should the lines be stored in the batch processor? As strings? Or parsed and stored as Point objects?

aadrian commented 8 years ago

I question the value of this feature request.

If I'm pumping millions of data lines very fast (in a few seconds), than the "Point" object creation overhead is just too huge. Considering that data is coming from different systems, and some already have the data in the required line format, creating "Point" objects just for the OO's sake makes no sense IMHO.

At the moment this is not possible at all with the current API.

Then the library is just a pipe

What is a driver API (like the JDBC one) if not just a "smarter pipe" ?

andrewdodd commented 8 years ago

Ok, I guess this makes sense, but it feels like you have cherry-picked the parts of my enquiry you would like to answer and ignored the rest.

Why not just send your millions of lines of data from the existing system? You must have an implementation of the line protocol in there already (which you are going to have to keep up-to-date with any changes on the InfluxDB end). The API for InfluxDB is pretty basic?

And the questions about validation and batching?

Aside: I can't say I'm very well versed in JDBC technologies, but mostly they about providing Java-language abstractions to database concepts and providing a way to formalise access to different database products. This library doesn't really fit the bill here, as it ONLY talks to Influx. In many ways, this library is really more like an ORM for InfluxDB than a JDBC driver.

majst01 commented 8 years ago

Hi @aadrian can you give some numbers how big the impact on object creation is in you use case. I ask because especially object creation is a very fast operation in the jdk nowadays.

aadrian commented 8 years ago

@majst01 The impact of creating Point objects instead of just writing the lines directly with some Java http api is ~10 bigger.

Reading is worse, as the influxdb Java API can't seem to do streaming.

andrewdodd commented 8 years ago

@aadrian Does that include the HTTP access? i.e. are you using the

public void write(final BatchPoints batchPoints);

interface?

aadrian commented 8 years ago

@andrewdodd we are using Batching if the data comes as batch, otherwise we pump it further right away.

andrewdodd commented 8 years ago

@aadrian, I'm not really sure I completely understand your situation.

You have a system that produces 'write points' in the line protocol format
You send these to another system, which is using the influx-java library to send them to the actual InfluxDB
Sometimes these 'write points' are provided as a batched write, sometimes they are single points; and you use the write(BatchPoints); function or the write(Point); function accordingly
You are experiencing slow writes (I'm guessing this is with the write(Point); function if you get many in a row quickly)

Is this a correct summary?

majst01 commented 8 years ago

This might be the root cause of slow writes ! I cant believe that writing a simple line from a Object with StringBuilder is 10times slower than directly writing a String.

andrewdodd commented 8 years ago

I wouldn't be so sure @majst01. Sounds like the HTTP request might be involved here.

majst01 commented 8 years ago

But sending the line protocol directly http requests are involved in the same amount.

andrewdodd commented 8 years ago

What I mean is... we have not confirmed that the 10x slow down of "single writes" is really due to parsing/serialising the forwarded points. I believe the major factor in slowness here will be the network, not the parsing/re-serialising of the points.

I think it is highly likely that @aadrian is:

Not using the batch processor
Parsing single writes and then calling write(Point);, which results in 1 HTTP request for 1 Point written
Parsing multiple writes and then calling write(BatchPoints);, which results in 1 HTTP request for N Points written

I believe the slowness is probably due to the ratio of points to HTTP requests. I'm happy to be proven wrong, but I'd like to see a test addressing just this issue (i.e. parse->Point->serialise->write vs string->write) in the context of usual network latencies (i.e. it is probably 10x slower to do the parsing etc, but not in the context of an HTTP call).

@aadrian , if you are still with us:

Is the summary a few comments above correct?
Can you confirm the dot points in this comment are true or false?

Can you answer any of the earlier questions:

Why not send the points directly from the original system OR why not get the forwarding system to do the line protocol encoding instead?
Should there be validation on this 'raw string' write method?
Should this 'raw string' write method bypass the batch processor if it is enabled?

slomek commented 8 years ago

@majst01 If you still find this PR valuable, you can merge it as I fixed the issues you mentioned. If you don't think it's worthy, please close both this PR and issue and call it a day.

majst01 commented 8 years ago

I have merged this PR, based on the discussion for some people it seems useful, the implementation does not have any side effects to the existing code.

andrewdodd commented 8 years ago

Sorry I created such a fuss.

influxdata / influxdb-java

Allow to write "lines" directly, not just Point objects. #126