Closed aadrian closed 8 years ago
Hi @aadrian
I'm happy to accept pull requests for that. You can start by adding a new method here:
https://github.com/influxdb/influxdb-java/blob/master/src/main/java/org/influxdb/InfluxDB.java#L134
for example public void write(final String lineprotocol);
and then add the implementation into: https://github.com/influxdb/influxdb-java/blob/master/src/main/java/org/influxdb/impl/InfluxDBImpl.java#L171
Also add some tests to verify that your code works correctly.
Greetings Stefan
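To make the suggestion above concrete, here is a minimal sketch of what such a method could look like. Only the signature write(final String lineprotocol) comes from the comment above; the LineWriter interface, the RecordingLineWriter class and the sample line are hypothetical stand-ins so the example runs without a server, and they are not the library's actual InfluxDB or InfluxDBImpl types.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo; runs entirely in memory.
class LineWriterDemo {
    public static void main(String[] args) {
        RecordingLineWriter writer = new RecordingLineWriter();
        // Sample line in the line protocol format (illustrative values).
        writer.write("cpu,host=server01 value=0.64 1434055562000000000");
        System.out.println(writer.written());
    }
}

// Hypothetical, cut-down stand-in for the library's interface.
interface LineWriter {
    // The method proposed in this issue: accept a ready-made line protocol string.
    void write(String lineprotocol);
}

// Hypothetical in-memory implementation (assumption: a real one would
// send the string to the database's write endpoint over HTTP).
class RecordingLineWriter implements LineWriter {
    private final List<String> written = new ArrayList<>();

    @Override
    public void write(final String lineprotocol) {
        // The string is passed through untouched: no Point object is created.
        written.add(lineprotocol);
    }

    List<String> written() {
        return written;
    }
}
```

The point of the sketch is only that the string bypasses any intermediate object; whether the real implementation should validate or batch such lines is left open here.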
I question the value of this feature request.
@aadrian or @antjori or @slomek, could you supply a good example of where this functionality is really needed?
I just don't "get" it. In particular,
This library provides a pretty narrow set of features. It is pretty tightly bound to InfluxDB and the features it supports; and currently I think the abstractions in the library are pretty good at doing their job.
Also, by letting the influx-java library do the translation to the line protocol you get an additional benefit: You can pick the version of the influx-java library that implements the same version of the line protocol that your InfluxDB is using.
As an aside, this request is basically allowing a side-door, effectively an "unsafe" mode. Before it is really considered I would like some thinking in the areas of:
I question the value of this feature request.
If I'm pumping millions of data lines very fast (in a few seconds), then the "Point" object creation overhead is just too large. Considering that the data is coming from different systems, and some already have it in the required line format, creating "Point" objects just for OO's sake makes no sense IMHO.
At the moment this is not possible at all with the current API.
Then the library is just a pipe
What is a driver API (like the JDBC one) if not just a "smarter pipe" ?
Ok, I guess this makes sense, but it feels like you have cherry-picked the parts of my enquiry you would like to answer and ignored the rest.
Why not just send your millions of lines of data from the existing system? You must have an implementation of the line protocol in there already (which you are going to have to keep up-to-date with any changes on the InfluxDB end). The API for InfluxDB is pretty basic?
And the questions about validation and batching?
Aside: I can't say I'm very well versed in JDBC technologies, but mostly they are about providing Java-language abstractions to database concepts and providing a way to formalise access to different database products. This library doesn't really fit the bill here, as it ONLY talks to Influx. In many ways, this library is really more like an ORM for InfluxDB than a JDBC driver.
Hi @aadrian, can you give some numbers on how big the impact of object creation is in your use case? I ask because object creation in particular is a very fast operation in the JDK nowadays.
@majst01 The impact of creating Point objects instead of just writing the lines directly with some Java HTTP API is ~10x bigger.
Reading is worse, as the influxdb Java API can't seem to do streaming.
@aadrian Does that include the HTTP access? i.e. are you using the public void write(final BatchPoints batchPoints); interface?
@andrewdodd we use batching if the data arrives as a batch; otherwise we forward it right away.
@aadrian, I'm not really sure I completely understand your situation.
influx-java library to send them to the actual InfluxDB
write(BatchPoints); function or the write(Point); function accordingly
write(Point); function if you get many in a row quickly)
Is this a correct summary?
This might be the root cause of the slow writes! I can't believe that writing a simple line from an object with StringBuilder is 10 times slower than directly writing a String.
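For reference, a rough sketch of the kind of StringBuilder serialisation being discussed. SimplePoint is a hypothetical, simplified class and not the library's Point; it assumes string tags and double-valued fields only, and it omits the escaping of spaces, commas and equals signs that a real serialiser would need.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a Point object.
class SimplePoint {
    final String measurement;
    // TreeMap keeps keys sorted, so the output is deterministic.
    final Map<String, String> tags = new TreeMap<>();
    final Map<String, Double> fields = new TreeMap<>();
    final long timestampNanos;

    SimplePoint(String measurement, long timestampNanos) {
        this.measurement = measurement;
        this.timestampNanos = timestampNanos;
    }

    // Builds "measurement,tag=value field=value timestamp".
    // Assumption: no escaping, at least one field is present.
    String toLineProtocol() {
        StringBuilder sb = new StringBuilder(measurement);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            sb.append(',').append(tag.getKey()).append('=').append(tag.getValue());
        }
        sb.append(' ');
        boolean first = true;
        for (Map.Entry<String, Double> field : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(field.getKey()).append('=').append(field.getValue());
            first = false;
        }
        sb.append(' ').append(timestampNanos);
        return sb.toString();
    }
}
```

The work per point is a handful of appends, which is the basis for doubting that serialisation alone explains a 10x difference; an actual measurement would still be needed to settle it.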
I wouldn't be so sure @majst01. Sounds like the HTTP request might be involved here.
But when sending the line protocol directly, HTTP requests are involved to the same extent.
What I mean is... we have not confirmed that the 10x slow down of "single writes" is really due to parsing/serialising the forwarded points. I believe the major factor in slowness here will be the network, not the parsing/re-serialising of the points.
I think it is highly likely that @aadrian is:
write(Point);, which results in 1 HTTP request for 1 Point written
write(BatchPoints);, which results in 1 HTTP request for N Points written
I believe the slowness is probably due to the ratio of points to HTTP requests. I'm happy to be proven wrong, but I'd like to see a test addressing just this issue (i.e. parse->Point->serialise->write vs string->write) in the context of usual network latencies (i.e. it is probably 10x slower to do the parsing etc, but not in the context of an HTTP call).
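The ratio of points to HTTP requests can be illustrated with a small simulation. Everything here is hypothetical: CountingTransport only counts calls instead of doing network I/O, and the helper names are made up for this sketch. It assumes a batch is sent as newline-separated lines in a single request body.

```java
import java.util.ArrayList;
import java.util.List;

// Compares request counts for single writes vs one batched write.
class RequestRatio {
    // One request per line, comparable to calling write(Point) repeatedly.
    static void writeOneByOne(CountingTransport transport, List<String> lines) {
        for (String line : lines) {
            transport.post(line);
        }
    }

    // One request for all lines, comparable to write(BatchPoints).
    static void writeBatched(CountingTransport transport, List<String> lines) {
        transport.post(String.join("\n", lines));
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            lines.add("cpu,host=server01 value=" + i);
        }
        CountingTransport single = new CountingTransport();
        CountingTransport batched = new CountingTransport();
        writeOneByOne(single, lines);
        writeBatched(batched, lines);
        System.out.println("single writes: " + single.requests + " requests");
        System.out.println("batched write: " + batched.requests + " requests");
    }
}

// Hypothetical transport that counts requests instead of sending them.
class CountingTransport {
    int requests = 0;
    int linesSent = 0;

    void post(String body) {
        requests++;
        linesSent += body.split("\n").length;
    }
}
```

With real network latency per request, the first variant pays that latency once per point, which is why the request ratio is a plausible explanation for the slowdown regardless of how the lines were produced.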
@aadrian, if you are still with us:
Can you answer any of the earlier questions:
@majst01 If you still find this PR valuable, you can merge it as I fixed the issues you mentioned. If you don't think it's worthy, please close both this PR and issue and call it a day.
I have merged this PR. Based on the discussion, it seems useful for some people, and the implementation does not have any side effects on the existing code.
Sorry I created such a fuss.
Please allow writing lines (strings in the line protocol format) directly, not just Point objects.
Especially when some of the data is coming from other systems that already follow the line protocol, it makes no sense to parse it and create Point objects, only for the API to turn them back into strings and send them to the DB.
Thank you.