oetiker / rrdtool-2.x

RRDtool 2.x - The Time Series Database

API abstraction of the storage backend #5

Open fooker opened 11 years ago

fooker commented 11 years ago

It would be nice if the new API had an abstract interface allowing different storage backends.

The abstraction interface should mirror the basic operations of the file descriptor API like open, close, read, write and sync. All algorithms in RRDv2 should use this API to read and write the data.

This would allow different backends: files, mmap, memory buffers, network protocols, caching daemons, ...
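The interface described above could be sketched as a table of function pointers mirroring the file descriptor API, with a trivial in-memory implementation dropped in behind it. This is a hypothetical illustration only; the names (`rrd_backend_t`, `mem_backend`, ...) are made up here and are not actual RRDtool 2.x identifiers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the proposed abstraction: a vtable mirroring
 * open/close/read/write/sync, so every algorithm talks to the interface
 * and never to a concrete file. */
typedef struct rrd_backend {
    void *ctx; /* backend-private state: FILE*, buffer, socket, ... */
    int  (*open) (struct rrd_backend *b, const char *uri);
    int  (*close)(struct rrd_backend *b);
    long (*read) (struct rrd_backend *b, void *buf, size_t len, long off);
    long (*write)(struct rrd_backend *b, const void *buf, size_t len, long off);
    int  (*sync) (struct rrd_backend *b);
} rrd_backend_t;

/* A trivial in-memory backend, showing that "memory buffers" drop in
 * without changing any algorithm written against the interface. */
typedef struct { char buf[4096]; } mem_ctx_t;

static int  mem_open (rrd_backend_t *b, const char *uri) { (void)b; (void)uri; return 0; }
static int  mem_close(rrd_backend_t *b) { (void)b; return 0; }
static int  mem_sync (rrd_backend_t *b) { (void)b; return 0; }
static long mem_read (rrd_backend_t *b, void *buf, size_t len, long off) {
    memcpy(buf, ((mem_ctx_t *)b->ctx)->buf + off, len);
    return (long)len;
}
static long mem_write(rrd_backend_t *b, const void *buf, size_t len, long off) {
    memcpy(((mem_ctx_t *)b->ctx)->buf + off, buf, len);
    return (long)len;
}

static rrd_backend_t mem_backend(mem_ctx_t *ctx) {
    rrd_backend_t b = { ctx, mem_open, mem_close, mem_read, mem_write, mem_sync };
    return b;
}
```

A file, socket, or caching-daemon backend would fill the same six slots with its own functions.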

oetiker commented 11 years ago

yes ... this is a primary reason for starting the 2.x project in the first place ... :-)

stamfest commented 11 years ago

Especially important: Mixing multiple backends in one operation.

oetiker commented 11 years ago

peter, please elaborate

stamfest commented 11 years ago

E.g. rrdtool graph should be able to fetch RRDs from different "RRD servers" (an rrdtool running in "piped" mode on a TCP socket). This would allow for truly massive, distributed RRD installations. I am operating such installations, and it would be VERY convenient. A URL-like syntax for RRD files might be the way to go.

I actually started something like this quite a while ago, but rrdtool internals are ... sometimes ... a little arcane... :-)

I would be a little nervous to support non-RRD-file based backends: IMHO it would ultimately lead to people wanting to have data in SQL databases be interpreted by rrdtool. That way lies madness!
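The URL-like syntax suggested above amounts to dispatching on a scheme prefix. A minimal sketch, assuming hypothetical scheme names (`rrd://` for a piped rrdtool server, `file://` or a bare path for local files) that are not part of any actual rrdtool release:

```c
#include <string.h>

/* Hypothetical scheme-based dispatch: the prefix of the source string
 * selects a backend, so one rrdtool graph invocation could mix local
 * files and remote RRD servers. */
typedef enum { SRC_FILE, SRC_REMOTE, SRC_UNKNOWN } rrd_src_kind;

static rrd_src_kind classify_source(const char *uri) {
    if (strncmp(uri, "rrd://", 6) == 0)  return SRC_REMOTE; /* piped rrdtool on a TCP socket */
    if (strncmp(uri, "file://", 7) == 0) return SRC_FILE;
    if (strchr(uri, ':') == NULL)        return SRC_FILE;   /* bare path, backward compatible */
    return SRC_UNKNOWN;                                     /* reject e.g. sql://, per the thread */
}
```

Bare paths staying valid keeps existing scripts working while remote sources get an explicit scheme.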

oetiker commented 11 years ago

ah, you are talking about the graphing/fetching part I agree ... maybe reformulate your 'issue' ... btw, there is already a libdbi patch in rrdtool :)

fooker commented 11 years ago

@stamfest my idea of this is to abstract the I/O API.

This will never touch the format of RRD data. The API should provide operations like "read(int offset)" or "write(int offset, char byte)". This would not allow fetching data from a database, but it would allow reading and writing memory buffers or network sockets.

[And it would help me to implement an RRD BLOB data type for PostgreSQL - but that's another story.]
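An offset-based byte API like the one described knows nothing about the RRD format, which is exactly why it can back onto any flat store. A hypothetical illustration (the `blob_t` type and function names are invented for this sketch):

```c
#include <stddef.h>

/* Hypothetical byte-granular store: read(offset) and write(offset, byte)
 * over a flat buffer.  Interpretation of the bytes stays entirely in the
 * RRD code above this layer. */
typedef struct { unsigned char data[256]; } blob_t;

static int blob_read(const blob_t *b, size_t offset) {
    return offset < sizeof b->data ? b->data[offset] : -1; /* -1 on out of range */
}

static int blob_write(blob_t *b, size_t offset, unsigned char byte) {
    if (offset >= sizeof b->data) return -1;
    b->data[offset] = byte;
    return 0;
}
```

The same two signatures could be backed by a file, a socket, or (as hinted above) a database BLOB column, since none of them need to understand what the bytes mean.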

oetiker commented 11 years ago

well I think the rrd on disk format has to change ... both for performance and for portability reasons ... the portability part is pretty simple to realize I think, the performance part might warrant some extra thinking ... not quite clear on that yet ... in any event, optimizing on 8 byte chunks of data transfer certainly leaves room for improvement.

stamfest commented 11 years ago

Agreed: the on-disk format will have to change for portability (looking forward to it). An abstraction might be the way to go, but I doubt that very different storage backends make much (practical) sense. Maybe it would be better to make a distinction between the roles the tool plays:

(1) data storage and aggregation
(2) data fetching, processing and graphing

For (2) a highly abstracted data access API might make sense. For (1) this seems less clear. The rrdtool format has the advantage of being small and of constant size. Also runtime is predictable. For many possible storage backends this is not the case, IMHO. The key here is data aggregation.

What I want to really say is: Don't make data storage and aggregation too complicated (and potentially slow) because of an abstraction.

oetiker commented 11 years ago

yes definitely not slow ... my vision re (1) is that the aggregation work can happen in memory so that syncing data to disk can potentially be delayed while still getting all the results from aggregation immediately ... something like this would make applications like rrdcached much simpler
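The in-memory-aggregation-with-delayed-sync idea can be sketched in a few lines. This is only an illustration of the principle under assumed names (`agg_state_t`, a fake flush counter standing in for a real backend sync), not RRDtool 2.x internals:

```c
/* Hypothetical delayed-sync aggregation: the running aggregate lives in
 * memory and is readable immediately after every update; the (expensive)
 * backend sync only happens every flush_every updates. */
typedef struct {
    double   sum;         /* running aggregate for the open interval */
    unsigned count;       /* samples in the open interval */
    unsigned dirty;       /* updates since the last flush */
    unsigned flush_every; /* delay syncing until this many updates */
    unsigned flushes;     /* how often we actually hit the backend */
} agg_state_t;

static double agg_update(agg_state_t *s, double value) {
    s->sum += value;
    s->count++;
    if (++s->dirty >= s->flush_every) { /* sync is delayed, not per-update */
        s->flushes++;                   /* stand-in for backend->sync(...) */
        s->dirty = 0;
    }
    return s->sum / s->count;           /* aggregate available immediately */
}
```

A caching layer like rrdcached then reduces to choosing `flush_every` (or a time-based equivalent) instead of reimplementing the update path.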

doosmall commented 11 years ago

If we can update or fetch data from memory, it will be more efficient.