Closed xmlking closed 7 years ago
@xmlking, I don't directly have a personal interest in NiFi (I'm honestly waiting for the stream-processing-framework wars to shake out a bit before committing to any), but I'd be very interested in work that allowed us to package Maxwell as a library for external consumption, which would allow for a maxwell-nifi
project/package that relies on maxwell-core
-- ie the work of abstracting out things into reasonably generic interfaces enough so that it's easy to build maxwell into various frameworks.
@wushujames has, along similar lines, been working on something where he packages maxwell's brain into a kafka-connect library... mabybe there's some overlap there?
@xmlking, when you say Apache Nifi is a "plugin based system", exactly what do you mean? Does Nifi just execute a subprocess and the subprocess can be written in any language? Or does Nifi load a plugin as a java class, for example? Just trying to find out where the boundary of the "plugin" is.
Yes, I'd like to eventually be able to package Maxwell as a library. The current implementation then would become an application that uses the maxwell-core
library, and we can have a kafka-connect plugin that uses maxwell-core and an Nifi plugin that uses maxwell-core, etc.
I haven't made much progress on that front, unfortunately, but I have thought about it a bunch.
@wushujames NiFi comes with 100+ built-in processes (bundled as .nar files in /lib directory). Users use canvas to drag and drop processers to visually build dataflows. We can build our own custom processes and drop it (e.g. nifi-maxwell.nar
) in /lib directory to make it available in the canvas. For example, I created nifi-websocket processor wrapping Vert.X.
NiFi API is pull based(iterator mode), I am thinking Maxwell
can writer to LinkedBlockingQueue
and NiFi can consume from that queue. The better option would be making maxwell-core
iterable so that NiFi drive the data retrieval.
@xmlking I'm hijacking your issue and turning this into "figure out how to make maxwell a library". Hope you don't mind.
Sorry it took me long to respond about this. Here's what I think a Maxwell library would look like:
As an aside: Conceptually, if you had the raw binlog events in a binlog "stream", and the schema registry events in a kafka "stream", then I think you are doing a "streaming join" of the two streams to compute schema'd binlog events.
_Pull vs. Push API_
ReactiveX provides async API that is well suited for implementing streaming sources like Maxwell. with composable higher-order functions, end users can easily implement functionality such as blacklist, whitelist database , tables , retry connection etc.
By adopting RxJava
or Reactive Streams API , developers can chose either pull or push behavior. Those SDKs also support back-pressure to auto-tune data flow velocity. I think Reactive Streams
API will become part of Java 9 Flow API.
_Emit full Schema_
In one of my use case, I need full Schema info i.e., VARCHAR(50) VARCHAR(256) etc. wish we can capture and emit those field length information also in the Schema events
If we implement ReactiveX API, Maxwell will be exposing couple of Observables (rx_binlog, rx_bootstrap, rx_schema etc) that users can subscribe on main thread or other compute thread declaratively. Maxwell can connect/disconnect from MySQL based on subscribe & unsubscribe events.
this was completed, maxwell-as-a-jar is up in the maven repo.
@osheroff just checking if there is any interest or concerns to wrap maxwell as NiFi processor.
Apache NiFi is plugin based system that lets developers to build their own custom processors and use them along with built-in processors , to compose dataflows using web based user interface.
I have been using Maxwell in
MySQL --> Maxwell --> Kafka --> NiFi --> Hadoop(HDFS)
dataflow.Benefits: