The RDF Connect Orchestrator implements and bridges processor runners across environments and runtimes.
Runtime | Status | Notes |
---|---|---|
Kotlin | Ready for testing | Reference implementation, directly executed on own thread. |
Java | Unstable | Requires a thread per processor. |
Node.js | Unstable | Reference gRPC implementation. |
Python | Planned | None. |
Rust | Planned | None. |
> [!NOTE]
> RDF Connect pipelines are typically written in RDF. At the time of writing, only the Turtle file format is supported, but other formats will be supported soon.

The first stage of the orchestrator parses the plain-text configuration file into a format that is easier to work with. We call this the intermediate representation (IR), as defined in our Protobuf schema. This phase is strictly separated from the actual setup of the orchestrator and the individual runners, and can therefore be customized easily by implementing the `Parser` interface.
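
To make the separation concrete, here is a minimal sketch of what such a parser contract might look like. The shapes of `IRProcessor` and `IRStage`, the `parse` signature, and the toy `LineParser` with its made-up line format are all assumptions for illustration; the real IR types are generated from the Protobuf schema, and the real parser reads Turtle.

```kotlin
// Hypothetical, simplified stand-ins for the Protobuf-generated IR types.
data class IRProcessor(val uri: String, val entrypoint: String)

data class IRStage(val uri: String, val processor: IRProcessor, val arguments: String)

// Assumed shape of the parser contract: plain text in, intermediate representation out.
fun interface Parser {
    fun parse(source: String): List<IRStage>
}

// Toy parser for a made-up format: "<stage> <processor> <entrypoint> <arguments>".
// It only shows that any implementation can sit behind the interface.
class LineParser : Parser {
    override fun parse(source: String): List<IRStage> =
        source.lineSequence()
            .map { it.trim() }
            .filter { it.isNotEmpty() && !it.startsWith("#") }
            .map { line ->
                // The fourth part keeps the rest of the line, spaces included.
                val parts = line.split(" ", limit = 4)
                require(parts.size == 4) { "Malformed stage definition: $line" }
                val (stage, processor, entrypoint, arguments) = parts
                // The arguments stay an untouched string; the runner decodes them later.
                IRStage(stage, IRProcessor(processor, entrypoint), arguments)
            }
            .toList()
}
```

Because the rest of the orchestrator would only see `List<IRStage>`, swapping in a parser for another serialization should not affect runner setup.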
> [!WARNING]
> An extended explanation of the Protobuf schema is required.

As part of the gRPC interface, every runner is required to implement the `load` function of the gRPC server. This function takes a single `IRStage`, which contains both the processor definition and the untouched `String` representation of the stage's arguments.
It is the runner's responsibility to bring the processor into the runtime, deserialize the arguments based on the parameter configuration, and call the processor's constructor.
> [!NOTE]
> Processors should not do any heavy lifting inside their constructor. All actual computations should be done inside the `exec` function as defined in the interface of the corresponding runtime.

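
The loading steps described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Processor` base class, the `Runner` class, the constructor registry, and the `key=value;key=value` argument encoding. The actual gRPC service and argument format are defined by the Protobuf schema and the parameter configuration.

```kotlin
// Hypothetical processor contract: construction is cheap, work happens in exec.
abstract class Processor(protected val arguments: Map<String, Any>) {
    abstract fun exec()
}

// Simplified stand-ins for the Protobuf-generated IR types.
data class IRProcessor(val uri: String, val entrypoint: String)

data class IRStage(val uri: String, val processor: IRProcessor, val arguments: String)

class Runner(
    // Maps a processor URI to its constructor. A real runner would instead
    // bring the class named by the entrypoint into the runtime.
    private val constructors: Map<String, (Map<String, Any>) -> Processor>,
    // Declared parameter types per processor, e.g. "count" to "xsd:int".
    private val parameters: Map<String, Map<String, String>>,
) {
    private val loaded = mutableMapOf<String, Processor>()

    fun load(stage: IRStage) {
        val uri = stage.processor.uri
        val construct = constructors[uri] ?: error("Unknown processor: $uri")
        val types = parameters[uri].orEmpty()
        // Assumed encoding of the raw argument string: "key=value;key=value".
        val arguments = stage.arguments.split(";")
            .filter { it.isNotBlank() }
            .associate { pair ->
                val (key, raw) = pair.split("=", limit = 2)
                key to deserialize(raw, types[key])
            }
        // Only the constructor is called here; no computation is started.
        loaded[stage.uri] = construct(arguments)
    }

    fun exec(stageUri: String) = loaded.getValue(stageUri).exec()

    // Undeclared parameters are passed through as plain strings.
    private fun deserialize(raw: String, type: String?): Any = when (type) {
        "xsd:int" -> raw.toInt()
        "xsd:boolean" -> raw.toBooleanStrict()
        else -> raw
    }
}
```

Keeping `load` and `exec` separate mirrors the note above: a stage can be loaded and validated up front, while the expensive work only starts once `exec` is called.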
Communication between two processors must pass through the orchestrator, at least for the time being. It acts as a central hub and message broker, which means that the runners themselves are not responsible for most of the message routing.
Specifically, this means that a runner must pass each incoming message from the bidirectional gRPC `channel` stream to the correct reader, and send each outgoing message from any writer back through the same `channel` function.
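
The runner-side routing described above might be sketched as follows. The `Message`, `Reader`, `Writer`, and `ChannelRouter` names are hypothetical, and the gRPC stream is replaced by a plain callback so that the example stays self-contained; it is not the actual runner implementation.

```kotlin
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical envelope: the channel URI tells the runner which reader to use.
data class Message(val channel: String, val payload: String)

// A reader buffers incoming payloads until the processor consumes them.
class Reader {
    private val queue = LinkedBlockingQueue<String>()

    fun push(payload: String) = queue.put(payload)

    // Blocks until a payload is available.
    fun read(): String = queue.take()
}

// A writer tags payloads with its channel URI and hands them to the stream.
class Writer(private val channel: String, private val send: (Message) -> Unit) {
    fun write(payload: String) = send(Message(channel, payload))
}

// Stand-in for the runner side of the bidirectional stream. `outgoing` plays
// the role of "send this message back to the orchestrator".
class ChannelRouter(private val outgoing: (Message) -> Unit) {
    private val readers = mutableMapOf<String, Reader>()

    fun reader(channel: String): Reader = readers.getOrPut(channel) { Reader() }

    fun writer(channel: String): Writer = Writer(channel, outgoing)

    // Called for every message the orchestrator sends down the stream.
    fun onIncoming(message: Message) {
        val reader = readers[message.channel]
            ?: error("No reader registered for channel ${message.channel}")
        reader.push(message.payload)
    }
}
```

In this sketch the runner never delivers a message from a writer directly to a reader, even when both live in the same runtime; every message goes out through `outgoing` and comes back through `onIncoming`, which matches the hub model described above.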
Anyone may create and publish their own processors. However, to get started quickly, we provide some helpful processors as part of the Standard Processor Library. These are included by default, and may serve as a reference to implement your own processors.
These processors interact with RDF data.

Processor | Description |
---|---|
`conn:RDFValidator` | Validate RDF data using SHACL. |

These processors interact with the network.

Processor | Description |
---|---|
`conn:HttpFetch` | Reads data from an HTTP endpoint. |

These processors read data from and write data to the local file system.

Processor | Description |
---|---|
`conn:FileReader` | Reads a file with a given path from the local file system. |
`conn:FileWriter` | Overwrites/appends a file with a given path using the incoming data. |

At the time of writing, we support a limited set of literal types. You may also use complex data structures, which will be represented as a map.

URI | Kotlin | Node.js |
---|---|---|
`xsd:boolean` | `boolean` | `Boolean` |
`xsd:byte` | `byte` | |
`xsd:dateTime` | `java.util.Date` | `Date` |
`xsd:double` | `double` | |
`xsd:float` | `float` | |
`xsd:int` | `int` | `Number` |
`xsd:long` | `long` | `Number` |
`xsd:string` | `java.lang.String` | `String` |

Note that SHACL will validate your configuration, so out-of-range or invalid values will be caught.
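
As an illustration of the Kotlin column in the table above, a conversion from a literal's lexical form to a runtime value could look roughly like this. The function name `parseLiteral` and the error handling are assumptions, not the orchestrator's actual code. The sketch also assumes `xsd:dateTime` values carry an explicit UTC designator (`Z`) and that booleans are written as `true` or `false`.

```kotlin
import java.time.Instant
import java.util.Date

// Hypothetical helper: maps a datatype from the table to a Kotlin/JVM value.
fun parseLiteral(datatype: String, lexical: String): Any = when (datatype) {
    // Accepts only "true" and "false"; the "1" and "0" forms are not handled here.
    "xsd:boolean" -> lexical.toBooleanStrict()
    // Throws NumberFormatException for values outside -128..127.
    "xsd:byte" -> lexical.toByte()
    // Assumes a UTC timestamp such as "2024-01-01T00:00:00Z".
    "xsd:dateTime" -> Date.from(Instant.parse(lexical))
    "xsd:double" -> lexical.toDouble()
    "xsd:float" -> lexical.toFloat()
    "xsd:int" -> lexical.toInt()
    "xsd:long" -> lexical.toLong()
    "xsd:string" -> lexical
    else -> throw IllegalArgumentException("Unsupported datatype: $datatype")
}
```

With SHACL validation in front, the failure branches should rarely be reached, but a runner would still want to fail loudly rather than pass a malformed value to a processor.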
The only Kotlin version supported is `v1.9.22` for the time being, due to dependencies on the embedded compiler.
This repository supports `pre-commit` hooks. To install the hooks, run the following command.
pre-commit install
The Kotlin code in this repository is formatted using Meta's `ktfmt` tool, mainly due to the following feature.

> `ktfmt` ignores most existing formatting. It respects existing newlines in some places, but in general, its output is deterministic and is independent of the input code.

No feature flags are used. Invoke using the following command.
$ ktfmt ./**/*.kt
Due to `ktfmt`'s relation to `google-java-format`, we use the latter for Java code formatting. Invoke using the following command.
$ google-java-format -r ./**/*.java
The Node.js runner provides an `npm` script to format and lint all code.
$ npm run format --prefix ./runners/nodejs