rdf-connect / ldes-client

The TREE/LDES client to replicate and synchronize LDESs: the RDF Connect processor
https://rdf-connect.github.io/ldes-client/

The LDES client

This package provides common tooling to work with LDESs.

Its main function is to replicate an LDES and keep in sync with it. It pipes the result to another processor that complies with the connector architecture.

Install the package using npm install -g ldes-client

Replication and synchronization

The LDES client has two modes: sync and replicate. Both are accessible via the ldes-client command.

ldes-client <url> [-f] [--save <path>] [--pollInterval <number>] [--shape <shapefile>]  

Flags

You can also use this as a library in your TS/JS projects. See the client.ts file for documentation.

Use cases

Software architecture

The client consists of two parts: one fetches the fragments, the other emits the members. Different use cases place different demands on each part.

For example:

Difficulties:

Fragment Fetcher

The fragment fetcher fetches the fragments. Fragments are reached through relation chains, and a chain is one of only two types: important or unimportant. For example, when emitting members in order, the important relations are the GreaterThan relations. All other relation types are treated as equivalent (unimportant): members can only be emitted once all unimportant relations have been fetched and processed.

Relation chains are chains because each fetched page can contain new relations that point onward from that page. We need to distinguish a relation that follows an important relation from one that follows an unimportant relation, so the relations in a chain are squashed. An important relation squashes unimportant relations: the resulting chain is important and should only be fetched once all unimportant relations are done. Unimportant relations squash other unimportant relations. Important relations squash other important relations, and the new value is the larger of the two. The chains are therefore ordered with unimportant relations first, followed by important relations ordered by value.
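The squashing and ordering rules can be sketched as follows. This is a minimal illustration, not the package's implementation: the `Chain` type, `extend`, and `compareChains` are hypothetical names, and the important value is assumed to be a plain number (for example a timestamp in milliseconds).

```typescript
// Hypothetical model of a relation chain; not the package's actual types.
interface Chain {
  target: string;     // URL of the page the chain points to
  important: boolean; // true once any important relation is in the chain
  value: number;      // largest important value seen so far (0 if none)
}

interface Relation {
  target: string;
  important: boolean;
  value?: number;
}

// Extend a chain with a newly discovered relation, squashing as we go.
function extend(chain: Chain, relation: Relation): Chain {
  if (!relation.important) {
    // An unimportant relation never changes importance or value:
    // an unimportant chain stays unimportant, an important one stays important.
    return { ...chain, target: relation.target };
  }
  const value = relation.value ?? 0;
  return {
    target: relation.target,
    important: true,
    // Two important relations squash to the larger value.
    value: chain.important ? Math.max(chain.value, value) : value,
  };
}

// Ordering: unimportant chains first, then important chains by value.
function compareChains(a: Chain, b: Chain): number {
  if (a.important !== b.important) return a.important ? 1 : -1;
  return a.important ? a.value - b.value : 0;
}
```

Keeping only a boolean and a single value per chain is what makes the squashing cheap: the full history of relations that led to a page does not need to be stored.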

These chains dictate the order in which pages are fetched. Because fetching is asynchronous, a page can only be interpreted when no page that came from a smaller relation is still in flight. In code this is represented by the heaps readyPage and inFlightPages, both of which contain relation chains. Note that relations can be interpreted at any time.
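The rule for when a page may be interpreted can be sketched like this. The sketch uses plain arrays instead of heaps to stay short, and the `Chain` type, `compare`, and `interpretable` are hypothetical names rather than the package's API.

```typescript
// Hypothetical, simplified chain: unimportant chains sort before important ones.
type Chain = { target: string; important: boolean; value: number };

const compare = (a: Chain, b: Chain): number => {
  if (a.important !== b.important) return a.important ? 1 : -1;
  return a.important ? a.value - b.value : 0;
};

// Returns the ready pages that may be interpreted now, smallest chain first.
// A ready page is blocked while any in-flight chain sorts strictly before it.
function interpretable(readyPages: Chain[], inFlightPages: Chain[]): Chain[] {
  const sorted = [...readyPages].sort(compare);
  return sorted.filter((page) =>
    inFlightPages.every((pending) => compare(pending, page) >= 0),
  );
}

// Usage: a page reached via value 20 must wait for the in-flight value 10.
const ready: Chain[] = [
  { target: "page-20", important: true, value: 20 },
  { target: "page-u", important: false, value: 0 },
];
const inFlight: Chain[] = [{ target: "page-10", important: true, value: 10 }];
const now = interpretable(ready, inFlight).map((c) => c.target);
```

A heap gives the same answer more efficiently, since only the smallest in-flight chain needs to be compared against the smallest ready chain.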

When a page is ready to be interpreted, the helper is asked to interpret it. If the incoming chain was important, a special value called a marker is derived from the chain's value. For example, when emitting members in order, the member manager can always extract the members it finds, but it can only emit them once a marker is issued, and then only the members that are smaller than that marker.
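A minimal sketch of marker-driven ordered emission is shown here. `OrderedBuffer` and the `Member` shape are hypothetical, and `timestamp` stands in for whatever value the members are ordered on.

```typescript
// Hypothetical member shape; `timestamp` stands in for the ordering value.
type Member = { id: string; timestamp: number };

class OrderedBuffer {
  private pending: Member[] = [];

  // Members can be extracted and buffered at any time.
  add(members: Member[]): void {
    this.pending.push(...members);
  }

  // On a marker, release the buffered members smaller than the marker, in order.
  // Members at or above the marker stay buffered for a later marker.
  marker(value: number): Member[] {
    const ready = this.pending
      .filter((m) => m.timestamp < value)
      .sort((a, b) => a.timestamp - b.timestamp);
    this.pending = this.pending.filter((m) => m.timestamp >= value);
    return ready;
  }
}

// Usage: members arrive out of order; the marker decides what is safe to emit.
const buffer = new OrderedBuffer();
buffer.add([
  { id: "a", timestamp: 5 },
  { id: "b", timestamp: 1 },
  { id: "c", timestamp: 9 },
]);
const emitted = buffer.marker(6).map((m) => m.id);
```

The marker acts as a guarantee from the fetcher that no member smaller than it can still turn up, which is why everything below it can be emitted safely.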

Fault tolerance

The fetcher tries to be fault tolerant. HTTP status codes that indicate that the server is overloaded or that something else went wrong are caught, and the request is retried. This is the default behaviour when the provided config does not supply a fetch function.

Caught HTTP codes:

// Provide your own codes with a custom retry function
config.fetch = retry_fetch(fetch, [408, 425, 429, 500, 502, 503, 504], 500, 5);
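To illustrate the idea behind such a retrying fetch, here is a generic sketch. It is not the package's `retry_fetch`: the name `retryOnStatus` is hypothetical, and it assumes that the two numeric arguments mean a base delay in milliseconds and a maximum number of retries, with exponential backoff between attempts. The source does not document those meanings.

```typescript
// Hypothetical sketch; NOT the package's retry_fetch.
// Wraps any fetch-like function whose result exposes a numeric `status`.
function retryOnStatus<A extends unknown[], R extends { status: number }>(
  baseFetch: (...args: A) => Promise<R>,
  retryCodes: number[],
  baseDelayMs: number, // assumption: delay before the first retry
  maxRetries: number,  // assumption: retries after the initial attempt
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    for (let attempt = 0; ; attempt++) {
      const response = await baseFetch(...args);
      const retryable = retryCodes.includes(response.status);
      // Return on success, on a non-retryable status, or when retries run out.
      if (!retryable || attempt >= maxRetries) return response;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  };
}
```

Because the wrapper only inspects `status`, it can wrap the global `fetch` as well as a stub in tests. When retries are exhausted it returns the last response instead of throwing, leaving the decision to the caller.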

Member Manager

The member manager extracts members and emits them when they are ready. Extracting members is asynchronous, because out-of-band requests are sometimes made. This results in currentPromises, which are awaited before members are sorted and emitted.

The streaming API requires that at least one member is emitted per poll. To achieve this, the memberManager has a function called reset(), which returns a promise that resolves when a member is emitted.
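The reset() pattern can be sketched as a small signal object. This is a simplified illustration under assumptions: `EmitSignal` and its `emit` method are hypothetical names, and the real memberManager does considerably more.

```typescript
// Hypothetical sketch of the reset() idea; not the package's memberManager.
class EmitSignal<T> {
  private resolveNext: (() => void) | undefined;

  // Returns a promise that resolves once the next member is emitted.
  reset(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.resolveNext = resolve;
    });
  }

  // Hands the member to the sink and resolves any pending reset() promise.
  emit(member: T, sink: (member: T) => void): void {
    sink(member);
    this.resolveNext?.();
    this.resolveNext = undefined;
  }
}
```

A poll loop could call `reset()` at the start of each poll and await the returned promise before considering the poll complete, which is one way to satisfy the "at least one member per poll" requirement.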

Expected Features

Other tooling available from this repository

Authors and license

© 2023 -- Ghent University - IMEC. MIT license