rubensworks / fetch-sparql-endpoint.js

A simple, lightweight module to send queries to SPARQL endpoints and retrieve their results in a streaming fashion.
MIT License

stream.read() is null #41

Closed KonradHoeffner closed 2 years ago

KonradHoeffner commented 2 years ago

I want to fetch SPARQL results in a function and return them, so I cannot use "stream.on" (which works) and instead need "stream.read()", which returns null:

import { SparqlEndpointFetcher } from "fetch-sparql-endpoint";

async function query()
{
    const fetcher = new SparqlEndpointFetcher();
    const stream = await fetcher.fetchBindings("https://dbpedia.org/sparql", "SELECT ?x {?x a owl:Class.} LIMIT 2");
    const bindings = stream.read();
    stream.on("data",data=> console.log("on",data))
    return bindings;
}

console.log("read",await query());

Result:

read null
on {
  x: NamedNode {
    termType: 'NamedNode',
    value: 'http://www.w3.org/2002/07/owl#Thing'
  }
}
on {
  x: NamedNode {
    termType: 'NamedNode',
    value: 'http://www.w3.org/2002/07/owl#Nothing'
  }
}

Is stream.read() not supported by fetch-sparql-endpoint.js, or did I use read() incorrectly?

KonradHoeffner commented 2 years ago

P.S.: I promisified the function in the following way, but it only returns the first row:

import { SparqlEndpointFetcher } from "fetch-sparql-endpoint";

async function query()
{
    const fetcher = new SparqlEndpointFetcher();
    const stream = await fetcher.fetchBindings("https://dbpedia.org/sparql", "SELECT ?x {?x a owl:Class.} LIMIT 4");
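    // Note: a promise settles only once, so resolve() fires on the first
    // "data" event and all later bindings are dropped.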
    return new Promise((resolve, reject) => { stream.on("data", data => resolve(data)); });
}

console.log(await query());

Is there any way to get the complete results in one go? I just want to query small static files and do not need stream processing.

rubensworks commented 2 years ago

This seems to be working as intended. According to the stream documentation, the read() method only returns something if the buffer is non-empty, which you can detect by listening for the 'readable' event.
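
A minimal sketch of that pattern, reusing the query from above (the 'readable' handler is standard Node.js stream API, not specific to this library):

import { SparqlEndpointFetcher } from "fetch-sparql-endpoint";

const fetcher = new SparqlEndpointFetcher();
const stream = await fetcher.fetchBindings("https://dbpedia.org/sparql", "SELECT ?x {?x a owl:Class.} LIMIT 2");
stream.on("readable", () => {
    // Drain the internal buffer; read() returns null once it is empty.
    let binding;
    while ((binding = stream.read()) !== null) {
        console.log("readable", binding);
    }
});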

Is there any way to get the complete results in one go?

Sure, many packages exist for this, such as this one: https://www.npmjs.com/package/arrayify-stream
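
A minimal sketch using that package (assuming its default export, which collects a stream into a promise of an array; this buffers all bindings in memory, which is fine for small result sets):

import { SparqlEndpointFetcher } from "fetch-sparql-endpoint";
import arrayifyStream from "arrayify-stream";

async function query() {
    const fetcher = new SparqlEndpointFetcher();
    const stream = await fetcher.fetchBindings("https://dbpedia.org/sparql", "SELECT ?x {?x a owl:Class.} LIMIT 4");
    // Resolves once the stream has ended, with every binding in an array.
    return arrayifyStream(stream);
}

console.log(await query());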

KonradHoeffner commented 2 years ago

Thank you, I will use the package you provided!

However, and I hope you don't take this the wrong way: what is the motivation for using this streaming method in the first place? If I process several GB of data, like the whole of DBpedia, then I won't use JavaScript at all but some other programming language with real multithreading; and for the few MB where JavaScript makes sense, I can't imagine this having enough of a memory impact to fill up even the RAM of a low-spec PC with 8 GB. Even with asynchronous processing, existing approaches such as iterators and promises are more user-friendly in my opinion; the three different callbacks identified by string keys look like quite a cumbersome approach to me. Am I missing something here? Is it designed for large-scale Node.js processing on a server?

rubensworks commented 2 years ago

what is the motivation to use this streaming method in the first place?

Many reasons for stream processing exist :-)

My own motivation is coping with very slow or even infinite data sources. For instance, this enables iterative results for very long-running queries: http://query.linkeddatafragments.org/

existing approaches such as iterators and promises are more user friendly in my opinion

Agreed, but they are slower: Node.js EventEmitters are significantly faster. That doesn't take away the possibility of building abstractions such as asynciterators on top of this callback-based system, though.
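
For instance, since Node.js readable streams are async-iterable (assuming the stream returned by fetchBindings is a standard Readable), the bindings can already be consumed with for await...of:

import { SparqlEndpointFetcher } from "fetch-sparql-endpoint";

const fetcher = new SparqlEndpointFetcher();
const stream = await fetcher.fetchBindings("https://dbpedia.org/sparql", "SELECT ?x {?x a owl:Class.} LIMIT 4");

// Readable streams implement Symbol.asyncIterator since Node.js 10,
// so the callback-based stream can be consumed with promise-style syntax.
for await (const binding of stream) {
    console.log(binding);
}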