rubensworks / fetch-sparql-endpoint.js

A simple, lightweight module to send queries to SPARQL endpoints and retrieve their results in a streaming fashion.
MIT License

fetch is timing out on AWS Neptune Endpoint #70

Open myonara opened 1 year ago

myonara commented 1 year ago

The following code times out when sending queries to the Neptune endpoint.

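// first, the imports and the configuration preparation
// (PREFIX, PREFIX_QUERY and formatResponse are defined elsewhere and not shown here):
import { SparqlEndpointFetcher, ISparqlEndpointFetcherArgs } from 'fetch-sparql-endpoint';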
const getHostAndPort = (writeFlag:boolean) :any  => {
    var result:any={};
    if (writeFlag) {
        result = { 
            host: process.env.NEPTUNE_WRITE_HOSTNAME,
            port: process.env.NEPTUNE_WRITE_PORT,
            url: "https://"+process.env.NEPTUNE_WRITE_ADDRESS+"/sparql",
        };
    } else {
        result = { 
            host: process.env.NEPTUNE_READ_HOSTNAME,
            port: process.env.NEPTUNE_READ_PORT,
            url: "https://"+process.env.NEPTUNE_READ_ADDRESS+"/sparql",
        };
    }
    const fetchHeaders = new Headers();
    fetchHeaders.append('host',result.host+":"+result.port);
    const fetcherOptions : ISparqlEndpointFetcherArgs  = {
      method: 'POST',
      defaultHeaders: fetchHeaders,            
      fetch: async (url:any,init:any)=>{
        console.log("fetch-1: ",url,init); // this is coming
        const r = await fetch(url,init); // this is not returning
        console.log("fetch-2: ",r); // as this is never printed out.
        return r;
      },                             
      prefixVariableQuestionMark: true,   
      timeout: 5000,                             
    };
    result.fetcherOptions = fetcherOptions;
    return result;
}
// routine to execute the query.
const a_runQuery = async (query:string,writeFlag:boolean=false) :Promise<any> => {
  const urlParam : any = getHostAndPort(writeFlag);
    try {
        const myFetcher : SparqlEndpointFetcher = new SparqlEndpointFetcher(
          urlParam.fetcherOptions
        );
        console.log("a_runQuery-1",query,urlParam);
        if (writeFlag) {
            if (myFetcher.getUpdateTypes(PREFIX+query) === 'UNKNOWN') { // parse with the same PREFIX that fetchUpdate sends
                return formatResponse(null,500,"GQ0002","invalid update format",undefined,[query]);
            }
            await myFetcher.fetchUpdate(urlParam.url,PREFIX+query); // not tested yet.
        } else {
            var resultStream : any;
            switch (myFetcher.getQueryType(PREFIX+query)) {
                default:
                case "UNKNOWN":
                    return formatResponse(null,500,"GQ0002","invalid query format",undefined,[query]);
                case "SELECT":
                    console.log("a_runQuery-2a",query,urlParam.url);
                    resultStream = await myFetcher.fetchBindings(urlParam.url,PREFIX+query); // it is timing out
                    break;
                case "ASK":
                    const answer = await myFetcher.fetchAsk(urlParam.url,PREFIX+query);
                    return formatResponse(answer,200,"","",undefined,[query]);
                case "CONSTRUCT":
                    resultStream = await myFetcher.fetchTriples(urlParam.url,PREFIX_QUERY+query);
                    break;
                }
                console.log("a_runQuery-3",resultStream);
                const promise = new Promise((resolve, reject) => {
                    const result :any[]=[];
                    resultStream.on('data',(chunk:any)=>result.push(chunk));
                    // without an 'error' listener a failing stream would leave this promise pending forever
                    resultStream.on('error',reject);
                    resultStream.on('end',()=>{
                        console.log("a_runQuery-4",result);
                        resolve(formatResponse(result,200,"","",undefined,[query]));
                    });
                });
                return await promise;
            }
    } catch(e) {
        return formatResponse(null,500,"GQ0001","Fatal Query Error",e,[query]);
    }
}
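Because the custom fetch above only logs after fetch resolves, "fetch-2" is also never printed when the fetcher's timeout aborts the request: fetch then rejects (typically with an AbortError, since the log shows an AbortSignal being passed) and the second console.log is skipped. A hypothetical tracingFetch wrapper like the one below logs both outcomes and the elapsed time, which helps tell a real hang apart from an abort. It is a debugging sketch, not part of the fetch-sparql-endpoint API.

```typescript
// Loose structural type for a fetch-like function (an assumption, kept permissive on purpose).
type FetchLike = (input: any, init?: any) => Promise<any>;

// Hypothetical helper: wraps a fetch implementation and logs success or failure,
// so an aborted request (e.g. from the fetcher's timeout) no longer goes unnoticed.
export function tracingFetch(
  inner: FetchLike,
  log: (...args: unknown[]) => void = console.log,
): FetchLike {
  return async (input, init) => {
    const started = Date.now();
    try {
      const response = await inner(input, init);
      log('fetch ok', response.status, `${Date.now() - started} ms`);
      return response;
    } catch (error) {
      const err = error as Error;
      log('fetch failed', err.name, err.message, `${Date.now() - started} ms`);
      throw error; // re-throw so the fetcher still sees the failure
    }
  };
}
```

In the options above this would be used as fetch: tracingFetch(fetch).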

With the old code (AWS SDK for JavaScript v2, which goes out of support at the end of 2023!) the query currently works, and with it I also fixed some connection issues in the VPC configuration with the help of AWS Support.
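Since the old SDK code reaches the cluster through the same VPC, one difference worth ruling out is the port. The old request goes to port 8182 (see the host header and endpoint.port in the old-code log further down), whereas the fetcher URL is built as "https://" + NEPTUNE_*_ADDRESS + "/sparql". If that variable holds only a hostname, fetch connects to the default HTTPS port 443 instead, and if the security group does not allow that port the connection attempt can simply hang until the timeout. Below is a sketch of a hypothetical helper, neptuneSparqlUrl, that always puts the port into the URL; it assumes the hostname variable contains no scheme and no port.

```typescript
// Hypothetical helper: build the Neptune SPARQL endpoint URL with an explicit port.
// Assumes `host` is a bare hostname (no scheme, no port); 8182 is Neptune's default port.
export function neptuneSparqlUrl(host: string, port: string | number = 8182): string {
  const url = new URL(`https://${host}`);
  url.port = String(port); // without this, https:// URLs default to port 443
  url.pathname = '/sparql';
  return url.toString();
}

// Possible use inside getHostAndPort (environment variable names taken from the snippet above):
// url: neptuneSparqlUrl(process.env.NEPTUNE_READ_HOSTNAME!, process.env.NEPTUNE_READ_PORT ?? 8182),
```

With the port in the URL, fetch derives the Host header (including :8182) from the URL itself, so the manually appended host header should no longer be needed.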

The output of the fetch-1 console.log call above looks like this:

2023-09-13T18:24:41.306Z    7860ab0f-b165-4573-a8cd-46e3e9c39266    INFO    fetch-1: <url-censored>  {
  headers: HeadersList {
    cookies: null,
    [Symbol(headers map)]: Map(4) {
      'host' => [Object],
      'accept' => [Object],
      'content-type' => [Object],
      'content-length' => [Object]
    },
    [Symbol(headers map sorted)]: null
  },
  method: 'POST',
  body: URLSearchParams {
    'query' => 'PREFIX wd: <---censored---> \n' +
    '  SELECT ?g ?id ?sub ?status ?name ?email\n' +
    '  WHERE { GRAPH ?g { \n' +
    '        ?id wd:usersub ?sub ;\n' +
    '            wd:userstatus ?status . \n' +
    '            OPTIONAL {?id wd:userdisplayname ?name }\n' +
    '            OPTIONAL {?id wd:email ?email }\n' +
    "            FILTER(?sub = 'e344d822-80a1-7059-591f-d8aa88910334')\n" +
    '      }}\n' +
    '    ' },
  signal: AbortSignal {}
}

I found some hints in the Neptune manual from AWS.

I also have a log of a request (including its headers) made with the old code; maybe it helps in finding the solution:

 HttpRequest {
  method: 'POST',
  path: '/sparql',
  headers: {
    'User-Agent': 'aws-sdk-nodejs/2.1450.0 linux/v18.17.1 exec-env/AWS_Lambda_nodejs18.x',
    'Content-Type': 'application/x-www-form-urlencoded',
    host: 'lhp-dev.cluster-ro-cqcasldr2nio.eu-central-1.neptune.amazonaws.com:8182',
    'X-Amz-Date': '20230915T145032Z',
    'x-amz-security-token': 'CENSORED',
    Authorization: 'AWS4-HMAC-SHA256 Credential=CENSORED/20230915/eu-central-1/neptune-db/aws4_request, SignedHeaders=host;x-amz-date;x-amz-security-token, Signature=afe472b6e55c946fbf66a3732e36f2d925e85f713ee4378401e573120c96eee8'
  },
  body: "query=PREFIX%20wd:%20%3Chttps://censoredns/%3E%20%0A%20%20%20%20%20%20%20%20SELECT%20*%20WHERE%20%7B%0A%20%20%20%20%20%20%20%20%20%20%7B%20%0A%20%20SELECT%20?g%20?id%20?uuid%20?namespace%20?idpath%20?DocType%20?parentLink%20?version%20?lastupdatedby%20?updatedAt%20%0A%20%20?name%20(%22TargetNode%22%20AS%20?resultType)%0AWHERE%20%7B%20GRAPH%20?g%20%7B%20%0A%20?id%20wd:nodeid%20?uuid%20;%0A%20%20%20%20%20wd:namespace%20?namespace%20;%0A%20%20%20%20%20wd:name%20?name%20;%0A%20%20%20%20%20wd:idpath%20?idpath%20;%0A%20%20%20%20%20wd:DocType%20?DocType%20;%0A%20%20%20%20%20wd:parentLink%20?parentLink%20;%0A%20%20%20%20%20wd:version%20?version%20;%0A%20%20%20%20%20wd:lastupdatedby%20?lastupdatedby%20;%0A%20%20%20%20%20wd:updatedAt%20?updatedAt%20.%0A%20%20%20%20%20FILTER(str(?uuid)='84368121-bbdd-4b0a-8293-dbde63947477')%20%0A%20%20%20%7D%7D%0A%20%20%20%7D%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20limit%20100%20offset%200%0A%20%20%20%20%20%20%20%20",
  endpoint: {
    protocol: 'https:',
    host: 'CENSORED',
    port: 8182,
    hostname: 'CENSORED',
    pathname: '/',
    path: '/',
    href: 'CENSORED',
    constructor: [Function: Endpoint] { __super__: [Function: Object] }
  },
  region: 'eu-central-1',
  _userAgent: 'aws-sdk-nodejs/2.1450.0 linux/v18.17.1 exec-env/AWS_Lambda_nodejs18.x'
}
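Comparing the two logs, the old SDK request carries X-Amz-Date, x-amz-security-token and an AWS4-HMAC-SHA256 Authorization header (Signature Version 4, service neptune-db), while the fetch request only has host, accept, content-type and content-length. If IAM database authentication is enabled on the cluster, the fetch requests will have to be signed as well. A missing signature would normally be rejected with an error response (such as 403) rather than hang, so this is probably a second problem rather than the cause of the timeout. The following is a minimal, hand-rolled SigV4 sketch using only node:crypto; signNeptunePost and AwsCredentials are hypothetical names, and for production the signer shipped with AWS SDK v3 is the safer choice.

```typescript
import { createHash, createHmac } from 'node:crypto';

// Hypothetical credential shape (in Lambda: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN).
export interface AwsCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
}

const sha256Hex = (data: string): string => createHash('sha256').update(data, 'utf8').digest('hex');
const hmac = (key: string | Buffer, data: string): Buffer => createHmac('sha256', key).update(data, 'utf8').digest();

// Returns the extra headers for a SigV4-signed POST of `body` (exactly as sent) to `url`.
export function signNeptunePost(
  url: string,
  body: string,
  region: string,
  creds: AwsCredentials,
  now: Date = new Date(),
): Record<string, string> {
  const { host, pathname } = new URL(url); // host includes ":8182" when the port is explicit
  const amzDate = now.toISOString().replace(/[:-]|\.\d{3}/g, ''); // e.g. 20230915T145032Z
  const dateStamp = amzDate.slice(0, 8);
  const service = 'neptune-db';
  const scope = `${dateStamp}/${region}/${service}/aws4_request`;

  // Headers that take part in the signature, keyed by lower-case name.
  const signed: Record<string, string> = { host, 'x-amz-date': amzDate };
  if (creds.sessionToken) signed['x-amz-security-token'] = creds.sessionToken;
  const names = Object.keys(signed).sort();
  const canonicalHeaders = names.map((n) => `${n}:${signed[n].trim()}\n`).join('');
  const signedHeaders = names.join(';');

  const canonicalRequest = [
    'POST',
    pathname, // "/sparql" needs no further URI encoding
    '', // empty canonical query string for a POST
    canonicalHeaders,
    signedHeaders,
    sha256Hex(body),
  ].join('\n');
  const stringToSign = ['AWS4-HMAC-SHA256', amzDate, scope, sha256Hex(canonicalRequest)].join('\n');

  // Derive the signing key: date -> region -> service -> "aws4_request".
  const kDate = hmac(`AWS4${creds.secretAccessKey}`, dateStamp);
  const kSigning = hmac(hmac(hmac(kDate, region), service), 'aws4_request');
  const signature = createHmac('sha256', kSigning).update(stringToSign, 'utf8').digest('hex');

  const headers: Record<string, string> = {
    'x-amz-date': amzDate,
    authorization: `AWS4-HMAC-SHA256 Credential=${creds.accessKeyId}/${scope}, SignedHeaders=${signedHeaders}, Signature=${signature}`,
  };
  if (creds.sessionToken) headers['x-amz-security-token'] = creds.sessionToken;
  return headers;
}
```

Inside the fetcher's custom fetch function, one could serialize init.body (a URLSearchParams, according to the fetch-1 log) with toString(), sign that exact string, append the returned headers to init.headers, and send the string as the body so the signed hash matches what goes over the wire. The region in Lambda is available as AWS_REGION.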
rubensworks commented 1 year ago

Perhaps Neptune is just taking more than 5 seconds to execute your query? What happens if you remove timeout from the options?

myonara commented 1 year ago

Hi Ruben, it keeps timing out: I raised the timeout to 30 seconds and nothing happens. By the way, the old approach takes less than 0.5 seconds.
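A query that answers in under 0.5 seconds through the old code but never completes through fetch, even with a 30-second timeout, suggests the connection itself may not be getting established. One way to check this from the same Lambda is a plain TCP probe of the host and port the fetcher URL actually points at. probeTcp below is a hypothetical helper built on node:net; it only tests reachability (DNS, routing, security groups), not TLS, authentication or the query. If it returns false, the problem is more likely in the networking setup than in the query or the library.

```typescript
import { connect } from 'node:net';

// Hypothetical helper: resolves true if a TCP connection to host:port is
// established within timeoutMs, false on a connection error or timeout.
export function probeTcp(host: string, port: number, timeoutMs = 3000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = connect({ host, port });
    const finish = (ok: boolean) => {
      socket.destroy(); // close the probe connection in every case
      resolve(ok); // later calls are no-ops because a promise settles only once
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}

// Possible use in the Lambda (hostname variable taken from the snippet above):
// console.log(await probeTcp(process.env.NEPTUNE_READ_HOSTNAME!, 8182));
```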

myonara commented 1 year ago

My working solution is based on an outdated example using the AWS SDK for JavaScript v2. SDK v2 goes out of support at the end of the year, so I want a different way to connect, and I would also like the convenience of this fetch library.