neo4j / neo4j-javascript-driver

Neo4j Bolt driver for JavaScript
https://neo4j.com/docs/javascript-manual/current/
Apache License 2.0

'EPIPE' error #175

Closed viz closed 7 years ago

viz commented 7 years ago

I'm getting an occasional EPIPE error when attempting a query using the JavaScript driver. Unfortunately I can't reproduce or predict the error, so I don't have a lot of information yet. I'll see if I can add more debugging code to capture more data.

Configuration: The query is run from within a Node.js Lambda function on AWS, which means Node v4.3. Our Neo4j instance is hosted on GrapheneDB and runs Neo4j Community 3.0.6. The neo4j-javascript-driver version is 1.0.4.

Below is the code that was in place last time I saw this error - I refactored it a little yesterday and haven't yet had enough activity to see whether my refactor changed anything.

Original code: (removed non-Neo4j related code for clarity)

/* driver setup when lambda environment first loads */
const neoDriver = neo4j
  .v1
  .driver(process.env.NEO4J_BOLT_URL, neo4j.v1.auth.basic(process.env.NEO4J_USERNAME, process.env.NEO4J_PASSWORD), {
    encrypted: true,
    knownHosts: '/tmp/.neo4j/known_hosts',
  });

/* actual lambda function being called */
module.exports.submitApplication = (event, context, cb) => {
  context.callbackWaitsForEmptyEventLoop = false;

  const body = JSON.parse(event.body);
  const stripe = Stripe(process.env.STRIPE_KEY);

  new Promise((resolve, reject) => {
    /* removed stripe code - resolves after creating customer and charge. returns charge details */
  })
  .then((result) => {
    const session = neoDriver.session();

    // update graph with transaction details
    const query = `MATCH (u:User{flake_id:{uid}}), (v:Variant{product_id:{subcr_plan}})
                  CREATE (u)-[:created]->(o:Order{flake_id:{oid}})-[:contains]->(v),
                  (u)-[:paid_with]->(t:Transaction{flake_id:{tid}})-[:paid_for]->(o) 
                  SET 
                  o.created_at = timestamp(),
                  o.updated_at = o.created_at,
                  o.source = 'ACN',
                  o.state = 'paid',
                  t.amount_in_cents = {amountInCents},
                  t.gateway_used = 'stripe',
                  t.masked_card_number = {maskedCardNumber},
                  t.currency = {currency},
                  t.card_type = {cardType},
                  t.type = 'Credit Card',
                  t.gateway_transaction_id = {txId},
                  t.created_at = timestamp(),
                  t.updated_at = t.created_at
                  RETURN u, o, t`;

    const queryParams = {
      uid: body.applicantId,
      oid: flakeGen(),
      tid: flakeGen(),
      subcr_plan: body.subId,
      amountInCents: result.amount,
      maskedCardNumber: `XXXX-XXXX-XXXX-${result.card.last4}`,
      currency: result.currency,
      cardType: result.card.type,
      txId: result.id
    };
    console.log('saving order/tx: ', query, queryParams);

    return session.run(query, queryParams)
      .then((res) => {
        console.log('saved order/tx: ', res);
        result.flake_id = res.records[0]._fields[2].properties.flake_id;
        return result;
      })
      .then((res) => {
        session.close();
        return result;
      });
  })
  .then((result) => {
    // publish to sns topic
  })
  .catch((err) => {
    cb(JSON.stringify(err));
  });
};

Note that I moved session.close() into a separate .then() to see whether I was somehow closing the session early. This didn't stop the problem, so it was probably unnecessary.
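In the code above, session.close() is only reached when session.run resolves; if the query rejects, the session is never closed. Whether that is related to the EPIPE error is unknown. The following is a minimal sketch of closing the session on both paths. `runAndClose` and `makeFakeSession` are hypothetical names introduced here for illustration, and the fake session only mimics the run/close calls used above; it is not the real driver.

```javascript
// Sketch: run a query and close the session whether it succeeds or fails.
// `runAndClose` is a hypothetical helper. `session` is assumed to expose
// run(query, params) returning a promise, and close(), as in the code above.
function runAndClose(session, query, params) {
  return session.run(query, params).then(
    (res) => {
      session.close();
      return res;
    },
    (err) => {
      session.close();
      throw err; // keep the rejection so the caller's .catch() still fires
    }
  );
}

// Stand-in session for illustration only; not the real driver.
function makeFakeSession(shouldFail) {
  const fake = {
    closed: false,
    run: () => (shouldFail
      ? Promise.reject(Object.assign(new Error('write EPIPE'), { code: 'EPIPE' }))
      : Promise.resolve({ records: [] })),
    close: () => { fake.closed = true; },
  };
  return fake;
}
```

In the handler, `return session.run(query, queryParams)` would become `return runAndClose(session, query, queryParams)`, with the existing .then() that extracts flake_id chained after it.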

I then refactored the code to keep all the Neo4j code within the promise .then() closure. Given that this recreates the driver on each invocation, I expect it to be slower, but reliability is more important at this stage.

  .then((result) => {
    const neoDriver = neo4j
    .v1
    .driver(process.env.NEO4J_BOLT_URL, neo4j.v1.auth.basic(process.env.NEO4J_USERNAME, process.env.NEO4J_PASSWORD), {
      encrypted: true,
      knownHosts: '/tmp/.neo4j/known_hosts',
    });

    const session = neoDriver.session();

    // update graph with transaction details
    const query = `MATCH (u:User{flake_id:{uid}}), (v:Variant{chargify_id:{chargifyPlan}})
                  CREATE (u)-[:created]->(o:Order{flake_id:{oid}})-[:contains]->(v),
                  (u)-[:paid_with]->(t:Transaction{flake_id:{tid}})-[:paid_for]->(o) 
                  SET 
                  o.created_at = timestamp(),
                  o.updated_at = o.created_at,
                  o.source = 'ACN',
                  o.state = 'paid',
                  t.amount_in_cents = {amountInCents},
                  t.gateway_used = 'stripe',
                  t.masked_card_number = {maskedCardNumber},
                  t.currency = {currency},
                  t.card_type = {cardType},
                  t.type = 'Credit Card',
                  t.gateway_transaction_id = {txId},
                  t.created_at = timestamp(),
                  t.updated_at = t.created_at
                  RETURN u, o, t`;

    const queryParams = {
      uid: body.applicantId,
      oid: flakeGen(),
      tid: flakeGen(),
      chargifyPlan: body.chargifyId,
      amountInCents: result.amount,
      maskedCardNumber: `XXXX-XXXX-XXXX-${result.card.last4}`,
      currency: result.currency,
      cardType: result.card.type,
      txId: result.id
    };
    console.log('saving order/tx: ', query, queryParams);

    return session.run(query, queryParams)
      .then((res) => {
        console.log('saved order/tx: ', res);
        return { ...result, flake_id: res.records[0]._fields[2].properties.flake_id };
      })
      .then((result) => {
        session.close();
        return result;
      });
  })

Symptoms: When this error occurs, there is no 'saved order/tx: ...' log message and no change in the graph, which indicates that the error occurs before the session.run promise resolves.

The error message isn't very helpful - the catch returns this to the API Gateway:

{
    "errorMessage": "{\"code\":\"EPIPE\"}"
}
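One possible reason the payload is so sparse: JSON.stringify only serializes enumerable own properties, and on an Error object `message` and `stack` are not enumerable, while an assigned `code` is. The sketch below shows a way to keep those fields, assuming the rejected value is an Error-like object. `serializeError` is a hypothetical helper name, not part of the driver, and the example error is constructed by hand rather than captured from a real failure.

```javascript
// Sketch: serialize an Error including its non-enumerable fields.
// `serializeError` is a hypothetical helper name, not part of the driver.
function serializeError(err) {
  if (err === null || typeof err !== 'object') {
    return JSON.stringify({ message: String(err) });
  }
  const plain = {};
  // Object.getOwnPropertyNames also returns non-enumerable own properties,
  // such as `message` and `stack`, which JSON.stringify would skip.
  Object.getOwnPropertyNames(err).forEach((key) => {
    plain[key] = err[key];
  });
  return JSON.stringify(plain);
}

// Hand-built example error, for illustration only.
const example = new Error('write EPIPE');
example.code = 'EPIPE';

console.log(JSON.stringify(example)); // only the enumerable `code` survives
console.log(serializeError(example)); // also includes `message` and `stack`
```

Passing `serializeError(err)` to the callback instead of `JSON.stringify(err)` would surface the message and stack trace in the API Gateway response or in the logs.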

I've added more logging but haven't seen a recurrence yet that would give me more information.

Not sure where to go from here. I don't think there is any glaring mistake in the code (although I haven't been using the Bolt driver for very long) and I can't tolerate frequent errors here. Given that this occurs after the Stripe payment is complete, retrying the API call is out, so I will probably send failures to an error queue and deal with retrying separately.
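The error-queue idea could look roughly like the sketch below. It assumes two promise-returning functions that do not exist in the code above: `saveOrder`, which wraps the Neo4j write, and `enqueueFailure`, which pushes a message to whatever queue is chosen. Both names, and `saveOrQueue` itself, are hypothetical.

```javascript
// Sketch: if the graph write fails after the charge succeeded, hand the
// payload to a queue for a later retry instead of failing the request.
// `saveOrder` and `enqueueFailure` are hypothetical promise-returning functions.
function saveOrQueue(saveOrder, enqueueFailure, payload) {
  return saveOrder(payload)
    .then((saved) => ({ status: 'saved', result: saved }))
    .catch((err) => enqueueFailure({
      payload: payload,
      // Copy the fields explicitly; JSON.stringify would drop `message`.
      error: { code: err.code, message: err.message },
    }).then(() => ({ status: 'queued' })));
}
```

If `enqueueFailure` itself rejects, the returned promise rejects too, so the caller's existing .catch() still sees the failure.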

oskarhane commented 7 years ago

Hi, thank you for all the information.
I've never seen that error message before; maybe it's related to how Lambda handles instances. We'll do what we can to reproduce it so that we can find the issue. If you manage to reproduce it reliably, please get back to us.

lutovich commented 7 years ago

Hello @viz,

I'm unable to reproduce this issue by running your queries locally. See this gist with the repro code: https://gist.github.com/lutovich/02c1a320f6e02dd5f20d62555af9c3c4. I tried killing the server while the test was running, but this always resulted in the correct "session expired" errors. I used Neo4j 3.0.6 and Node v4.3.2/v6.7.0.

Do you still see EPIPE errors? Are there any new details from extended logging?

Thanks.

viz commented 7 years ago

I haven't seen any of these errors for a while, but we've had pretty low traffic volume as we incrementally increase traffic to the product. We're keeping a close eye on this as we trickle feed new users to the system and we now have better monitoring on our lambda functions, so hopefully we'll get more info if it happens again.

zhenlineo commented 7 years ago

Hi, I am closing this issue as it has been quiet for a while.

Please feel free to reopen it or create a new issue if you see this error again.

Please specify your driver and server versions when you report the problem again.

Thanks, Zhen