hashmapinc / Drillflow

A dockerized WITSML API Server that is agnostic of the backend.
Apache License 2.0

DEV: LOG: (URGENT) Add To Store Errors Out #603

Closed shehzadsidi closed 5 years ago

shehzadsidi commented 5 years ago

AddToStore for both Time and Depth logs returns an "Internal Server Error" response. However, the object is actually created, so the issue is with the response:

-1 {"message":"Internal Server Error"}

shehzadsidi commented 5 years ago

Request Log Object:

log Well Energistics-well-0001 Wellbore Energistics-w1-wellbore-0001 Log_HM_43cc444f-f182-478e-ab8b-47909051d561 Baker Hughes INTEQ 12.3 2001-06-18T13:20:00.000Z Drilling Data Log
date time 0 2001-06-18T13:20:00Z 2001-06-18T14:20:00Z increasing Time -999.25

Time time m md -999.25 time stamp 0 raw double
DVER TVD of hole m tvd -999.25 Vertical depth 0 raw double
DBTM measured depth of DST bottom m distBit -999.25 Distance drilled by bit 0 raw double
DTOR_RT torque (average) kft.lbf tqOnBot -999.25 On bottom torque 0 raw double
TQLS torque (average) kft.lbf toOffBot -999.25 Off bottom torque 0 raw double

Time,DVER,DBTM,DTOR_RT,TQLS
s,m,m,kft.lbf,kft.lb
2001-06-18T13:20:00.01Z,0,1.45,,
2001-06-18T13:30:00.01Z,500,,0.01,1.42
2001-06-18T13:40:00.03Z,501.02,,0.02,1.41
2001-06-18T13:50:00.01Z,502,3.9,0.06,
2001-06-18T14:00:00.01Z,503,4.9,0.11,1.48
2001-06-18T14:10:00.05Z,504.04,5.94,0.18,1.55
2001-06-18T14:20:00.03Z,,612.03,1.83,3.32

2003-11-24T08:15:00.000Z 2003-11-24T08:15:00.000Z plan These are the comments associated with the log object.

TessForGithub2 commented 5 years ago

Fixed and handed over to Sukhe for his next PR (branch 551, PR #604), since it involved just commenting out one line of code.

There are 2 lingering concerns:

Even though sending the data to DoT created the data (as confirmed by a subsequent GetFromStore), DoT returned a "500".

Concern #1: Since there is no rollback, I thought we had all agreed that once a ChannelSet is successfully created, even a failed attempt to add channels or add data will NOT result in -1. QA can only verify the "1" result, so when DoT returns a 500 while still adding the data (as in this case), QA will assume the data is there. That becomes a FALSE POSITIVE if a 500 one day means the data was NOT added. But if you think about it, our "no rollback" philosophy locks us into this path.

QA has no way to narrow an error down to a channel failure or a data failure; only a ChannelSet failure is visible.

Concern #2: Why does DoT return a 500 (Internal Server Error) for the data call when the data is successfully added?
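
To make Concern #1 concrete, here is a minimal Python sketch of the "no rollback" result policy as described in this comment. Everything in it is hypothetical: `add_to_store`, the three step callables, and the status values are illustrations only, not Drillflow or DoT code.

```python
from typing import Callable, Dict, Tuple

# Hypothetical step: returns the HTTP status the backend reported.
Step = Callable[[], int]


def add_to_store(create_channel_set: Step,
                 add_channels: Step,
                 add_data: Step) -> Tuple[int, Dict[str, int]]:
    """Run the three backend calls and apply the "no rollback" policy.

    Returns (witsml_result, per_step_statuses). The result is -1 only
    when the ChannelSet itself could not be created.
    """
    statuses = {"channelSet": create_channel_set()}
    if statuses["channelSet"] >= 400:
        return -1, statuses
    # Later failures are recorded but do not change the result,
    # because the ChannelSet cannot be rolled back.
    statuses["channels"] = add_channels()
    statuses["data"] = add_data()
    return 1, statuses


# The case from this issue: the data call reports 500.
result, statuses = add_to_store(lambda: 201, lambda: 200, lambda: 500)
print(result, statuses)
```

The caller sees `1` whether or not the data call really stored anything. Only the per-step statuses tell the two cases apart, and those are exactly what QA cannot see.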

So I will close this card but raise these two concerns in a meeting where we can dissect them further.

TessForGithub2 commented 5 years ago

UPDATE: The Valve needs to throw errors back with appropriate error codes, and the log must reflect accurate error tracing so the root cause can always be uncovered. As best I can tell, the strategy of repeating AddToStore is not viable: it will result in a -409 error, since a ChannelSet will already be present and the mappings happen each and every time an AddToStore occurs. The suggestion to just repeat Steps 2 & 3 (Add Channel & Add Data) as required is not a workable or desirable solution. SLB is working on the error on their side, and I have provided the JSON format for the request bodies in the SC general channel and by email.
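
As a rough illustration of errors that carry enough context for tracing, the Python sketch below attaches the failing step and the upstream HTTP status to the error before it is mapped to a result code. `StepError`, `to_witsml_result`, and the code mapping are all hypothetical names and placeholder values, not the Valve's actual API or an agreed set of codes.

```python
import logging

log = logging.getLogger("valve-sketch")


class StepError(Exception):
    """Hypothetical error raised by the Valve for one backend call."""

    def __init__(self, step: str, upstream_status: int, detail: str = ""):
        super().__init__(f"{step} failed with HTTP {upstream_status}: {detail}")
        self.step = step                        # "channelSet", "channels" or "data"
        self.upstream_status = upstream_status  # HTTP status from the backend
        self.detail = detail


def to_witsml_result(err: StepError, code_map: dict, default: int = -1) -> int:
    """Log the failing step, then look up the result code for it.

    code_map is keyed by (step, upstream_status); its values are
    whatever codes the team agrees on (placeholders here).
    """
    log.error("AddToStore aborted: %s", err)
    return code_map.get((err.step, err.upstream_status), default)


# Placeholder mapping, for illustration only.
codes = {("channelSet", 409): -409}
print(to_witsml_result(StepError("channelSet", 409, "already exists"), codes))
print(to_witsml_result(StepError("data", 500, "Internal Server Error"), codes))
```

Because the step name travels with the error, the log line identifies which of the backend calls failed even when the client only ever sees a single result code.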

Further discussion is required, and I will reopen this card but mark it as "blocked" until SLB fixes the error on their side.

TessForGithub2 commented 5 years ago

UPDATE: The only acceptable solution to this issue is to provide rollback.

The problems are:

  1. An AddToStore maps to 3 REST calls on SLB's API side: channelSet, channels, & data.
    However, the WITSML API side has no concept of REST calls (a client does not need to know the solution architecture for the API to work). It expects a virtually immediate "success" or "failure" for the entire query. Introducing a delay in the response causes a timeout, so a solution that has Drillflow retry is not possible. It would cause more confusion (errors) on the client side.
  2. Retry as suggested by SLB only covers transient errors (network or DoT server down). Drillflow currently has no logic to distinguish transient errors from query errors, and because of #1 above, retry is not feasible in any case.
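
One common way to get an all-or-nothing answer out of several calls is to pair each call with an action that undoes it. The Python sketch below shows that idea against an in-memory list. It is only an illustration: `add_to_store_with_rollback` and the undo callables are hypothetical, and whether the backend exposes calls that can undo a ChannelSet or its channels is an assumption.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the hypothetical call that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def add_to_store_with_rollback(steps: List[Step]) -> int:
    """Run steps in order; on any failure undo the completed ones.

    Returns 1 on full success and -1 otherwise, so the caller gets a
    single all-or-nothing answer without any retry delay.
    """
    undo_stack: List[Callable[[], None]] = []
    for _name, action, undo in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order of creation.
            for compensate in reversed(undo_stack):
                compensate()
            return -1
        undo_stack.append(undo)
    return 1


# In-memory stand-in for the backend, for illustration only.
store: List[str] = []


def failing_data_call() -> None:
    raise RuntimeError("500 Internal Server Error")


steps = [
    ("channelSet", lambda: store.append("channelSet"), lambda: store.remove("channelSet")),
    ("channels", lambda: store.append("channels"), lambda: store.remove("channels")),
    ("data", failing_data_call, lambda: None),
]
print(add_to_store_with_rollback(steps), store)
```

The sketch does not handle a step that reports failure after partially succeeding (as the data call did in this issue), nor an undo call that itself fails. Both would need to be covered in a real design.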

I will open a new card that focuses on communicating the correct solution and providing more details of what it will take to provide rollback.