snowplow-incubator / snowplow-micro

Standalone application to automate testing of trackers

Improve /micro/bad response format #74

Open adatzer opened 3 years ago

adatzer commented 3 years ago

This issue is about potential improvements to the bad row output in Snowplow Micro.

Is your feature request related to a problem? Please describe.

Currently, the response format of the /micro/bad endpoint is a JSON array of BadEvents.

The errors parameter of a BadEvent is a list of strings:

  1. The first "error" is just a message that denotes the type of failure in a general way. For example:
"Error while extracting event(s) from collector payload and validating it/them."
  2. The second "error" is a JSON-escaped bad row. For example:
"{\"schema\":\"iglu:com.snowplowanalytics.snowplow.badrows/tracker_protocol_violations/jsonschema/1-0-0\",\"data\":{\"processor\":{\"artifact\":\"snowplow-micro\",\"version\":\"1.1.2\"},\"failure\":{\"timestamp\":\"2021-08-24T13:09:14.799119Z\",\"vendor\":\"com.snowplowanalytics.snowplow\",\"version\":\"tp2\",\"messages\":[{\"field\":\"body\",\"value\":\"\",\"error\":\"invalid json: exhausted input\"}]},\"payload\":{\"vendor\":\"com.snowplowanalytics.snowplow\",\"version\":\"tp2\",\"querystring\":[],\"contentType\":\"application/json\",\"body\":\"\",\"collector\":\"ssc-2.3.1-stdout$\",\"encoding\":\"UTF-8\",\"hostname\":\"0.0.0.0\",\"timestamp\":\"2021-08-24T13:09:14.797Z\",\"ipAddress\":\"172.17.0.1\",\"useragent\":\"curl/7.74.0\",\"refererUri\":null,\"headers\":[\"Timeout-Access: <function1>\",\"Host: 0.0.0.0:9090\",\"User-Agent: curl/7.74.0\",\"Accept: */*\",\"application/json\"],\"networkUserId\":\"283214ca-7868-465b-95eb-27418c8b872f\"}}}"

The problem is that users need to parse the JSON-escaped string in order to get to the actual error that resulted in a failed event. In addition, the compact BadRow duplicates information already available in the BadEvent's other parameters (collectorPayload and rawEvent).
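
To illustrate the extra step, here is a minimal Python sketch of what a user currently has to do to reach the failure messages. The BadEvent below is a trimmed, hand-built stand-in for the example above (only the fields used are kept), and the helper name `failure_messages` is hypothetical. The `data.failure.messages` path is taken from the tracker_protocol_violations example; other bad row types may be shaped differently.

```python
import json

# Trimmed stand-in for the JSON-escaped bad row shown above (assumption:
# only the fields needed for this sketch are included).
escaped_bad_row = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow.badrows/tracker_protocol_violations/jsonschema/1-0-0",
    "data": {
        "failure": {
            "messages": [
                {"field": "body", "value": "", "error": "invalid json: exhausted input"}
            ]
        }
    },
})

# A BadEvent as returned today: errors[0] is the generic message,
# errors[1] is the bad row serialized as a string.
bad_event = {
    "errors": [
        "Error while extracting event(s) from collector payload and validating it/them.",
        escaped_bad_row,
    ]
}


def failure_messages(bad_event):
    """Hypothetical helper: decode errors[1] and return its failure messages."""
    bad_row = json.loads(bad_event["errors"][1])  # second parse, after the HTTP response itself
    return bad_row["data"]["failure"]["messages"]


messages = failure_messages(bad_event)
```

The second `json.loads` call is the step this issue would like to make unnecessary.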

Improving the bad row output of Micro will also improve the user experience and at the same time provide the necessary information without loss or duplication.

Describe alternatives you've considered

Solutions we've considered so far in discussions (cc @paulboocock , @istreeter ) include:

  1. errors to be a list of messages, not a list of json-escaped bad rows.
  2. /micro/bad to respond with a JSON array of BadRows instead.

miike commented 3 years ago

I think having a list of errors in addition to a JSON array of the bad rows would definitely be useful. Having the complete JSON of the bad rows also opens up the bad endpoint to some ability to filter in the same way that you can filter on good, only in this case you would be able to filter on schema_violations, tracker_protocol_violations etc.
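
As a rough sketch of what that filtering looks like when done client-side against the current format: the snippet below assumes the bad row type is the name segment of the Iglu schema URI (`iglu:vendor/name/format/version`), and the function names `bad_row_type` and `filter_bad_events` are hypothetical, not part of Micro. The sample events are hand-built and only carry the `schema` field.

```python
import json


def bad_row_type(bad_event):
    """Hypothetical helper: extract the bad row type from the escaped bad row."""
    bad_row = json.loads(bad_event["errors"][1])
    # Assumed Iglu URI layout: iglu:vendor/name/format/version
    return bad_row["schema"].split("/")[1]


def filter_bad_events(bad_events, wanted_type):
    """Keep only the BadEvents whose bad row is of the wanted type."""
    return [e for e in bad_events if bad_row_type(e) == wanted_type]


def make_bad_event(name):
    """Build a minimal BadEvent in the current format, for illustration only."""
    schema = f"iglu:com.snowplowanalytics.snowplow.badrows/{name}/jsonschema/1-0-0"
    return {"errors": ["generic message", json.dumps({"schema": schema, "data": {}})]}


bad_events = [
    make_bad_event("tracker_protocol_violations"),
    make_bad_event("schema_violations"),
    make_bad_event("schema_violations"),
]

schema_violations = filter_bad_events(bad_events, "schema_violations")
```

If /micro/bad returned the bad rows as plain JSON, the `json.loads` inside `bad_row_type` would go away and the same filter could be applied server-side.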

I'd also be tempted to standardise some of the JSON response that is returned,

e.g., for /micro/good, event.rawEvent is equivalent to event.collectorPayload in /micro/bad

markst commented 3 months ago

Possibly a duplicate of https://github.com/snowplow-incubator/snowplow-micro/issues/14 ?