mtesseract / nakadi-client

Haskell Client Library for the Nakadi Event Broker
Other
13 stars 9 forks source link

FlowId parsing in MetadataEnriched #37

Closed vitold closed 6 years ago

vitold commented 6 years ago

Added flowId field parsing to the MetadataEnriched; Added test for flowId parsing; Removed unused imports to prevent compiler warnings;

mtesseract commented 6 years ago

I just had a quick look at the Nakadi API. I might be misunderstanding something here, but shouldn't it be as follows? When you post an event, the correct data type to parse the metadata into would be MetadataEnriched. The distinction between enriched and non-enriched metadata is not directly visible in the API definition, but it is implicitly there (by means of the enriching done by Nakadi).

Therefore, I wonder, why should one ever use something else than MetadataEnriched for event consumption?

If you agree with this, what do you think about the following PR? https://github.com/mtesseract/nakadi-client/pull/38

If this looks reasonable, then your flowId PR would boil down to adding the _flowId field to MetadataEnriched?

vitold commented 6 years ago

For the Metadata and MetadataEnriched I also have some doubts. Api documentation says that we can define enrichment_strategy: https://zalando.github.io/nakadi/manual.html#definition_EventType*enrichment_strategies But since they provide just a single option - metadata_enrichment, the other one would be - not enrich metadata at all, which logically leads to usage of non-enriched model.

mtesseract commented 6 years ago

Yes, the manual is not very precise about this enrichment topic. But I hope that since the manual does not define seperate metadata objects, our definition of MetadataEnriched should be sufficient irregardles of the actual enrichment strategy in use.

Maybe I am overlooking something, but what could go wrong? Some new enrichment strategy might set some new field in the metdata object — in that case it would have to be an optional field for Nakadi, hence a Maybe for us. No problem.

I guess if Nakadi desires anything fancier than this they would have to specify seperate metadata types.

vitold commented 6 years ago

Agree, the only thing I need to check is - what metadata fields are returned by nakadi if enrichment is disabled to wrap the other fields in EnrichedMetadata by Maybe.

mtesseract commented 6 years ago

Great, let me know about your findings. Then we can merge #38.

vitold commented 6 years ago

Checked how nakadi behaves: 1) There is no way to create event type without any enrichment_strategy. EventType API returns failure if this field not provided or passed empty array as a value. 2) For every event it generates metadata with next fields: "occurred_at" "eid" "event_type" "partition" "received_at" "flow_id" "version" Even flowId shouldn't be defined as an optional field.

mtesseract commented 6 years ago

Thanks!

So, what happens when you don't provide a flow Id while posting an event? Will the resulting metadata on consumption simply contain {"flowId": ""}?

Well. When I initially wrote nakadi-client, I stumbled upon several smaller inconsistencies between the Nakadi API description and how it actually behaves. When I find the time (not soon) I should assemble a precise list for the Nakadi people so that they can modify the API description. But this is a different topic.

What I am trying to say is: according to the manual it should be optional, no? My understanding is that the red star signals that a field is mandatory. If they always provide a flow ID field, fine, we can handle that. But currently they are free to not send it (when it was not specified during POST, I suppose) and in that case we would produce a parse failure if we do not declare it as Maybe.

As I understand Nakadi, this whole enrichment strategie business might not be carved into stone — after all they provide an endpoint for querying a Nakadi server for the supported enrichment strategies.

vitold commented 6 years ago

If flowId is not provided by producer it will be automatically generated by nakadi:


curl -v -XPOST -H "Content-Type: application/json" http://localhost:8080/event-types/order_received/events -d '[
{"metadata":{"occurred_at":"1996-10-15T16:39:57+07:00","eid":"1f5a76d8-db49-4144-ace7-e683e8ff4ba4"},"data_op":"C","data":{"order_number":"abc","id":"111"},"data_type":"blah"}]'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> POST /event-types/order_received/events HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 178
> 
* upload completely sent off: 178 out of 178 bytes
< HTTP/1.1 200 OK
< Date: Tue, 12 Dec 2017 22:21:57 GMT
< X-Flow-Id: uahiddufGSHK7XwmBC2Bw8rF
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< X-Frame-Options: DENY
< X-Application-Context: application:8080
< Content-Length: 0
< 
* Connection #0 to host localhost left intact```
vitold commented 6 years ago

And subscriber receives:

{
    "cursor": {
        "partition": "0",
        "offset": "001-0001-000000000000000003"
    },
    "events": [{
        "metadata": {
            "occurred_at": "1996-10-15T16:39:57+07:00",
            "eid": "1f5a76d8-db49-4144-ace7-e683e8ff4ba4",
            "event_type": "order_received",
            "partition": "0",
            "received_at": "2017-12-12T22:21:57.753Z",
            "flow_id": "uahiddufGSHK7XwmBC2Bw8rF",
            "version": "1.0.0"
        },
        "data_op": "C",
        "data": {
            "order_number": "abc",
            "id": "111"
        },
        "data_type": "blah"
    }]
}
mtesseract commented 6 years ago

I see. But what do you think about following the API vs taking the current behaviour as a basis for nakadi-client? I have tendency towards staying close to the guarantees provided by the API.

vitold commented 6 years ago

I also think that API documentation should be a primary source for modeling client. And update a client in case of breaking changes in nakadi.

mtesseract commented 6 years ago

Ok, feel free to rebase this branch from current master and then simply add the _flowId field as a Maybe Text (and while you are at it, maybe it also makes sense to add _partition). Should make for a smaller diff. :-)

vitold commented 6 years ago

Cool, thanks for your help!

mtesseract commented 6 years ago

Thank you for improving nakadi-client. :-) And good night.

mtesseract commented 6 years ago

I am not entirely sure why it was necessary to use Data.Foldable's toList, but I guess that is because of some ClassyPrelude quirks?

(Is there no newtype for PartitionNames? Actually there should be, but I'm not sure right now.)

In any case, I will merge this now to not block you! Thanks!