openzipkin / zipkin-aws

Reporters and collectors for use in Amazon's cloud
Apache License 2.0

Draft: Implement DynamoDB storage component #125

Closed: devinsba closed this 4 years ago

devinsba commented 5 years ago

There are a couple of things here that I need opinions on:

  1. I've taken inspiration from Haystack and put the full encoded span in the database. This lets the span table hold only the attributes that our queries require. Because some of our queries (those without a service or span name) end up being table scans, this helps us limit response sizes. (A sketch of what such an item write could look like follows this list.)
  2. I'm not super happy with the dependencies table implementation; I'd like it to be better, if someone would take a look.
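
For concreteness, here is a minimal sketch of what writing one of these items could look like with the AWS SDK for Java v1. Every attribute name not visible in the queries below (in particular span_blob for the encoded span) is an assumption for illustration, not necessarily the schema in this PR:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

public class SpanWriteSketch {
  public static void main(String[] args) {
    AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();

    Map<String, AttributeValue> item = new HashMap<>();
    // Only the attributes the queries need live as top-level columns.
    item.put("trace_id", new AttributeValue().withS("463ac35c9f6413ad48485a3953bb6124"));
    item.put("local_service_name", new AttributeValue().withS("frontend"));
    item.put("span_name", new AttributeValue().withS("get"));
    item.put("tag.local", new AttributeValue().withS("app")); // tags as "tag."-prefixed attributes
    // Assumed sort key packing the timestamp with the span ID; see the bound
    // computation sketch under the first query below.
    item.put("span_timestamp_id", new AttributeValue().withN("28653312812297787557491507200000"));
    // The full span travels as one opaque encoded blob, as in Haystack,
    // e.g. SpanBytesEncoder.PROTO3.encode(span).
    byte[] encodedSpan = new byte[0]; // placeholder
    item.put("span_blob", new AttributeValue().withB(ByteBuffer.wrap(encodedSpan)));

    client.putItem(new PutItemRequest().withTableName("zipkin-spans").withItem(item));
  }
}
```
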
devinsba commented 5 years ago

While letting this soak for a while, I've realized I have one table too many, since two of them have the same number and types of fields. I can probably consolidate those to save people some money. There should be a changeset with that update this weekend.

devinsba commented 5 years ago

I was discussing the queries with Adrian; I'm going to post a few here in the comments.

Get spans by name

```json
{
  "TableName": "zipkin-spans",
  "IndexName": "span_name",
  "ScanIndexForward": false,
  "ProjectionExpression": "trace_id, trace_id_64, span_timestamp",
  "KeyConditionExpression": "span_name = :span_name AND span_timestamp_id BETWEEN :timestamp_id_lower_bound AND :timestamp_id_upper_bound",
  "ExpressionAttributeValues": {
    ":timestamp_id_lower_bound": {
      "N": "28653312812297787557491507200000"
    },
    ":span_name": {
      "S": "get"
    },
    ":timestamp_id_upper_bound": {
      "N": "28656500409692171312084461551615"
    }
  }
}
```
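
A note on the BETWEEN bounds: they look like an epoch-millisecond timestamp packed into the bits above a 64-bit ID, with the low 64 bits all zeros in the lower bound and all ones in the upper (its decimal tail matches 2^64 - 1). That reading is inferred from the numbers rather than stated anywhere in this thread; a minimal sketch under that assumption:

```java
import java.math.BigInteger;

public class TimestampIdBounds {
  // Low 64 bits reserved for the ID portion (assumption).
  static final BigInteger MAX_ID = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

  // Lower bound: (endTs - lookback) in the high bits, ID bits all zero.
  static BigInteger lowerBound(long endTsMillis, long lookbackMillis) {
    return BigInteger.valueOf(endTsMillis - lookbackMillis).shiftLeft(64);
  }

  // Upper bound: endTs in the high bits, ID bits all one.
  static BigInteger upperBound(long endTsMillis) {
    return BigInteger.valueOf(endTsMillis).shiftLeft(64).or(MAX_ID);
  }

  public static void main(String[] args) {
    long endTs = System.currentTimeMillis();
    long lookback = 2L * 24 * 60 * 60 * 1000; // the bounds above span roughly two days
    System.out.println(lowerBound(endTs, lookback));
    System.out.println(upperBound(endTs));
  }
}
```
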

Get spans for service name and tag=value

```json
{
  "TableName": "zipkin-spans",
  "IndexName": "local_service_name",
  "ScanIndexForward": false,
  "ProjectionExpression": "trace_id, trace_id_64, span_timestamp",
  "FilterExpression": "#tag_key0 = :tag_value0",
  "KeyConditionExpression": "local_service_name = :local_service_name AND span_timestamp_id BETWEEN :timestamp_id_lower_bound AND :timestamp_id_upper_bound",
  "ExpressionAttributeNames": {
    "#tag_key0": "tag.local"
  },
  "ExpressionAttributeValues": {
    ":timestamp_id_lower_bound": {
      "N": "28653312812297787557491507200000"
    },
    ":timestamp_id_upper_bound": {
      "N": "28656500409692171312084461551615"
    },
    ":tag_value0": {
      "S": "app"
    },
    ":local_service_name": {
      "S": "frontend"
    }
  }
}
```
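
For reference, this is roughly how that request could be issued with the AWS SDK for Java v1; a sketch, not the client code in this PR. Note that DynamoDB applies the FilterExpression after the key condition has already read the items, so spans filtered out by the tag still consume read capacity:

```java
import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.QueryResult;

public class ServiceTagQuerySketch {
  public static void main(String[] args) {
    AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();

    // "#tag_key0" aliases the attribute name "tag.local", which contains a reserved character.
    Map<String, String> names = new HashMap<>();
    names.put("#tag_key0", "tag.local");

    Map<String, AttributeValue> values = new HashMap<>();
    values.put(":local_service_name", new AttributeValue().withS("frontend"));
    values.put(":tag_value0", new AttributeValue().withS("app"));
    values.put(":timestamp_id_lower_bound", new AttributeValue().withN("28653312812297787557491507200000"));
    values.put(":timestamp_id_upper_bound", new AttributeValue().withN("28656500409692171312084461551615"));

    QueryResult result = client.query(new QueryRequest()
        .withTableName("zipkin-spans")
        .withIndexName("local_service_name")
        .withScanIndexForward(false) // newest spans first
        .withProjectionExpression("trace_id, trace_id_64, span_timestamp")
        .withKeyConditionExpression("local_service_name = :local_service_name"
            + " AND span_timestamp_id BETWEEN :timestamp_id_lower_bound AND :timestamp_id_upper_bound")
        .withFilterExpression("#tag_key0 = :tag_value0")
        .withExpressionAttributeNames(names)
        .withExpressionAttributeValues(values));

    System.out.println(result.getItems());
  }
}
```
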

Scan for spans between endTs and endTs-lookback

```json
{
  "TableName": "zipkin-spans",
  "ProjectionExpression": "trace_id, trace_id_64, span_timestamp",
  "FilterExpression": "span_timestamp_id BETWEEN :timestamp_id_lower_bound AND :timestamp_id_upper_bound",
  "ExpressionAttributeValues": {
    ":timestamp_id_lower_bound": {
      "N": "28653312812297787557491507200000"
    },
    ":timestamp_id_upper_bound": {
      "N": "28656500409692171312084461551615"
    }
  }
}
```
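
A Scan touches every item in the table, which is why point 1 above stresses keeping the projected response small. DynamoDB also caps each Scan page at 1 MB of data read, so a complete client has to follow LastEvaluatedKey; here is a sketch of that loop (this is the pagination that was later backed out of the PR to keep it reviewable):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;

public class ScanAllPagesSketch {
  public static void main(String[] args) {
    AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();

    Map<String, AttributeValue> values = new HashMap<>();
    values.put(":timestamp_id_lower_bound", new AttributeValue().withN("28653312812297787557491507200000"));
    values.put(":timestamp_id_upper_bound", new AttributeValue().withN("28656500409692171312084461551615"));

    ScanRequest request = new ScanRequest()
        .withTableName("zipkin-spans")
        .withProjectionExpression("trace_id, trace_id_64, span_timestamp")
        .withFilterExpression("span_timestamp_id BETWEEN :timestamp_id_lower_bound AND :timestamp_id_upper_bound")
        .withExpressionAttributeValues(values);

    List<Map<String, AttributeValue>> items = new ArrayList<>();
    ScanResult result;
    do {
      result = client.scan(request);
      items.addAll(result.getItems());
      // Each page returns at most 1 MB; continue from the last evaluated key.
      request.setExclusiveStartKey(result.getLastEvaluatedKey());
    } while (result.getLastEvaluatedKey() != null);

    System.out.println(items.size() + " matching spans");
  }
}
```
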
codefromthecrypt commented 5 years ago

I'm doing some refactoring and will push a commit, mostly around async call stuff.

codefromthecrypt commented 5 years ago

I have everything using zipkin2.Call natively now, but I broke some tests and haven't done cleanup, so I didn't push a commit to your branch yet. I pushed a temporary branch just for peeks, as it has been a few days:

https://github.com/devinsba/zipkin-aws/compare/dynamodb-storage...openzipkin:devinsba-dynamodb-storage

Also, I reverted pagination, as I think it is easier to grok if done separately.

codefromthecrypt commented 5 years ago

Force-pushed this to rebase over the build that actually runs integration tests.

codefromthecrypt commented 5 years ago

And the green build is finally legit!

codefromthecrypt commented 5 years ago

OK, finally done with the async call porting. Don't blame @devinsba for problems with it :)

Note: I temporarily undid the pagination, as that keeps the change from going over 10k lines.

codefromthecrypt commented 5 years ago

Rebased on master. I'm not currently doing any other work here, but ping me if I should be!

codefromthecrypt commented 5 years ago

I'm not able to carry this forward, and moreover I think we shouldn't do this unless there is more popularity behind it. Currently X-Ray is busted and no one has the horsepower to help; we don't want to repeat that. Also, @llinder did some napkin math, and it seems like DynamoDB could be a very expensive option due to the cost of the compute side of it.

One option for carrying this forward is to make a repo in zipkin-contrib, so the effort so far can be moved to a place with less drift. Does that make sense?

devinsba commented 5 years ago

Understood, and thanks for taking it this far. I'll try to find time to finish it, even if just for the exercise. I would like to have real numbers to understand the cost here; I estimated it might be less than X-Ray, but I'd like to see for sure.

codefromthecrypt commented 4 years ago

Just checking in, as it has been almost 6 months since the last note and circumstances may have changed. Should we cut this tree into a repo in zipkin-attic? That way, the work so far is archived so that others can look at it, without keeping this open as a zombie PR.

codefromthecrypt commented 4 years ago

A simpler option is to close this out, since the source branch is listed as "unknown repository" anyway. Archiving can happen separately. I'll close for now.