treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

Log source IP in operations logs #4261

Closed arielshaqed closed 1 year ago

arielshaqed commented 2 years ago

We currently do not log the source IP of an operation. It would really have helped me understand Spark behaviour if we did :-) This should support load balancers placed in front of the server, by respecting the X-Forwarded-For header or something similar.

For reference, here's a log from a DeleteObject operation that I just performed:

{
  "file": "usr/local/go/src/net/http/server.go:2047",
  "func": "net/http.HandlerFunc.ServeHTTP",
  "host": "economy.us-east-1.lakefscloud.ninja",
  "level": "info",
  "log_audit": "API",
  "method": "DELETE",
  "msg": "HTTP call ended",
  "operation_id": "DeleteObject",
  "path": "/api/v1/repositories/test-1/branches/test-overwrite2/objects?path=amazon-reviews-repartition-sample%2F_temporary%2F0%2Ftask_202209280827329171407584636600739_0002_m_000055%2Fpart-00055-bfb9a212-2d0e-4aa5-b485-e0f4fd2f07b4-c000.snappy.parquet",
  "request_id": "86ce10cf-d4f0-48fc-aaf8-0c64fb9343d6",
  "sent_bytes": 0,
  "service_name": "rest_api",
  "status_code": 204,
  "time": "2022-09-28T08:31:09Z",
  "took": 13416254
}

The same for S3 gateway operations too, please (these already include the name of the invoking user, at least):

{
  "file": "usr/local/go/src/net/http/server.go:2047",
  "func": "net/http.HandlerFunc.ServeHTTP",
  "host": "economy.us-east-1.lakefscloud.ninja",
  "level": "info",
  "log_audit": "API",
  "matched_host": false,
  "method": "HEAD",
  "msg": "HTTP call ended",
  "path": "amazon-reviews-repartition/_temporary/0/task_202209250821368514549904317969258_0004_m_002914",
  "ref": "test-overwrite2",
  "repository": "test-1",
  "request_id": "577f6a5c-0ce0-4142-9aef-6fb437603bba",
  "sent_bytes": 423,
  "service_name": "s3_gateway",
  "status_code": 404,
  "time": "2022-09-25T10:09:29Z",
  "took": 53151590,
  "user": "xxxxx"
}

a-Cash-dixit commented 2 years ago

Please assign this issue to me and guide me on what to do, Sir.

arielshaqed commented 2 years ago

Hi @a-Cash-dixit !

Thanks for taking this!

These 2 log messages are emitted for every API operation (the first one) and for every S3 gateway operation (the second one). You can make lakeFS emit the first one by running any lakectl operation, and the second by running any aws s3 operation with the lakeFS S3 gateway as its endpoint (see the AWS CLI integration guide for how to configure this!). And each log message just points you to the line in the code that emits it.

Now the goal is to add to these 2 log messages the IP of the user. The Golang http.Server page should have at least one option for getting the source IP of a connection -- please find a good one (the more standard and simple the better!) and add that to the log.

Please do let me know how you get along -- we're here to help.

a-Cash-dixit commented 2 years ago

Can you add the Hacktoberfest tag to your repo, Sir?

arielshaqed commented 2 years ago

Can you add the Hacktoberfest tag to your repo, Sir?

Hi @a-Cash-dixit ,

Unfortunately this year we shall not participate in Hacktoberfest. I know that this will disappoint you as much as (or even more than) it does me. So I want to explain why we made this difficult decision.

Previously we did participate in Hacktoberfest. This year we have elected not to do so. There are two types of contributors attracted by Hacktoberfest: genuine ones like yourself, and (unfortunately!) bad-faith people who send spam PRs. It's easy to spot the spam PRs of course, and I can already see that you are not a spammer. Our worry is that as soon as I put the "hacktoberfest" tag on the repo, we will get just too many of them!

While PRs like those you intend to send are great, in practice there are far more PRs sent by the second group of people. And as a small open-source project, the price paid to deal with all of those PRs is tremendous. Using this search you can find just a few of the PRs sent by people. They include many changes such as adding or removing a single comma from the docs. We were reviewing and closing more than 10 spam PRs per day during that Hacktoberfest.

So we regretfully decided not to participate in Hacktoberfest this year.

johnmantios commented 1 year ago

Hi @arielshaqed! Can you assign this to me? I think I can give it a shot based on your comment from October 2 (it would also be my first contribution to lakeFS :) )