implement and use log with ELK

tracking-exposed / facebook

facebook.tracking.exposed - collaborative tool for algorithm investigation

https://facebook.tracking.exposed

GNU Affero General Public License v3.0

implement and use log with ELK #98

Open vecna opened 5 years ago

vecna commented 5 years ago

This is a multipurpose issue, in collaboration with @joxer:

test sending messages to the ELK system at log.tracking.exposed
test the visualization of the ELK system
integrate logging into the different components of the pipeline

Edit: by the end of this issue, we should have these logs working:

[x] adopters handshake
[x] events received by web-extension
[ ] page navigation info
[x] parser errors
[x] parser statistics on metadata extraction
[ ] mongodb auditlog
[x] anomalies (they require investigation before they expire)

joxer commented 5 years ago

@vecna which log messages have priority? I would like to migrate them. Please make a list so I can understand which ones have top priority.

vecna commented 5 years ago

to begin with:

20115 ۞  ~/Dev/facebook elastic='http://log.tracking.exposed:9200' DEBUG=* node test/sendLogMsg.js 
  tests:sendLogMsg The configuration of elastic is "http://log.tracking.exposed:9200" +0ms
  tests:sendLogMsg Sending message with `id` 1546430226 and `field` random-value-8611 +

I would expect to see an entry, so I can start playing with this log and its visualization and understand how and what can be customized in the log message.
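For illustration, here is a minimal sketch of what such a test message could look like before it is sent. It only builds the request and does not contact the server. The helper name `buildTestRequest`, the index name `fbtrex_test`, and the way the random `field` value is generated are assumptions, not the actual content of test/sendLogMsg.js; the `_doc` path segment follows the Elasticsearch 6.x document API.

```javascript
// Hypothetical helper: builds (but does not send) an indexing request
// for a test log message, mirroring the `id` and `field` seen in the output above.
function buildTestRequest(elasticUrl, index, nowMs) {
  // Seconds since the epoch, like the id 1546430226 in the debug output.
  const id = Math.round(nowMs / 1000);
  // Assumption: the field is "random-value-" plus a random integer.
  const doc = { id, field: 'random-value-' + Math.floor(Math.random() * 10000) };
  return {
    method: 'PUT',
    url: `${elasticUrl}/${index}/_doc/${id}`,
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(doc),
  };
}

// 'fbtrex_test' is a placeholder index name.
const req = buildTestRequest('http://log.tracking.exposed:9200', 'fbtrex_test', Date.now());
console.log(req.method, req.url);
```

The returned object can be handed to any HTTP client; keeping the construction separate from the sending makes it easy to inspect what would be indexed.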

joxer commented 5 years ago

Ok. Elasticsearch can be configured to accept log values with either a fixed or a dynamic structure. I will write some short documentation in this issue.

joxer commented 5 years ago

If you go to support/elasticsearch/setup.sh you can see the creation of fields in Elasticsearch. At the moment I've created two different indexes:

fbtrex_mongo
fbtrex_users

Each of these two has its own fixed fields, which are then used in Kibana to structure the data. Next to each field's name is the field's type. If dynamic is set, you can assume the field is JSON and will be accepted as it arrives.
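As a rough sketch of the idea, an index body with fixed field types plus one dynamic JSON field might be built like this. The field names (`version`, `details`) and the `strict` setting are illustrative assumptions and are not taken from setup.sh; the `_doc` level is the mapping type name that Elasticsearch 6.x expects.

```javascript
// Hypothetical index definition: the field names are illustrative,
// not the ones actually created by support/elasticsearch/setup.sh.
function buildIndexBody(properties) {
  return {
    mappings: {
      _doc: {               // Elasticsearch 6.x expects a mapping type name
        dynamic: 'strict',  // assumption: reject fields not declared below
        properties,
      },
    },
  };
}

const usersIndex = buildIndexBody({
  id: { type: 'keyword' },
  when: { type: 'date' },
  version: { type: 'keyword' },
  // Free-form JSON: sub-fields are accepted as they arrive.
  details: { type: 'object', dynamic: true },
});

console.log(JSON.stringify(usersIndex, null, 2));
```

The resulting object would be the body of the index creation request; the fixed fields are the ones Kibana can rely on when building visualizations.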

https://github.com/tracking-exposed/facebook/blame/0643bf044f55db324b3417ff70a7b32912ed8d17/lib/events.js#L220

As you can see here, I pass the value to the fbtrex_users index, already formatted for that index. So what I would suggest is to first work out which meaningful data we want to collect, then create the index for it and manage its collection.

vecna commented 5 years ago

I've committed the file summary.json, which has a non-working format; the issue concerns a list and a collection of data. The image below shows what a data snippet looks like:

vecna commented 5 years ago

I marked adopters handshake because of this new entry: https://github.com/tracking-exposed/facebook/commit/90e3bb9501f4e9422b6b03e730052146c8831365#diff-244b34bd6229d2d67de9d5ad41669d6a. @joxer, this basic log is intended to show the versions of the extensions and how frequently they are updated.

vecna commented 5 years ago

@joxer a question: in the first use of ELK, you made the id unique by using a timestamp, but some other fields are 'dates'.

In the stats from the parser results, I want to visualize the aggregated results hour by hour. Is this aggregation possible in both cases? Should we use a UNIX-like timestamp or an ISO timestamp?

I implemented both the options in the commit "parser log experiments #98"
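To make the two options concrete, here is a small sketch that produces both representations of the same instant, plus a helper that truncates an instant to its UTC hour. The function names are hypothetical and are not taken from the commit. As far as I know, an Elasticsearch `date` field with the default format accepts ISO 8601 strings and epoch milliseconds, while epoch seconds need an explicit `epoch_second` format in the mapping.

```javascript
// Both representations of the same instant.
function timestamps(date) {
  return {
    unixSeconds: Math.floor(date.getTime() / 1000),
    unixMillis: date.getTime(),
    iso: date.toISOString(),
  };
}

// Truncate an instant to the start of its UTC hour,
// which is the grouping an hour-by-hour aggregation relies on.
function hourBucket(date) {
  const d = new Date(date.getTime());
  d.setUTCMinutes(0, 0, 0); // zero the minutes, seconds and milliseconds
  return d.toISOString();
}

const when = new Date('2019-01-19T11:05:30Z');
console.log(timestamps(when));
console.log(hourBucket(when));
```

Either form carries the same information, so the hourly grouping itself can be left to a date histogram in Kibana as long as the field is mapped as a date.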

joxer commented 5 years ago

@vecna the id is the unique ID of the Elasticsearch entry. I think I was wrong to use:

Math.round((new Date()).getTime() / 1000)

We should use milliseconds; alternatively, we can use a random UUID. The code you are referring to at:

https://github.com/tracking-exposed/facebook/commit/de5f4c54520c860402b5664e85f65cef007776b5#diff-fbbf73cfc5b6337122c2a5bb8c0d1b4fR71

is correct, and we can do the aggregation later in Kibana.

vecna commented 5 years ago

Ok, thanks. Then how about moving the id composition into the library? Also, I'll change it to a hash of milliseconds + index name, to guarantee uniqueness.
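A minimal sketch of that id composition, using Node's built-in `crypto` module. The helper name `composeId`, the choice of SHA-1, and the separator are assumptions; the library's actual implementation may differ.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a document id from the current
// milliseconds and the index name.
function composeId(indexName, nowMs) {
  return crypto
    .createHash('sha1')
    .update(`${nowMs}-${indexName}`)
    .digest('hex');
}

const id = composeId('fbtrex_users', Date.now());
console.log(id); // 40 hexadecimal characters
```

Note that two messages sent to the same index within the same millisecond would still produce the same id with this scheme; mixing in a random value or a counter would avoid that.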

joxer commented 5 years ago

yes that's ok @vecna

vecna commented 5 years ago

@joxer I changed all the .json files:

they now include the id and when fields
the field names are now uniform with those in the DB

When you are ready to test it, so am I!

vecna commented 5 years ago

I added the semantic analysis and got this warning message:

  support:elasticsearch:initialize index posted: {
  "statusCode": 200,
  "body": "{\"acknowledged\":true,\"shards_acknowledged\":true,\"index\":\"semantics\"}",
  "headers": {
    "warning": "299 Elasticsearch-6.4.2-04711c2 \"the default number of shards will change from [5] to [1] in 7.0.0; if you wish to continue using the default of [5] shards, you must manage this on the create index request or with an index template\" \"Sat, 19 Jan 2019 11:05:30 GMT\"",
    "content-type": "application/json; charset=UTF-8",
    "content-length": "68"
  },

joxer commented 5 years ago

@vecna that's ok. It means there's only one server and we should add others. As long as we are just experimenting, we can keep one Elasticsearch server.
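The warning text itself suggests managing the shard count on the create index request. A minimal sketch of such a request body follows; the helper name and the chosen values (one shard, no replicas) are assumptions for a single-server test setup, not settings taken from the repository.

```javascript
// Hypothetical helper: index creation body with the shard count stated
// explicitly, so the default value no longer matters.
function buildIndexSettings(shards, replicas) {
  return {
    settings: {
      index: {
        number_of_shards: shards,
        number_of_replicas: replicas,
      },
    },
  };
}

// Assumption: one shard and no replicas, suited to a single test server.
const semanticsIndex = buildIndexSettings(1, 0);
console.log(JSON.stringify(semanticsIndex));
```

These settings can be combined with the mappings in the same index creation body.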