status-im / status-go

The Status module that consumes go-ethereum
https://status.im
Mozilla Public License 2.0
728 stars 246 forks source link

Make it possible to analyse which messages were lost or resent in the test environment #325

Closed tiabc closed 6 years ago

tiabc commented 7 years ago

Problem

We're partly a messenger and there shouldn't be situations where messages are lost or delivered for a long time.

@annadanchenko is doing internal sessions "test Status as user" and sometimes we face weird problems with delivery and there's no way to find an internal problem during these sessions or after them. What makes things worse is that when anybody comes across a problem at any time, they have no way to give us background for investigation.

  1. This gives us no possibility to find a problem.
  2. Keeps problems unsolved for a long period of time.
  3. Making things worse, we don't even know on which side the problem is: status-react or status-go and can't reproduce anything outside these test sessions or working separately.

The goal is to at least start from writing logs and analysing them after test sessions to find missing messages and resends.

Implementation

  1. Log all incoming and outgoing whisper messages along with their unencrypted payload on status-go side.
  2. Create a small log analyser which consumes multiple log files and draws tables for every device with incoming and outgoing messages (only for accounts presented in the communication session).

Acceptance Criteria

  1. All incoming and outgoing whisper messages are logged on every device participating in the communication session.
  2. These logs are readable by human and easily parseable by machine.
  3. There's a tool which consumes log outputs from different devices and draws representative tables out of which we can figure out which messages were sent but not received and how many were resend. Payload of each message (mostly, status-react's) should also be available for viewing.

Notes

As whisper is part of go-ethereum, they may already have a similar instrument.

There's a connected issue about shared logs: https://github.com/status-im/status-go/issues/171 and a PR about remote logging: https://github.com/status-im/status-go/pull/173 Maybe, remote logging is the way we should follow.

This issue will get even more important when we switch to the double ratchet protocol (issue is not created yet).

Future Steps

  1. Create an issue on status-react side to provide easy interface to grab these logs and log their messages the same way.
  2. Address a problem
tiabc commented 7 years ago

Increasing priority as we can't properly debug and improve messaging without rigorous logging.

influx6 commented 7 years ago

I think we need to take a step back and reason about one very important bit of detail: Log Format.

We had an interesting discussion on log levels in https://github.com/status-im/status-go/issues/295, which helps us reduce alot of confusion when dealing with tomany levels without adequate concerned as to their real value.

I would like to propose another change for logging to use structure logging, where logs are not just sentences with string values chipped in but real json payloads which can be represented out as strings on stdout, but provide adequate readiblity and structure for higher services like instabug. (see related lib https://medium.com/@tjholowaychuk/apex-log-e8d9627f4a9a)

I personally find structured logging more suitable for our needs. We practically have alot to track and alot information can not really be expressed through a log message nor do i find it readable to dump a stack along a string message, using structures we can adequately include necessary pieces of data and if possible, create structures connected by a signature that can pass through length function calls down the stack, recording needed bits of data.

I personally have a small structure logger most suited to my needs but i believe apex provides a larger area when it comes to endpoints for where logs can be stored (from files to services).

tiabc commented 7 years ago

Please, join the discussion in #295. However, you can prototype or design a solution for this issue without discussing log format beforehand. Just propose a solution and mention what log format would be required to achieve it.

influx6 commented 7 years ago

My discussion was not just about log format but as part of overall refactoring of the current logging system. Will provide a prototype and we can review that.

influx6 commented 7 years ago

@tiabc @adambabik Would appreciate pointers to areas where message delivery with whisper is tracked and logs. I need to come up with specific standards about what flags to use to represent the different state of message transition which we can then use to track message re-delivery and losses.

I currently have a baseline in log-refactoring_325, which contains the new log structure, am currently going through all source and changing log lines, once am done with this i want to do the following:

adambabik commented 7 years ago

This PR is huge. Have you discussed it with @tiabc? Acceptance criteria in this issue look much simpler.

As far as I know, we don't have any insight into incoming Whisper messages in status-go. Have you tried to research where it happens in go-ethereum?

I think the first step would be to make sure that we actually log outgoing and incoming messages to stdout using the current logging mechanism. When we have this, we can think of a log processing tool.

influx6 commented 7 years ago

@adambabik The reason why i believe the initial steps is a redifnining of how we log is the following:

  1. We currently log just text lines with a few variable values tugged in
  2. We have no means of later looking into this output and miss other pieces of data that might be used e.g the fields of incoming Message struct and their relevant content or the data sent along the ssh_post to figure out why differs from the expected look (many times have had to log or fmt in go-ethereum to figure out that it was a quoted value or misrepresented number that was the bug.

We need to change things abit, for example, if we can do structured logging, we can improve what data actually runs along with our text lines, more so, we have the added benefit of writing tools who can take this data to provide far more indept view of what is going on and the objects being passed around, which would be helpful on areas as mentioned above.

Beyond this, we will have a clear insight as to what values are entering functions and what results are being sent out, modifying go-ethereum to do this wont be hard, and will allow us utilize more tools to use our logs far more evidently to view the current code and how it works, of course, we need to log meaningful things for this, but that is naturally a given.

As to having talked to @tiabc, this is why i mentioned these details here for dicussion, so everyone can read my thoughts and weigh in, currently i have a small piece that outputs most logs into json lines into files, but we can take this steps further, by writing tools that deliver such log lines in realtime to either a server we build ourselves to generate dashboard details or more.

The difference here is we have the larger ability to create meaningful insights with a structured logging system that the current mechanism, more so, it really won't change much in the code style.

adambabik commented 7 years ago

I think some examples of such structured logging could visualize better the idea. All this sounds good but it's a pretty huge effort from what I can see. At the same time, it still won't resolve (1) from the AC and I guess it is the highest priority.

My plan would be to use the current logging system and make sure we solve (1). When we start seeing Whisper logs, we can think about how to analyze them.

influx6 commented 7 years ago

@adamb Understood!

influx6 commented 6 years ago

@adamb Am away on my break this week, but as to what usage of the new log will look like, here is a sample from my own package:

https://github.com/influx6/moz/blob/master/ast/ast.go#L266-L270

log.Emit(metrics.Info("Annotation in Decleration comment").
    With("dir", dir).
    With("annotation", item.Name).
    With("position", rdeclr.Pos()).
        With("token", rdeclr.Tok.String()))

Output is easily customizable, here is the format i currently print them in:

Message: Annotation in Decleration
- Type: "Struct"
- Annotations: 0
- StructName.namepos: 1069
- StructName.name: "MofInitableImpl"
- StructName.obj.kind: 3
- StructName.obj.name: "MofInitableImpl"

See more https://github.com/influx6/faux/tree/master/metrics

adambabik commented 6 years ago

The usage looks nice. Is there a way to print it in one line? Multiline logs are pretty hard to be parsed by external tools.

influx6 commented 6 years ago

@adamb Yes, you can customize the print out as you see fit, more so, external tools should parse the files which will be pure json which is the big advantage we have with this.

see https://github.com/influx6/faux/blob/master/metrics/jsonfile/jsonfile.go

As to how customizable it is, see https://github.com/influx6/faux/blob/master/metrics/custom/custom.go (I have two customizable display functions for my personal liking).

influx6 commented 6 years ago

@tiabc @adamb For the sake of clearification and to ensure am not running the wrong way.

I understood the messages we wished to track delivery and queue for where Whisper related, hence i focused on making changes to the ff:

But while looking more into the goethereum code, am having a disconcerting feeling that the message state @tiabc referred to would probaby be in les, like:

So before i make more changes to i need to validate my direction.

Your thoughts would be heavily appreciated.

adambabik commented 6 years ago

I was looking into whisper yesterday and I found out that delivered messages are handled here: https://github.com/status-im/go-ethereum/blob/status/1.7.0-unstable/whisper/whisperv5/whisper.go#L582

And this method adds a message to the queue: https://github.com/status-im/go-ethereum/blob/status/1.7.0-unstable/whisper/whisperv5/whisper.go#L649 but it would be great to figure out where the actual processing happens.

influx6 commented 6 years ago

@adambabik I think i can help a little with that.

Filters:

The filters set on whisper receive non-p2p messages (unless it's allowed to process these) for those who wish to listen for them through their topics, See:

Of course the main bit is in https://github.com/status-im/status-go/blob/963fda581e984826b6278b1f2dc570243ebf2f28/vendor/github.com/ethereum/go-ethereum/whisper/whisperv5/filter.go#L91-L125.

But the main issue is, whisper supports two message thats: Envelopes for filters and Envelopes for p2p, although the filter can receive notification for p2p envelopes, but they are managed differently.

Like here https://github.com/status-im/status-go/blob/963fda581e984826b6278b1f2dc570243ebf2f28/vendor/github.com/ethereum/go-ethereum/whisper/whisperv5/whisper.go#L238-L250.

I think, the p2p might be abit more hard to wrap around to track, but shouldn't be too much of a problem.

More so, p2p messages are delivered through the MailService, see:

influx6 commented 6 years ago

@adambabik @tiabc Ok. Having runned abit deeper, filtered messages once received are processed by perodically checking the Filter.ReceivedMessages to be delivered to subscribers,

see:

Each creates a Filter then wraps a goroutine, that periodically checks for messages, this way Envelopes are received and delivered to the subscribers of those filters. Although each case show diffferent means, the logic is rather the same.

influx6 commented 6 years ago

@tiabc @adambabik Currently in the last commit 8b2892c, i have this markers in areas where whisper messages are processed, such as:

The only area yet for me to fully grasp on how it plays in with p2p message is here https://github.com/status-im/status-go/blob/feature/log-refactoring_325/vendor/github.com/ethereum/go-ethereum/whisper/mailserver/mailserver.go#L117-L155.

Your thoughts are seriously appreciated.

adambabik commented 6 years ago

@influx6 not sure about the first one. As you can see, a line below it returns an error, because the message was created "in the future". Unless you mean to add a log inside sendDelivery() method.

The second one looks good. It's the place where a message was actually matched and passed to a watcher.

Btw. it looks like with --log TRACE we should be able to see Whisper logs already.

I think sendDelivery() is a place where we can log all outgoing messages. Regarding incoming ones, the second link looks ok, unless we also want to log not matched messages.

Can you try to add logs in these places, run two nodes, exchange some messages and post result here?

influx6 commented 6 years ago

@adambabik As discussed in slack, for now we will skip monitoring of incoming messages.

With commit 5bcab591ed2a50f104882f39298c3d6e8a417c10, have made some changes as we discussed and target api.go in goetherem/whisperv5.

Although things do look simply and i wish to focus on the parsing tool next, i do think we might have to figure out a way to maybe collecting the ReceivedMessage with or instead of the Envelope, due to the node information attached. Though this is not critical but it will be revisited at a later time.

tiabc commented 6 years ago

Although it's important but doesn't significantly block anyone, so decreasing priority.