traviscross / mtr

Official repository for mtr, a network diagnostic tool
http://www.bitwizard.nl/mtr/
GNU General Public License v2.0

mtr: Unexpected mtr-packet error (No buffer space available) #418

Open bramp opened 2 years ago

bramp commented 2 years ago

mtr has previously worked great for me, but in the last week it will fail within minutes with the error mtr: Unexpected mtr-packet error

Example:

$ sudo mtr 8.8.8.8
... happily displays results
mtr: Unexpected mtr-packet error
$

It is not clear to me what triggers the failure. I ran a packet capture and saw no unusual ICMP or other packets that would cause this. I can pretty reliably reproduce this within a minute of running the command.

$ mtr --version
mtr 0.94

$ uname -a
Darwin bramp-macbookpro 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:54 PST 2021; root:xnu-8019.61.5~1/RELEASE_X86_64 x86_64

$ sw_vers
ProductName:    macOS
ProductVersion: 12.1
BuildVersion:   21C52

I'm happy to try and debug this.

rewolff commented 2 years ago

I searched for the string: mtr-packet got an unexpected error and passed it on to the main program.

If you comment out line 652 in cmdpipe.c, then it shouldn't stop. On the other hand, the code after line 688 will assume the program has quit when there has been an error. Ah, a few lines down it will be ignored.

This should tell us if the error is a really fatal one. Does it occasionally go wrong or has something gone bad and it will never recover?

On line 652, if you add an fprintf(stderr, ...) of reply->argument_value[0], that might give a hint as to what's going on.
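A minimal sketch of the kind of message such a debug line could produce. It assumes the value handed back by mtr-packet is a numeric errno; format_packet_error is a hypothetical helper written for illustration, not a function in mtr, and the actual code around line 652 of cmdpipe.c is not reproduced here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mtr): render an error code together
 * with its strerror() text, so the message says more than "unexpected".
 * Returns the number of characters snprintf wanted to write.
 */
int format_packet_error(char *buf, size_t len, int err)
{
    return snprintf(buf, len,
                    "mtr: Unexpected mtr-packet error: errno = %d (%s)",
                    err, strerror(err));
}
```

The caller would pass the result to fprintf(stderr, "%s\n", buf). The text after the number depends on the platform, since strerror() maps the same number to different messages on different systems.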

bramp commented 2 years ago

Sorry for the slow response. I compiled the code with additional debugging statements. However, since then I've been unable to reproduce the mtr: Unexpected mtr-packet error issue. I'd be happy to send a PR to permanently log more useful information in this situation.

rewolff commented 2 years ago

No worries.

Outputting the error code would at least give us a hint when things go wrong.

One of my frustrations is that Windows (at least at one point in time) said "could not display page" when something went wrong with displaying a web page. That could be "out of memory while rendering the page" or "your interface is down" or "no route to host" or "connection refused at the destination". Each, if that had been the error message, would elicit a different response in trying to fix the problem.

Linux is usually a lot better at giving a hint as to what's wrong. mtr should try not to deviate from that pattern. :-)

More useful information would be better, yes a PR would be appreciated.

bramp commented 2 years ago

Ok, finally, after days of trying to repeat this, I got one error: "errno = 55" (No buffer space available). I don't know if that's my original problem or a new one, but it's at least an unhandled error. A quick Google search shows this seems to be some weird condition with non-blocking UDP requests on Macs.

rewolff commented 2 years ago

So, hypothesis: when you request "nonblocking" IO, the kernel takes that very literally and WILL NOT block, say, to allocate some memory. (The "please don't block" is passed down to the memory allocation function, and that one says: "sorry, but I can't help you today under that restriction".)
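If that hypothesis holds, the failed send is a dropped probe rather than a broken socket, and the two cases can be told apart from the errno. A minimal sketch, assuming the send path has the return value of a call such as sendto() and the errno that followed it; the enum and function names are hypothetical and not taken from mtr. It compares against the ENOBUFS constant rather than the literal 55, because 55 is the macOS/BSD value and Linux uses 105.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome categories, for illustration only. */
enum send_outcome {
    SEND_OK,      /* the probe was handed to the kernel */
    SEND_DROPPED, /* no buffer space right now; the probe was not sent */
    SEND_FAILED   /* any other error; treat as a real failure */
};

/*
 * Classify the result of one non-blocking send.
 * rc  is the return value of the send call,
 * err is the errno value captured right after it.
 */
enum send_outcome classify_send_result(ssize_t rc, int err)
{
    if (rc >= 0)
        return SEND_OK;
    if (err == ENOBUFS)
        return SEND_DROPPED;
    return SEND_FAILED;
}
```

Keeping SEND_DROPPED separate from SEND_FAILED lets the caller decide what to do with a probe that never left the machine, instead of exiting on it.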

So a fix would be to ignore that error unless it happens too often. Say: if (errno == 55) { errstatus = (errstatus * 9) / 10; errstatus += 100; if (errstatus > 500) ... pass the error on, causing mtr to exit }. The trouble is to do the accounting correctly. Maybe report to mtr: "Sorry, packet didn't get sent", so that the mtr accounting part doesn't count this as "failure on the link to that host".
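One way the decaying counter could be sketched, using the 9/10 decay, the +100 penalty and the 500 limit from the comment above. The struct and function names are hypothetical, and one detail is an assumption that goes beyond the comment: the decay is applied on every send attempt, not only on failing ones, so that occasional errors fade out instead of accumulating until mtr exits.

```c
#include <errno.h>

#define ENOBUFS_PENALTY 100 /* added for each ENOBUFS */
#define ENOBUFS_LIMIT   500 /* above this the error is passed on */

/* Hypothetical tracker state; initialise score to 0. */
struct enobufs_tracker {
    int score;
};

/*
 * Call once per send attempt.
 * send_failed is non-zero if the send call returned an error,
 * err is the errno captured after it.
 * Returns 1 if the error should be passed on (mtr would exit),
 * 0 if the attempt succeeded or the error can be ignored for now.
 */
int enobufs_should_abort(struct enobufs_tracker *t, int send_failed, int err)
{
    /* Assumption: decay on every attempt so old errors are forgotten. */
    t->score = (t->score * 9) / 10;

    if (!send_failed)
        return 0;
    if (err != ENOBUFS)
        return 1; /* errors other than ENOBUFS stay fatal */

    t->score += ENOBUFS_PENALTY;
    return t->score > ENOBUFS_LIMIT;
}
```

With these constants, a run of consecutive ENOBUFS errors is tolerated six times and passed on at the seventh, while an error on every tenth attempt never reaches the limit. Telling the main program that the probe was not sent, so it is not counted as loss on the link, would still need separate handling.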