heistp / irtt

Isochronous Round-Trip Tester
GNU General Public License v2.0
185 stars 23 forks source link

Suprious DropBadMagic errors? #10

Closed tohojo closed 6 years ago

tohojo commented 6 years ago

I've had the irtt server running for a while now, but had not gotten around to opening up the firewall to the public internet. Looking at the logs, I get a bunch of DropBadMagic errors:

Nov 24 19:57:12 irtt[18073]: [DropBadMagic] bad magic: 265083 != 14a75b
Nov 24 19:57:17 irtt[18073]: [DropBadMagic] bad magic: 265083 != 14a75b
Nov 25 11:56:22 irtt[18073]: [DropBadMagic] bad magic: 84e783 != 14a75b
Nov 25 11:56:27 irtt[18073]: [DropBadMagic] bad magic: 84e783 != 14a75b
Nov 25 13:34:13 irtt[18073]: [DropBadMagic] bad magic: 6c0481 != 14a75b
Nov 26 01:00:10 irtt[18073]: [DropBadMagic] bad magic: eb4d83 != 14a75b
Nov 26 01:00:15 irtt[18073]: [DropBadMagic] bad magic: eb4d83 != 14a75b
Nov 26 08:12:12 irtt[18073]: [DropBadMagic] bad magic: dd7781 != 14a75b
Nov 26 08:12:17 irtt[18073]: [DropBadMagic] bad magic: dd7781 != 14a75b
Nov 26 18:01:26 irtt[18073]: [DropBadMagic] bad magic: c85d83 != 14a75b
Nov 26 18:01:31 irtt[18073]: [DropBadMagic] bad magic: c85d83 != 14a75b
Nov 27 15:23:20 irtt[18073]: [DropBadMagic] bad magic: bffb81 != 14a75b
Nov 27 15:23:25 irtt[18073]: [DropBadMagic] bad magic: bffb81 != 14a75b
Nov 28 02:33:05 irtt[18073]: [DropBadMagic] bad magic: f3db83 != 14a75b
Nov 28 02:33:10 irtt[18073]: [DropBadMagic] bad magic: f3db83 != 14a75b
Nov 28 04:24:19 irtt[18073]: [DropBadMagic] bad magic: 338383 != 14a75b
Nov 28 04:24:24 irtt[18073]: [DropBadMagic] bad magic: 338383 != 14a75b
Nov 29 04:22:56 irtt[18073]: [DropBadMagic] bad magic: 599483 != 14a75b
Nov 29 14:23:29 irtt[18073]: [DropBadMagic] bad magic: aae681 != 14a75b
Nov 29 14:23:34 irtt[18073]: [DropBadMagic] bad magic: aae681 != 14a75b
Nov 29 17:19:01 irtt[18073]: [DropBadMagic] bad magic: 355381 != 14a75b
Nov 29 17:19:06 irtt[18073]: [DropBadMagic] bad magic: 355381 != 14a75b

The odd thing is that no tests have been run at any of these times; so is there something else going on related to a long-running server?

The VmSize of the long-running process is 1196928 kB while a newly started server shows 383376 kB - but I don't know if it is growing over time or if its just because there have been a couple of tests run against the long-running server.

tohojo commented 6 years ago

Running a lot of simultaneous tests on the newly started server, I can get its VirtSize to increase (but its resident size stays constant). I did not see any of the DropBadMagic errors while running those tests, though.

heistp commented 6 years ago

If it's open to the Internet, could the DropBadMagics just be from random port scanners? My UDP OpenVPN server receives knocks every so often, or it did until I hardened it by requiring an hmac. Anyway, I just added to the list to log the source address with all drops.

Regarding the virtsize, I don't think it's unusual that that can grow and get pretty large, although there must be some limit. Related.

I didn't see any leaks on the profiler yesterday, but I do know of a few places where more garbage is created than I want, a couple things I'm doing and also the new code I'm calling in x/net that sets the source address puts more on the heap than I wish it would.

I have a couple related things to look at here in the next couple of days so will get back with more detail on this one...

tohojo commented 6 years ago

Pete Heist notifications@github.com writes:

If it's open to the Internet, could the DropBadMagics just be from random port scanners? My UDP OpenVPN server receives knocks every so often, or it did until I hardened it by requiring an hmac. Anyway, I just added to the list to log the source address with all drops.

Sure. Except it wasn't open to the internet (as I hadn't gotten around to that part yet). May be something probing it from the internal network, I guess.

tohojo commented 6 years ago

Aha, and fixing the comments makes systemd kill it because it uses a forbidden system call... :/

heistp commented 6 years ago

On Nov 29, 2017, at 7:08 PM, Toke Høiland-Jørgensen notifications@github.com wrote:

Pete Heist notifications@github.com writes:

If it's open to the Internet, could the DropBadMagics just be from random port scanners? My UDP OpenVPN server receives knocks every so often, or it did until I hardened it by requiring an hmac. Anyway, I just added to the list to log the source address with all drops.

Sure. Except it wasn't open to the internet (as I hadn't gotten around to that part yet). May be something probing it from the internal network, I guess.

Ah, I misunderstood. I don’t see those in the logs in my test bed, but I don’t know the reason. I can add the source address to the logs and we can go from there.

heistp commented 6 years ago

On Nov 29, 2017, at 7:06 PM, Toke Høiland-Jørgensen notifications@github.com wrote: Aha, and fixing the comments makes systemd kill it because it uses a forbidden system call... :/

Oops, not sure what you mean, what comments? I did fix two keys in irtt.service and irtt@.service today which were “ProtecyHome” rather than “ProtectHome”, but not sure if you’re referring to that…

tohojo commented 6 years ago

Pete Heist notifications@github.com writes:

On Nov 29, 2017, at 7:06 PM, Toke Høiland-Jørgensen notifications@github.com wrote: Aha, and fixing the comments makes systemd kill it because it uses a forbidden system call... :/

Oops, not sure what you mean, what comments? I did fix two keys in irtt.service and irtt@.service today which were “ProtecyHome” rather than “ProtectHome”, but not sure if you’re referring to that…

Erm, that comment was supposed to go to the pull request I thought I just opened...

heistp commented 6 years ago

Remote address, when available, is now logged with all server events:

tron:~/src/github.com/peteheist/irtt:% ./irtt server -hmac test
IRTT server starting...
[ListenerStart] starting IPv6 listener on [::]:2112
[ListenerStart] starting IPv4 listener on 0.0.0.0:2112
[DropNoHMAC] [127.0.0.1:51813] no HMAC present
tohojo commented 6 years ago

Pete Heist notifications@github.com writes:

Remote address, when available, is now logged with all server events:

tron:~/src/github.com/peteheist/irtt:% ./irtt server -hmac test
IRTT server starting...
[ListenerStart] starting IPv6 listener on [::]:2112
[ListenerStart] starting IPv4 listener on 0.0.0.0:2112
[DropNoHMAC] [127.0.0.1:51813] no HMAC present

Great, thanks! I updated the binary on my server, will see if I get the log entries again...

heistp commented 6 years ago

Ok, sorry for the double commit, but when you get a chance update again regarding heap allocations. I've optimized a few things so there are zero allocations from me on the server's normal reply path (errors not included).

Also, I was getting hit by the new set source IP functionality. In the two attached profiles, after 75000 packets, setting the source IP results in 41 megs of heap allocations and not setting the source IP results in 8 megs (in both cases almost all of it as side effects of calling send and receive). Therefore, that functionality is now disabled by default and can be enabled with the -setsrcip flag on the server, which when enabled, still only sets the source IP on listeners with unspecified IP addresses, because that's the only time it's necessary. I'd only use it if you're seeing return packet routing problems like I was.

There are no leaks in either case detected by the profiler when GC is enabled, so AFAIK the virtsize increase is not the result of a program leak. What causes it though, I'm not sure. :)

no_setsrcip setsrcip

heistp commented 6 years ago

Just looking at this again today, I ran a test all night last night repeating: irtt client -i 1ms -d 60s -q 10.9.0.2, and saw nothing like the numbers you saw. After a few minutes when memory seemed to have stabilized:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND 
 4616 irtt      20   0   11300   7144   2492 S  17.5  0.2   1:01.36 irtt   

After the whole night it was exactly the same:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND 
 4616 irtt      20   0   11300   7144   2492 S  17.2  0.2  99:18.55 irtt 

What I'm trying right now is a tougher test of:

while true; do
    irtt client -i 1us -d 1s -qq -n 10.9.0.2
    irtt client -i 1us -d 1s -qq 10.9.0.2
done

After stabilizing:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                  
 5462 nobody    20   0   11236   6788   2364 S  92.7  0.2   2:49.78 irtt  

After 20 minutes or so nothing has moved. I'll let it run for the day...

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                  
 5462 nobody    20   0   11236   6788   2364 S  91.7  0.2  16:37.68 irtt                     

So either I fixed something or there's a code path I'm not exercising that still has a problem. Any other update on it?

tohojo commented 6 years ago

So either I fixed something or there's a code path I'm not exercising that still has a problem. Any other update on it?

Well, I have seen neither the error messages nor the high memory use this time around. But I haven't run any tests either; guess it's time to open up the firewall and expose irtt to the world :)

-Toke

heistp commented 6 years ago

This Cadillac hasn’t stalled in a week, time to drive to Vegas! :)

I just set up a server on a raspi and will open it to the outside world during the next "maintenance window". I guess it shouldn’t be a security risk, but it’s still easier to DoS the server than it should be. I’m plodding along towards some changes to improve that.

Funny that right away before any load the virt size is higher on this raspi, although it doesn't budge at all with the same script I ran before and res is still low so I guess I'm not worried about it:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                 
 1225 nobody    20   0  793248   3616   2200 S  92.5  0.4   0:36.22 irtt                    

uname -a on my box where virt size stays around 11-12M:

Linux apu2a 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64 GNU/Linux

uname -a on my box where it stays around 793M:

Linux luke 4.9.35-v7+ #1014 SMP Fri Jun 30 14:47:43 BST 2017 armv7l GNU/Linux

How about uname -a on the box where you saw the high usage?

tohojo commented 6 years ago

Pete Heist notifications@github.com writes:

How about uname -a on the box where you saw the high usage?

Linux hostname 4.14.2-2-ck-ivybridge #1 SMP PREEMPT Sat Nov 25 09:33:12 EST 2017 x86_64 GNU/Linux

-Toke

heistp commented 6 years ago

Ok thanks, a newer kernel.

On my EdgeRouter-X with 3.10.14 though, virt size also sits at 645 megs:

Linux erx 3.10.14-UBNT #1 SMP Wed Aug 30 02:34:16 PDT 2017 mips GNU/Linux

but res is two megs. I'm going to close this as I don't think there's anything more I can do here. At the moment, I'm not concerned unless we'd see other problems or resident size go up.

As for the packets you saw, port 2112 is also reserved for something called Idonix Metanet, which who knows what that is, but maybe it's still in use somewhere and could explain your bad magics? Also the timestamps on those were curiously somewhat regular, with 5 second gaps between successive packets. I've been thinking about whether to ask for an IANA reservation or moved to an unreserved port range, but since other services out there seem to use old reserved ports that's what I've gone with so far...