zerotier / libzt

Encrypted P2P sockets over ZeroTier
https://zerotier.com
Other
184 stars 54 forks source link

malloc/free issues in latest release on linux #208

Closed janvanbouwel closed 1 year ago

janvanbouwel commented 1 year ago

I've been trying to integrate libzt in a project, but I have issues when trying to use the latest versions. Version 1.8.4 seems to work fine. Version 1.8.10 crashes the moment I try to do anything network related, like pinging the program or creating a socket. Very rarely I am able to ping it, in which case it works for at most a minute before still crashing. The message I get is one of the following two. (free() has also been mentioned, but not as often, and I don't remember the exact message.)

malloc(): invalid size (unsorted)
malloc(): corrupted top size

This happens in exactly the same way with both Rust and Java bindings. My system is Linux 5.15.0-58-generic #64-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux, running Ubuntu 22.04.1 LTS.

The breaking commit appears to be 0bf302a581f641b1ac5a1915e4cc59cde4ae8cdb, which updates the ZTO submodule to v1.8.10. Before that it works. Later commits and later versions don't fix the problem.

For now I can continue using v1.8.4, but I was considering contributing and that's difficult if I can't try the latest version.

tejas238 commented 1 year ago

I am having the same trouble on linux armv7l infrastructure with java code that was working previously on 1.8.4. For now, neither of 1.8.4 and 1.8.10 is working for me and the moment I send listener.connect from one of the clients the server crashes with one of the malloc() errors reported above. I am not sure if the submodules from 1.8.10 were used on 1.8.4 by mistake when I switched the branch. I will try a fresh build using 1.8.4 and report my results. Thanks for reporting this!

janvanbouwel commented 1 year ago

I did a git bisect on the ZTO submodule. (Found the free error again, and another one:

double free or corruption (!prev)
corrupted size vs. prev_size while consolidating

)

In some revisions the errors seemed to be less consistent (say 1 in 2 instead of 5 in 6 runs), but the first breaking commit appears to be https://github.com/zerotier/ZeroTierOne/commit/e9375b50b0b7228f77dc8ef3d99d1021f89675ff.

I don't have time right now to look at it more, but it's a start.

tejas238 commented 1 year ago

I confirmed today that using the v1.8.4 submodule fixed the issue for me which is weird because v1.8.10 was working for me in November which seems to be after the submodule was updated to v1.8.10 last May. Hoping there are no breaking fixes to the java binding since then which could surface later on for me...

janvanbouwel commented 1 year ago

Promising lead:

The aforementioned commit https://github.com/zerotier/ZeroTierOne/commit/e9375b50b0b7228f77dc8ef3d99d1021f89675ff sets

https://github.com/zerotier/ZeroTierOne/blob/e9375b50b0b7228f77dc8ef3d99d1021f89675ff/include/ZeroTierOne.h#L166

#define ZT_MAX_PEER_NETWORK_PATHS 64

while in libzt there is:

https://github.com/zerotier/libzt/blob/8d21a265cc23dd6e6e4d2c2ad068e978f110f8e3/include/ZeroTierSockets.h#L341

This seems to be a discrepancy (although they have a slightly different name). I'll test in about 2 hours whether setting the latter to 64 fixes the issue.

janvanbouwel commented 1 year ago

Good news, not tested super extensively but this seems to fix it, at least for Java!

For Rust I assume the same change has to be made, in:

https://github.com/zerotier/libzt/blob/8d21a265cc23dd6e6e4d2c2ad068e978f110f8e3/pkg/crate/libzt/src/include/ZeroTierSockets.h#L341

For Java, rather confusingly, this is also included in https://github.com/zerotier/libzt/blob/8d21a265cc23dd6e6e4d2c2ad068e978f110f8e3/src/bindings/java/ZeroTierNative.java#L410 but the only reference to it is https://github.com/zerotier/libzt/blob/8d21a265cc23dd6e6e4d2c2ad068e978f110f8e3/src/bindings/java/ZeroTierPeerDetails.java#L60 and this class doesn't seem to be used.


I have found three other inconsistencies in ZeroTierSockets.h

The first two: https://github.com/zerotier/libzt/blob/8d21a265cc23dd6e6e4d2c2ad068e978f110f8e3/include/ZeroTierSockets.h#L327-L336

In https://github.com/zerotier/ZeroTierOne/commit/440568a5165181c5ae602e4edca2a73202bfd453 these were changed to 128 and 32 respectively.

And https://github.com/zerotier/libzt/blob/8d21a265cc23dd6e6e4d2c2ad068e978f110f8e3/include/ZeroTierSockets.h#L347

which is set to 4096 in ZeroTierOne, although this one has been there since it was introduced, and may be intentional.

As far as I'm aware I've not run into issues with these, but they could form a problem.


I'm gonna test a bit more (and I'll try to get to the Rust version as well), and if it works I'll set up a PR.

It does seem strange to me that these constants are redefined and the ones in https://github.com/zerotier/ZeroTierOne are not used, but I'm not that skilled at C to know the reasoning behind it. Maybe a project maintainer could shine some light on this?

janvanbouwel commented 1 year ago

which is weird because v1.8.10 was working for me in November

As it is a memory issue, maybe your network topology changed?

joseph-henry commented 1 year ago

Yeah these constants should match. The fact that they currently don't is just that I must have forgotten to update them in this project when we bumped up the number of allowed paths in ZeroTierOne. I'll see about either updating them or just using the regular ZeroTierOne constants.

The reasoning was that we wanted to use a smaller subset of the definitions from ZeroTierOne.h since may of them shouldn't be exposed alongside libzt's API

joseph-henry commented 1 year ago

I'm going to close this issue since your PR addresses this. Feel free to open again if the issue persists.