ntop / n2n

Peer-to-peer VPN
GNU General Public License v3.0
6.07k stars 927 forks source link

[Windows] Supernode crashes when reloading communities #1092

Open shaxxx opened 1 year ago

shaxxx commented 1 year ago

When sending reload command to supernode managment port running on Windows supernode crashes. Command from the client

echo w 1:1:n2n reload_communities | nc64 -w1 -u 127.0.0.1 5645

results with output

{"_tag":"1","_type":"begin","cmd":"reload_communities"}

I've got same result with precompiled binares from @lucktu and compiling from current tree. My C foo is weak, but with bunch of tracing messages I was able to pinpoint the exact line causing the problem.

https://github.com/ntop/n2n/blob/39b9c6b1c0ac19d783cf7c48541b5bd924387c4b/src/sn_utils.c#L282

I've tried commenting out this line but next line also crashes supernode. If I comment out lines 282-285, to skip freeing struct members everything works. I wasn't able to reproduce it on linux.

Logan007 commented 1 year ago

Not sure what's going on there because I think all community keys are allocated and freed at the same time... but could you try to guard each single free with a corresponding if statement, like

        // remove header encryption keys
        if(NULL != comm->header_encryption_ctx_static) 
            free(comm->header_encryption_ctx_static);
        if(NULL != comm->header_iv_ctx_static) 
            free(comm->header_iv_ctx_static);
        if(NULL != comm->header_encryption_ctx_dynamic) 
            free(comm->header_encryption_ctx_dynamic);
        if(NULL != comm->header_iv_ctx_dynamic) 
            free(comm->header_iv_ctx_dynamic);
        }

and let us know if it works better?

shaxxx commented 1 year ago

I already tried it, not on all members but only first header_encryption_ctx_static and it still crashed. IMHO problem might be uninitilized struct members. uninitilized != NULL Calling free on uninitilized struct members is undefined behaviour, maybe initializing members to null by default would help. But like I said my C foo is weak, so I might as well be talking nonsense.

Logan007 commented 1 year ago

I'd have to check but afair it's all zeroed...

Logan007 commented 1 year ago

Also, could you give us a hint about what kind of communities you use please (from file/not from files/mixed/header-encrypted/not-header-encrypted)?

shaxxx commented 1 year ago

I'm out of office, will paste it here later. In the meantime, nothing fancy, all parameters are from file, as far as I remember it's -p 7654 -F cloud -c community.list -v

And my comunity list us just one line with community name without regex, without users, without subnet

shaxxx commented 1 year ago

As promised

supernode.conf

-c community.list
-F cloud
#--management-password test
-vvvvv

community.list integrator

Started as supernode.exe supernode.conf

I also tried playing with other options without any effect, it crashes 100% of time.

shaxxx commented 1 year ago

Loaded community doesn't have members we're trying to free, here's the debug output from gdb. While federation and community_file are recognized header_encryption_ctx_static and the rest are not. Also, I forgot to mention I was doing all my tests without any connected nodes, maybe that helps.

gdb

shaxxx commented 1 year ago

Well, it was late and I made a boo boo, I was debuging supernode type, not current community. So, the actual problem seems to be incomplete type definition. gdb2

shaxxx commented 1 year ago

Ok, I went further down the rabbit hole, after getting decent debug environment I can see that process dies with misaligned_access

gdb3 I can see that _mm_alloc is used when SPECK_ALIGNED_CTX is defined, otherwise it's always plain malloc.

So crashing makes sense since

Memory that is allocated using _mm_malloc must be freed using _mm_free . Calling free on memory allocated with _mm_malloc or calling _mm_free on memory allocated with malloc will cause unpredictable behavior.

In my case SPECK_ALIGNED_CTX is defined to 16 since SSE2 is defined HERE

Running gcc -msse3 -dM -E - < /dev/null | egrep "SSE|AVX2|AVX512F|ARM_NEON" | sort on my Windows machine outputs

define MMX_WITH_SSE 1

define __SSE2_MATH__ 1

define SSE2 1

define SSE3 1

define __SSE_MATH__ 1

define SSE 1

If I'm not defining SPECK_ALIGNED_CTX reload communities command does not crash supernode.

So my question is why are we defining it to 16, and what it does? Is it safe to not define it?

Logan007 commented 1 year ago

Good catch!

16 is key context size I think (all round keys).

The alignment is a requirement (only) for the MMX/SSE/(AVX) hardware accelerated versions of the SPECK cipher used for header encryption. Linux obviously can free those with free() as well. So we might need to differentiate the different free cases then to have Windows not crashing the supernode.