kubernetes-client / c

Official C client library for Kubernetes
Apache License 2.0
144 stars 46 forks source link

double free occurs while calling api in the multi-thread #54

Closed clearday4 closed 3 years ago

clearday4 commented 3 years ago

There is a problem that double free crashes while calling CoreV1API_listNamespacedPod() continuously in the Multi-thread. The direct cause is that the duplicated detection of v1_pod_free()'s address, it seems that cJSON is basically thread unsafe.

https://github.com/DaveGamble/cJSON/issues/221 https://github.com/DaveGamble/cJSON/issues/314

Looking at the above link, If you don't specify return_parse_end in cJSON_ParseWithOpts or use cJSON_GetErrorPtr (), it cause memory violation in the multi-thread.

And we need an additional review of why v1_pod_free() accesses duplicate addresses per thread. We got the following logs using address sanitizer.

================================================================= ==1659378== ERROR: AddressSanitizer: attempting double-free on 0x600800040390:

0 0x7ffff4e60dd9 (/usr/lib/libasan.so+0x15dd9)

#1 0x7ffff18777e8 (/pkg/lib/libkubernetes.so.xcg+0xc47e8)

0x600800040390 is located 0 bytes inside of 40-byte region [0x600800040390,0x6008000403b8) freed by thread T146 here:

0 0x7ffff4e60dd9 (/usr/lib/libasan.so+0x15dd9)

#1 0x7ffff18715de (/pkg/lib/libkubernetes.so.xcg+0xbe5de)

previously allocated by thread T147 here:

0 0x7ffff4e60ef9 (/usr/lib/libasan.so+0x15ef9)

#1 0x7ffff1876100 (/pkg/lib/libkubernetes.so.xcg+0xc3100)

Thread T147 created by T0 here:

0 0x7ffff4e55a0a (/usr/lib/libasan.so+0xaa0a)

#1 0x45db8d (/pkg/bin/nrdhcpad.xcg+0x45db8d)
#2 0x434238 (/pkg/bin/nrdhcpad.xcg+0x434238)
#3 0x43ce65 (/pkg/bin/nrdhcpad.xcg+0x43ce65)
#4 0x4742e6 (/pkg/bin/nrdhcpad.xcg+0x4742e6)
#5 0x474cf2 (/pkg/bin/nrdhcpad.xcg+0x474cf2)
#6 0x431352 (/pkg/bin/nrdhcpad.xcg+0x431352)
#7 0x4080aa (/pkg/bin/nrdhcpad.xcg+0x4080aa)
#8 0x505ce9 (/pkg/bin/nrdhcpad.xcg+0x505ce9)
#9 0x40f8e5 (/pkg/bin/nrdhcpad.xcg+0x40f8e5)
#10 0x405015 (/pkg/bin/nrdhcpad.xcg+0x405015)
#11 0x7fffef6e0554 (/usr/lib64/libc-2.17.so+0x22554)

Thread T146 created by T0 here:

0 0x7ffff4e55a0a (/usr/lib/libasan.so+0xaa0a)

#1 0x45db8d (/pkg/bin/nrdhcpad.xcg+0x45db8d)
#2 0x434238 (/pkg/bin/nrdhcpad.xcg+0x434238)
#3 0x43ce65 (/pkg/bin/nrdhcpad.xcg+0x43ce65)
#4 0x4742e6 (/pkg/bin/nrdhcpad.xcg+0x4742e6)
#5 0x474cf2 (/pkg/bin/nrdhcpad.xcg+0x474cf2)
#6 0x431352 (/pkg/bin/nrdhcpad.xcg+0x431352)
#7 0x4080aa (/pkg/bin/nrdhcpad.xcg+0x4080aa)
#8 0x505ce9 (/pkg/bin/nrdhcpad.xcg+0x505ce9)
#9 0x40f8e5 (/pkg/bin/nrdhcpad.xcg+0x40f8e5)
#10 0x405015 (/pkg/bin/nrdhcpad.xcg+0x405015)
#11 0x7fffef6e0554 (/usr/lib64/libc-2.17.so+0x22554)

==1659378== ABORTING

==2107129== ERROR: AddressSanitizer: attempting double-free on 0x6004003171b0: [0319|10:26:06.251|NRDHCPAD|DHCPAD|0180|INFO|8900][ERR]dal_db_connect fail(-26, Unknown error) [2021-03-19 10:26:06.251][dhcpad_dal.c:180][ERR]dal_db_connect fail(-26, Unknown error)

0 0x7ffff4e60dd9 (/usr/lib64/libasan.so.0+0x15dd9)

#1 0x7ffff189a26c (/usr/lib64/libkubernetes.so+0xbe26c)

0x6004003171b0 is located 0 bytes inside of 4-byte region [0x6004003171b0,0x6004003171b4) freed by thread T5 here:

0 0x7ffff4e60dd9 (/usr/lib64/libasan.so.0+0x15dd9)

#1 0x7ffff189a26c (/usr/lib64/libkubernetes.so+0xbe26c)

previously allocated by thread T5 here:

0 0x7ffff4e60ef9 (/usr/lib64/libasan.so.0+0x15ef9)

#1 0x7fffef773b89 (/usr/lib64/libc-2.17.so+0x8cb89)

Thread T6 created by T0 here:

0 0x7ffff4e55a0a (/usr/lib64/libasan.so.0+0xaa0a)

#1 0x45db8d (/pkg/bin/nrdhcpad.xcg+0x45db8d)
#2 0x434238 (/pkg/bin/nrdhcpad.xcg+0x434238)
#3 0x43ce65 (/pkg/bin/nrdhcpad.xcg+0x43ce65)
#4 0x4742e6 (/pkg/bin/nrdhcpad.xcg+0x4742e6)
#5 0x474cf2 (/pkg/bin/nrdhcpad.xcg+0x474cf2)
#6 0x431352 G(/pkg/bin/nrdhcpad.xcg+0x431352)
#7 0x4080aa (/pkg/bin/nrdhcpad.xcg+0x4080aa)
#8 0x505ce9 (/pkg/bin/nrdhcpad.xcg+0x505ce9)
#9 0x40f8e5 (/pkg/bin/nrdhcpad.xcg+0x40f8e5)
#10 0x405015 (/pkg/bin/nrdhcpad.xcg+0x405015)
#11 0x7fffef709554 (/usr/lib64/libc-2.17.so+0x22554)

Thread T5 created by T0 here:

0 0x7ffff4e55a0a (/usr/lib64/libasan.so.0+0xaa0a)

#1 0x45db8d (/pkg/bin/nrdhcpad.xcg+0x45db8d)
#2 0x434238 (/pkg/bin/nrdhcpad.xcg+0x434238)
#3 0x43ce65 (/pkg/bin/nrdhcpad.xcg+0x43ce65)
#4 0x4742e6 (/pkg/bin/nrdhcpad.xcg+0x4742e6)
#5 0x474cf2 (/pkg/bin/nrdhcpad.xcg+0x474cf2)
#6 0x431352 (/pkg/bin/nrdhcpad.xcg+0x431352)
#7 0x4080aa (/pkg/bin/nrdhcpad.xcg+0x4080aa)
#8 0x505ce9 (/pkg/bin/nrdhcpad.xcg+0x505ce9)
#9 0x40f8e5 (/pkg/bin/nrdhcpad.xcg+0x40f8e5)
#10 0x405015 (/pkg/bin/nrdhcpad.xcg+0x405015)
#11 0x7fffef709554 (/usr/lib64/libc-2.17.so+0x22554)

==2107129== ABORTING

We would like to ask you to analyze this issue. Thanks in advance.

ityuhui commented 3 years ago

Hi @clearday4

What's the step to reproduce this issue ? I think: In a multi-thread program, one thread keeps calling CoreV1API_listNamespacedPod (how often ?), what does the other thread do ?

clearday4 commented 3 years ago

2 thread call CoreV1API_listNamespacedPod() with fieldselector "status.phase=Running" simultaneously at 10ms intervals. If I add the pthread_mutex_lock back and forth of CoreV1API_listNamespacedPod()+v1_pod_list_free(), the problem is solved.

clearday4 commented 3 years ago

I think you need to change cJSON_Parse(event_string) =========> const char *parse_end = NULL; cJSON_ParseWithOpts(event_string, &parse_end, 0);

and print out parse_end instead of cJSON_GetErrorPtr()

ityuhui commented 3 years ago

I think you need to change cJSON_Parse(event_string) =========> const char *parse_end = NULL; cJSON_ParseWithOpts(event_string, &parse_end, 0);

and print out parse_end instead of cJSON_GetErrorPtr()

Where is the code ? In the example ?

brendandburns commented 3 years ago

@ityuhui It's in a bunch of the generated client library code, eg.

https://github.com/kubernetes-client/c/blob/97c945b5cc2659fc88090603faa23a533c518282/kubernetes/api/ExtensionsAPI.c#L54

I think we need to change the code generator.

ityuhui commented 3 years ago

Hi @brendandburns

Described in cJSON official readme,

However it is thread safe under the following conditions:

  • cJSON_GetErrorPtr is never used (the return_parse_end parameter of cJSON_ParseWithOpts can be used instead)

We don't need replace cJSON_Parse in generated code because the generated code does not call cJSON_GetErrorPtr

But the examples and config/authentication plugin code use the function cJSON_GetErrorPtr, they are not thread safe. e.g.

https://github.com/kubernetes-client/c/blob/master/examples/multi_thread/watch_pod.c#L17 https://github.com/kubernetes-client/c/blob/master/kubernetes/config/authn_plugin/authn_plugin_util.c#L55

I will fix them.

ityuhui commented 3 years ago

But the core dump described by this issue is not releated to the C client library itself.

Users of the C client library, if they use cJSON directly in their code, should follow the requirements in cJSON official readme https://github.com/DaveGamble/cJSON#thread-safety to ensure the thread safety in multi-threaded program.

brendandburns commented 3 years ago

@ityuhui thanks for the explanation (and sorry I didn't read into the details of our cJSON usage).

I merged #58 so closing this issue.

ityuhui commented 3 years ago

Thank you @brendandburns

ityuhui commented 3 years ago

And thank you @clearday4 for the finding and suggestion.