Closed bozaro closed 2 months ago
Sorry, I did not document that this API is not thread-safe; thread safety is a feature still to be implemented.
I tried to research this problem a little (and updated the test case repo).
It looks like parallel execution breaks down in at least the following cases:
- .so file is called (it seems dlopen loads it into the same address space and uses global variables). I can work around this by copying the file under a different name (a hard link is not enough).
You are right! They use the same address space and global variables to share the KCL runtime, which currently does not support concurrent execution. We are working hard to solve this problem. It is a bit tricky, similar to the parallelism issues of the Python GIL and the JVM, but the 0.9 version of the KCL runtime is expected to support concurrency.
Since version 0.8.4, the BuildProgram method has also become non-thread-safe.
In version 0.8.3, BuildProgram works for us in parallel.
Unfortunately, I was unable to create an example that reliably reproduces the BuildProgram concurrency issue.
Thank you for your feedback. I will investigate it.
I did modify the build logic two versions ago, but I added locks for the different build threads. 😷
It looks like there is some memory corruption between 0.8.3 and 0.8.4+.
I added a mutex around every BuildProgram, ParseFile and ExecArtifact call, and the application still crashes after the 0.8.3 -> 0.8.4 upgrade.
Thank you very much for your feedback. I will carefully review the recent updates to identify factors. It is expected that version 0.8.6 will be released next week.
I found the root cause of my crashes after the upgrade: https://github.com/kcl-lang/lib/issues/73 (I have local applications with different kcl versions).
Hello @bozaro
https://github.com/kcl-lang/kcl-go/blob/main/pkg/env/env.go#L77
We have introduced an experimental feature gate in kcl-go v0.9.0 beta.1.
By enabling it, you can run your code quickly without calling the BuildArtifact and ExecArtifact native APIs in combination. You can call the ExecProgram API directly to get higher performance, and it supports concurrent execution. You are welcome to try it out.
I tried the configuration in KCL_FAST_EVAL mode. I could not carry out a full check, since it breaks on some structures that work without it.
I'll try to make short examples that don't work with KCL_FAST_EVAL, but a little later.
In terms of performance, the same configuration so far produces approximately the following figures:
In general, ExecArtifact on 0.9.0-beta.2 is about 2 times slower than on 0.8.7. The compilation time of BuildProgram in 0.9.0-beta.2 has also increased.
I had hoped to get parallel execution by calling ExecArtifact sequentially within one .so file while keeping a pool of these .so files. This trick works for a while, then it starts constantly failing with the error already borrowed: BorrowMutError.
I tried regenerating all configuration with KCL_FAST_EVAL=1 on revision 3f2e611ac8fb11b7371bcbe83d273fbf997455c2 (current head of the main branch) and got the same output as before.
But in multithread mode I also got multiple errors like:
|
4 | regex.match(val, "^((http|https)://)[-a-zA-Z0-9@:%._\\+~#?&//=]{2,256}\\.[a-z]{1,6}\\b([-a-zA-Z0-9@:%._\\+~#?&//=]*)$")
| already mutably borrowed: BorrowError
|
Inside our lambda:
import regex
checkHttpUrl = lambda val {
    regex.match(val, "^((http|https)://)[-a-zA-Z0-9@:%._\\+~#?&//=]{2,256}\\.[a-z]{1,6}\\b([-a-zA-Z0-9@:%._\\+~#?&//=]*)$")
}
checkHostPort = lambda val {
    regex.match(val, "^[-a-z0-9._]{2,256}:\\d+$")
}
Hello @bozaro Thanks for the feedback. Could you please give me a simple case that reproduces the multithread error? I will fix it soon. Also, is the API you are calling native.ExecProgram in the Go SDK?
I tried removing all regex package usages and regenerating the configuration with KCL_FAST_EVAL=1 on revision 3f2e611ac8fb11b7371bcbe83d273fbf997455c2 in multithread mode. Regeneration without regex usages passed successfully.
I see. Thanks. I will try to fix it soon. I guess it's because the current runtime system library shares function addresses; maybe they should be initialized as thread_scope. I will improve it as soon as possible.
I tried to add small test cases:
v=option("foo")
https://github.com/bozaro/kcl-issue/blob/8088567e0b69128927c987e59a6fe5400e3e1ec7/kcl_concurrency_issue_test.go#L248-L256 - failed or crashed
import regex
v=option("foo")
x=regex.match("foo", "^\\w+$")
I've tried to fix it in PR https://github.com/kcl-lang/kcl/pull/1458 and added parallel tests in Rust. They ran successfully, and the Go tests also ran successfully.
I tried regenerating all our configuration on revision https://github.com/kcl-lang/kcl/commit/9915cd47a466f42a8ce2ddb96ae20e56596c0067 in multithread mode:
- BuildProgram + ExecArtifact with a lock per .so file - worked (~4m27s wall clock time) :tada:;
- BuildProgram + ExecArtifact without a lock per .so file - crashed;
- ExecProgram (FastEval: true) - worked (~1m16s wall clock time) :tada:.
From the outside, it looks like the compilation has a lock somewhere: it goes single-threaded even for independent WorkDirs.
Yes, there's a global build file lock currently to prevent common external module (e.g., kcl mod add k8s) link crash errors.
JUST FYI
The BuildProgram and ExecArtifact APIs will be annotated as deprecated in v0.10.0.
I removed all locks in my code, upgraded to v0.10.0-alpha.1 and switched to fast eval, and it works fine in multithread mode.
There is still a race in the plugin API (for example: https://github.com/kcl-lang/kcl-go/blob/e8708330af3f7e010f88dcd5f4062164f1999f6f/pkg/plugin/utils_c_string.go#L22-L36): there is no guarantee that the string will not be freed from the ring buffer before use. But this race has not affected me yet.
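The hazard described above can be illustrated with a toy ring buffer. This is a generic sketch and not the kcl-go implementation: the `ring` type and its methods are hypothetical and are written in pure Go, without cgo. It only shows how a value handed out from a reusable slot can change once the ring wraps around.

```go
package main

import "fmt"

// ring is a toy fixed-size buffer of reusable slots. It only illustrates the
// hazard; it is not the kcl-go implementation.
type ring struct {
	slots [][]byte
	next  int
}

func newRing(n, slotSize int) *ring {
	r := &ring{slots: make([][]byte, n)}
	for i := range r.slots {
		r.slots[i] = make([]byte, slotSize)
	}
	return r
}

// put copies s into the next slot and returns a view of that slot.
// The view stays valid only until the ring wraps around to the same slot.
func (r *ring) put(s string) []byte {
	slot := r.slots[r.next]
	r.next = (r.next + 1) % len(r.slots)
	n := copy(slot, s)
	return slot[:n]
}

func main() {
	r := newRing(2, 16)
	first := r.put("result-A")
	fmt.Println(string(first)) // result-A

	// Two more calls (for example from other goroutines) wrap the ring.
	r.put("result-B")
	r.put("result-C")
	fmt.Println(string(first)) // result-C: the slot was reused before "first" was consumed
}
```

With many concurrent plugin calls, the number of slots bounds how long a returned string stays valid, which is why an explicit free step is the safer design.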
I see. Thank you! Do you have any good suggestions or changes regarding this? PRs welcome!
I think this will require changing the Plugin API:
- In kclvm_service_new, add a structure with the Plugin API version, a callback method for calling the plugin, and a method for freeing the received strings.
- Pass an identifier of the ExecProgram call to the plugin method (something like int64) to be able to determine which ExecProgram call the plugin call belongs to.
I once looked at how to add this feature to KCL, but at the time I lacked a general understanding of how KCL works (how generation, compilation and runtime are related). Perhaps I will try to return to this issue again a little later.
Because this issue has not been updated for a long time, KCL Plugin can be further restructured and improved. I will close this PR and later open a new issue for tracking. If you have any questions, please feel free to reopen the issue or create a new issue or PR.
Bug Report
ExecArtifact crashes on parallel execution (serial execution works fine).
1. Minimal reproduce step (Required)
Clone small test (go lang):
Test source code:
2. What did you expect to see? (Required)
All tests pass
3. What did you see instead (Required)
Errors like:
Or crash like:
4. What is your KCL components version? (Required)
kcl-lang.io/kcl-go v0.8.2