megaease / easeprobe

A simple, standalone, and lightweight tool that can do health/status checking, written in Go.
Apache License 2.0
2.18k stars 231 forks source link

replace bou.ke/monkey with bytedance/mockey #570

Closed samanhappy closed 2 months ago

samanhappy commented 2 months ago

As mentioned in #519, the bou.ke/monkey library has been archived for a long time and fails on the macos-arm64 architecture. Consequently, we need to find an alternative library.

After extensive testing, I found that gomonkey bytedance/mockey is the only library that works, although its APIs are not compatible with bou.ke/monkey. To address this, I have introduced a separate monkey module in EaseProbe that encapsulates the gomonkey bytedance/mockey library. This approach minimizes the cost of modifying test code and allows us to easily replace the library in the future without affecting our test code.

Key code changes include:

  1. Makefile Update: Changed -gcflags=-l to -gcflags=all=-l to accommodate changes for gomonkey bytedance/mockey.
  2. Test Files Update: Most test files only required changing the import path from bou.ke/monkey to github.com/megaease/easeprobe/monkey. However, in probe/ssh/ssh_test.go, where the original usage was unsupported, I modified the logic accordingly.
  3. Monkey Module: In monkey/monkey.go, I used a sync.Map to store all patches, which supports reset operations. A notable method called wait employs busy waiting to prevent test failures on macos-arm64. The exact cause of these errors is unclear to me, and this is a workaround. If anyone has a better solution, please let me know, as I am not very familiar with the ARM64 architecture.

Resolve #519

codecov-commenter commented 2 months ago

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 86.56716% with 9 lines in your changes missing coverage. Please review.

Project coverage is 89.29%. Comparing base (eef3a25) to head (615c2f3). Report is 65 commits behind head on main.

Files with missing lines Patch % Lines
monkey/monkey.go 86.56% 7 Missing and 2 partials :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #570 +/- ## ========================================== - Coverage 98.80% 89.29% -9.52% ========================================== Files 85 93 +8 Lines 5867 6528 +661 ========================================== + Hits 5797 5829 +32 - Misses 51 670 +619 - Partials 19 29 +10 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

samanhappy commented 2 months ago

Refined: The SIGBUS error still occurs occasionally, as seen here, and I haven't found a solution yet.

suchen-sci commented 2 months ago

Refined: The SIGBUS error still occurs occasionally, as seen here, and I haven't found a solution yet.

Try to fix it, then you will be the new monkey king!

samanhappy commented 2 months ago

Bad news: It appears that the SIGBUS error is related to the Gomonkey library, as discussed in this issue.

Good news: I've discovered a new monkey patching library bytedance/mockey. After replacing gomonkey with it, the test results are looking promising so far.

samanhappy commented 2 months ago

@proditis Thanks so much for your testing efforts! The random errors are also unacceptable to me. To help us investigate further, could you please provide the following information about your test environment:

  1. What is your system version? Is it macOS ARM64?
  2. What are the exact error messages when the tests fail? Do the errors have anything in common, or are they entirely random?
proditis commented 2 months ago

My system is Linux x86_64 and go version go1.23.0 linux/amd64.

I will repeat the tests to try to figure out what is causing this failure. Will keep you posted

edit: Please find attached the output from two failed tests test1.log & test2.log

As far as i can see all failed tests are related to the TestOpenLogFail tests.

proditis commented 2 months ago

I took a closer look at the failed test. Do we need to monkey.patch this call? image

I think that with a bit of rework on this test case we may be able to get rid of this error.

samanhappy commented 2 months ago

Sorry for the delayed response. I reviewed the two error logs and the original TestOpenLogFail code. Here are some thoughts:

Both failures occur within the lumberjack library's internal functions. We need to determine whether these errors are caused solely by lumberjack or if our patch operations are contributing to the issue.

While the patch call in the image you attached is necessary, the patch call starting at line 188 (shown below) doesn't seem to have any effect on our test:

var lum *lumberjack.Logger
monkey.PatchInstanceMethod(reflect.TypeOf(lum), "Rotate", func(_ *lumberjack.Logger) error {
    return fmt.Errorf("error")
})

I couldn't reproduce the error on my MacBook. My suggestion is to remove this patch code and run the tests again on your system to determine whether the issue originates from the library or the patch. Does that sound like a good plan?

proditis commented 2 months ago

Hi thanks for the feedback, although i still dont understand why these calls need to be patched :rofl:

I commented out the following

    //var lum *lumberjack.Logger
    //  monkey.PatchInstanceMethod(reflect.TypeOf(lum), "Rotate", func(_ *lumberjack.Logger) error {
    //      return fmt.Errorf("error")
    //  })

Rerun the tests and got the following error test3.log

PS: This is how I run my tests in case it helps, i am getting a failed run every ~20-40 runs

fail=0; runs=1; while [ "$fail" -eq 0 ];do echo "[$runs] Running tests"; make test >test.log || ((fail=fail+1)); ((runs=runs+1)); done
samanhappy commented 2 months ago

Many thanks to @proditis for helping us resolve the random errors. @suchen-sci, you can now review this PR again.

suchen-sci commented 2 months ago

Sorry for the delayed response... LGTM!

samanhappy commented 1 month ago

Oops, the SIGBUS: bus error still exists as show here !

proditis commented 1 month ago

Oops, the SIGBUS: bus error still exists as show here !

I dont have a mac to run tests, but i suspect the same errors will come up with OpenBSD as well, so i'll prepare a system to run tests and see what fails. OpenBSD is much more strict with these types of violations, so hopefully we'll be able to catch most of these errors.

Could this failure be logger related also?

PS: Maybe we need to open an issue to keep track of these crashes and close it only after we have run enough tests to ensure we fixed all such failures?

samanhappy commented 1 month ago

I'm not sure if these errors are related to the logger, but all the SIGBUS: bus error occurrences seem to be specific to macOS-arm64.

I've opened an issue as per your advice. Let's track the progress there and keep each other updated on any developments.