swc-project / swc

Rust-based platform for the Web
https://swc.rs
Apache License 2.0

range end index out of range for slice of length... #6467

Open siosphere opened 1 year ago

siosphere commented 1 year ago

Describe the bug

This is an intermittent bug with @swc/jest and/or @swc/core.

We are running our test suite with turbo repo, utilizing @swc/jest, and intermittently we get a failure like this:

failed to handle: range end index 7551504 out of range for slice of length 6467552
@ssi/document-landing:test: 
@ssi/document-landing:test:       at Compiler.transformSync (../../node_modules/@swc/core/index.js:241:29)
@ssi/document-landing:test:       at transformSync (../../node_modules/@swc/core/index.js:348:21)
@ssi/document-landing:test:       at Object.process (../../node_modules/@swc/jest/index.js:73:45)

Retrying the CI/CD job will succeed just fine.

This does not happen very often, roughly 1 in 50 runs. That error message appears to come from Rust, which points me more towards this being an @swc issue than a jest-specific issue.

Input code

No response

Config

{
    "jsc": {
        "parser": {
            "syntax": "typescript",
            "tsx": true
        }
    },
    "module": {
        "type": "commonjs"
    }
}
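
For reference, the way options like these typically get wired into Jest via @swc/jest looks roughly like the sketch below (our actual jest config is not included here; the transform pattern is a typical one and the options simply mirror the config above):

// jest.config.js (sketch) — @swc/jest picks up .swcrc by default, but the same
// options can also be passed inline as the second element of the transform entry.
module.exports = {
  transform: {
    '^.+\\.(t|j)sx?$': [
      '@swc/jest',
      {
        jsc: {
          parser: {
            syntax: 'typescript',
            tsx: true,
          },
        },
        module: {
          type: 'commonjs',
        },
      },
    ],
  },
};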

Playground link

No response

Expected behavior

Test suites should not intermittently fail with no changes

Actual behavior

Fails intermittently

Version

1.2.128

Additional context

"@swc/cli": "^0.1.57", "@swc/core": "^1.2.128", "@swc/jest": "^0.2.15",

kwonoj commented 1 year ago

We'd like to dig into this, but as you may be aware, it is extremely non-trivial to debug this kind of intermittent issue without an isolated repro.

We'll try to look into it from time to time, but we will probably be blocked by the lack of a repro.

siosphere commented 1 year ago

We'd like to dig into this, but as you may be aware, it is extremely non-trivial to debug this kind of intermittent issue without an isolated repro.

We'll try to look into it from time to time, but we will probably be blocked by the lack of a repro.

Understandable. I'm wondering if there is a way to get more debug output out of swc. I could set up our CI/CD pipeline to run with more verbose output if it is available. (I'm not familiar enough with Rust to go poking around and adding debug statements, but if a verbose option doesn't currently exist I may attempt that.)

kwonoj commented 1 year ago

It is not easy, but if you could run this with a debug build it may be helpful. But it means

siosphere commented 1 year ago

Since it is happening daily, it is worth it for me to go down that path.

I'll take a look at the build directions for swc and get a custom debug build set up, and I'll update this issue with any findings.

Thanks,

siosphere commented 1 year ago

Some additional output:

@coursehero/html-viewer:test: thread '<unnamed>' panicked at 'range end index 7551504 out of range for slice of length 1626080', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmer-engine-universal-2.3.0/src/artifact.rs:102:38
@coursehero/html-viewer:test: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

kdy1 commented 1 year ago

Can you try RUST_BACKTRACE=1?
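
Besides setting the variable on the CI job's environment, one way to make sure it reaches the worker processes that load the native @swc/core binding is to export it before Jest spawns them — a sketch, not verified against this particular setup:

// jest.config.js (sketch) — the config file is evaluated in the main Jest process,
// so an env var set here should be inherited by the spawned test workers.
process.env.RUST_BACKTRACE = '1';

module.exports = {
  // ...existing config
};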

swc-bot commented 1 year ago

This issue has been automatically closed because it received no activity for a month and had no reproduction to investigate. If you think this was closed by accident, please leave a comment. If you are running into a similar issue, please open a new issue with a reproduction. Thank you.

kwonoj commented 1 year ago

no activity for a month

🤔 this doesn't seem correct

kwonoj commented 1 year ago

@kdy1 any ideas? The issue was opened a week ago, and the last comment was 6 days ago, so there's physically no way this issue can be a month old without activity.

kdy1 commented 1 year ago

The wording in the comment is wrong. Just remove the need-more-info label.

siosphere commented 1 year ago

Finally was able to get some more time to look at this, and got an error with backtrace available:

thread '<unnamed>' panicked at 'range end index 7551504 out of range for slice of length 6303712', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmer-engine-universal-2.3.0/src/artifact.rs:102:38
Stack backtrace:
0: <unknown>
1: <unknown>
2: <unknown>
3: <unknown>
4: <unknown>
5: _ZN6v8impl12_GLOBAL__N_123FunctionCallbackWrapper6InvokeERKN2v820FunctionCallbackInfoINS2_5ValueEEE
6: _ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10HeapObjectEEESA_NS8_INS0_20FunctionTemplateInfoEEENS8_IS4_EENS0_16BuiltinArgumentsE
7: _ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE
8: Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit

Re-running the job, it passes just fine. And it is not always the same test, or even the same package, that errors.

dgreif commented 1 year ago

Dropping in to link https://github.com/swc-project/plugins/issues/42, which I believe has a similar root cause. I was not able to get the debug info previously (thanks for doing the hard work on that @siosphere!), but I'm seeing a little more detail on my end after updating to the latest swc:

thread '<unnamed>' panicked at 'range end index 6294120 out of range for slice of length 2158560', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmer-engine-universal-2.3.0/src/artifact.rs:102:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Note that wasmer-engine-universal-2.3.0/src/artifact.rs:102:38 was not part of the original stack trace for my issue or this one, but it seems to match what @siosphere reported above perfectly. Not positive, but I'm guessing it points to this line from the wasmer 2.3.0 release? I don't see anything obviously related there, but hopefully @kdy1 or @kwonoj can spot something 🤞

kdy1 commented 1 year ago

https://github.com/wasmerio/wasmer/blob/2.3.0/lib/engine-universal/src/artifact.rs#L102 is the correct link

kdy1 commented 1 year ago

And yes, this happens if metadata_len is too large

dgreif commented 1 year ago

@kdy1 thanks for verifying the source. Is this something the SWC team will be able to look into, or should we upstream the issue to wasmer?

kdy1 commented 1 year ago

It's not something we can look into

HashemKhalifa commented 1 year ago

Is there any solution we can apply for this issue? I'm facing the same problem.

thread '<unnamed>' panicked at 'range end index 8811528 out of range for slice of length 7540704', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmer-engine-universal-2.3.0/src/artifact.rs:102:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace 

always with jest, not sure why!

  ● Test suite failed to run
    failed to handle: range end index 8811528 out of range for slice of length 7540704

dgreif commented 1 year ago

I had originally been under the impression this was only happening on some specific docker images (much older images), and that the errors weren't present on ubuntu:latest in GitHub Actions. We recently bumped our actions runners to much larger instances with 16 cores, and are starting to see these errors in that environment as well. From this, it seems that running many concurrent instances of swc + jest in parallel leads to this issue.

HashemKhalifa commented 1 year ago

I can relate to that. I tried using the GitHub Actions runner and everything went smoothly. However, when using our self-hosted runner in parallel mode, we encountered some issues. To mitigate the problem, I tried running it with the --runInBand option, and while the issue still occurred, it became less frequent.
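
For reference, the closest config-file counterpart of that flag is capping the worker count — a sketch, not our exact setup:

// jest.config.js (sketch) — limit Jest to a single worker. Note that --runInBand
// goes one step further and runs the tests in the main process instead of a worker.
module.exports = {
  maxWorkers: 1,
};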

ezpuzz commented 1 year ago

Also getting this issue intermittently; it is sometimes resolved by deleting the .swc directory to re-download the loadable plugin.

samar-1601 commented 3 weeks ago

Has anyone got any resolution for this issue? @ezpuzz @dgreif @HashemKhalifa

@kdy1 Do we have any updates or workarounds on this? I have a very large monorepo which fails the CI pipelines intermittently, giving the following error: [error screenshot]

These are the current versions we are using

"@swc/core": "^1.3.46",
"@swc/jest": "^0.2.24",
HashemKhalifa commented 3 weeks ago

@samar-1601 which jest and react-testing-library versions are you using?

It was a memory leak that happened while using older versions of jest and react-testing-library, which required handling events in a different way.

HashemKhalifa commented 3 weeks ago

Here is what I updated; I also adjusted my tests to match the new versions.

    "@swc-node/register": "1.8.0",
    "@swc/cli": "0.1.62",
    "@swc/core": "1.3.62",
    "@swc/helpers": "0.5.1",
    "@swc/jest": "0.2.26",
    "@swc/plugin-jest": "^1.5.67",
    "@swc/plugin-loadable-components": "0.3.67",
    "@swc/plugin-styled-components": "1.5.67",
    "jest": "^29.7.0",
    "jest-axe": "^8.0.0",
    "jest-canvas-mock": "^2.5.2",
    "jest-each": "^29.7.0",
    "jest-environment-jsdom": "^29.7.0",
    "jest-localstorage-mock": "^2.4.26",
    "jest-styled-components": "^7.1.1",
    "jest-transform-stub": "^2.0.0",
    "jest-watch-typeahead": "^2.2.2",
    "jest_workaround": "^0.76.0",
    "jsdom": "^22.1.0",
    "jsonc-eslint-parser": "^2.1.0",
      "@testing-library/dom": "^9.3.3",
    "@testing-library/jest-dom": "^6.1.4",
    "@testing-library/react": "^14.0.0",
    "@testing-library/user-event": "^14.4.3",
samar-1601 commented 3 weeks ago

@HashemKhalifa We are using the following versions.

"jest": "26.6.3",
"jest_workaround": "^0.72.6",

It was a memory leak that happened while using older versions of jest and react-testing-library, which required handling events in a different way.

Is this related in any way to these node+jest version issues?

We have tried various combinations of maxWorkers=(some percentage) and -w=(some number of threads), but none of these memory management steps solves the problem in a concrete manner, i.e. the range end index error keeps coming back intermittently.

I have found possible solutions (if the problem is related to jest+node version issues), which include upgrading node to 21.1 and using workerIdleMemoryLimit (mentioned in the official jest documentation).

The problem is that I am stuck between the two here, because both solutions will take a huge effort considering the size of our monorepo, and currently we have our pipelines failing intermittently on a daily basis.
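
For reference, here is roughly what that Jest 29 setup would look like if we go down that path — a sketch, with illustrative values we have not validated yet:

// jest.config.js (sketch) — workerIdleMemoryLimit is a Jest 29 option that recycles
// a worker once its heap usage crosses the limit, instead of letting it grow forever.
module.exports = {
  maxWorkers: '50%',              // keep some parallelism
  workerIdleMemoryLimit: '512MB', // illustrative value, needs tuning per repo
};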

HashemKhalifa commented 3 weeks ago

I feel you, I've been in your shoes. That's correct about the memory leak issues you mentioned; I had to upgrade both, and since then all has been good. We're currently planning to move away from jest to vitest as well.

samar-1601 commented 3 weeks ago

@HashemKhalifa Did you use the workerIdleMemoryLimit feature to restrict the memory as well? Because in our case, upgrading node to 21.1 as mentioned in https://github.com/jestjs/jest/issues/11956 didn't solve our problem; instead it became a recurring issue. Hoping the jest upgrade to 29.7 and workerIdleMemoryLimit solve our problem.

Will update the thread with our findings.

HashemKhalifa commented 3 weeks ago

I have tried workerIdleMemoryLimit, but it didn't work as I expected. Which react version are you using?

samar-1601 commented 3 weeks ago

@HashemKhalifa React version is 18.2.0. So if workerIdleMemoryLimit didn't work, then, correct me if I'm wrong, just upgrading node and jest solved the issue?

HashemKhalifa commented 3 weeks ago

@samar-1601 no, that was just one of the causes of the memory leaks. There were also many tests in our suite that I had to adjust to fix leaks, some of which were related to react-testing-library packages like @testing-library/jest-dom and @testing-library/user-event.

I hope that answers your question

kdy1 commented 3 weeks ago

Removing the milestone as there's no repro.

samar-1601 commented 3 weeks ago

@HashemKhalifa Another observation: using the flags no-compilation-cache and expose-gc reduces the memory consumption of the jest test cases. Here is the result of a sample run over one of our modules (jest v26.4):


- node v16.10
    - Without `no-compilation-cache`, no `expose-gc` → 900+ MB
    - With the above, 580 MB (259 sec)
- node v18.2 (clearly has leaks)
    - Without `no-compilation-cache`, no `expose-gc` → 1300+ MB
    - With the above → 3000+ MB (then I manually stopped it)
- node v21.1.0
    - Without `no-compilation-cache`, no `expose-gc` → 1200+ MB (150 sec)
    - With the above → around 220-230 MB avg (163 sec)

issacgerges commented 3 weeks ago

We too see this consistently. Are there any other diagnostic steps you can recommend? We're running these builds on C7s (xl I believe) so it's hard to believe the issue is related to memory exhaustion. Any info you can give about the error that might help us understand the issue? Is this reading compiled code from a cache? Or reading the compilation response?

It's a long shot but considering the worker model of jest is it possible that multiple workers are requesting the same file/module to be compiled at the same time?

samar-1601 commented 3 weeks ago

Yeah, this may be related to an issue with jest running multiple workers, because the error doesn't occur when we use --runInBand (--runInBand is roughly equivalent to -w=1, i.e. 1 worker thread). But the problem is that we can't go ahead with only one worker because of the time the tests take to execute. For instance, a suite which takes 8-9s to run with maxWorkers=50% and 22s with -w=4 takes 98-100s with --runInBand.

@dgreif @kdy1 any other insights?

dgreif commented 3 weeks ago

--runInBand is still the only solution we have found which consistently avoids this error. We use nx for build parallelization and caching in our monorepo, which allows us to get most of the benefits provided by multiple Jest workers, but we do have a few big packages which still run slowly with --runInBand and delay the overall test run from finishing. I wish I had more insight for y'all, but that's all I've got 😅

samar-1601 commented 3 weeks ago

@dgreif Can you let us know which node and jest versions you are using? We have upgraded to v21.1.0 but, unlike what is mentioned in https://github.com/jestjs/jest/issues/11956, this didn't solve our problem. So we are now going down the path of upgrading jest to v29.7 and fixing tests so as to use --workerIdleMemoryLimit. Not sure whether this will solve the problem or not :/

samar-1601 commented 2 weeks ago

@HashemKhalifa can you guide us a bit as to how you detected the leaks and then fixed them? For us, currently using --no-compilation-cache seems to fix the memory pile-up issue. Here is a sample of the same:

[memory-usage screenshots: without vs. with --no-compilation-cache]

HashemKhalifa commented 2 weeks ago

I can share my findings from working on this issue last year; maybe they will help:

Updating Node reduced the leaks but didn't solve the problem completely https://github.com/jestjs/jest/issues/11956 https://issues.chromium.org/issues/42202158

Updating Jest still didn't solve the problem entirely but reduced how often it happened.

I'm not sure if it's related to SWC, but it definitely exacerbates the issue in Jest. I was hoping --workerIdleMemoryLimit would solve it, but it didn't have any effect.

As @dgreif mentioned, there's only one option, which is runInBand, because Jest instances with memory leaks are not able to shut down and keep running forever until SWC complains.

Our solution:

- Run tests in-band (sequentially) within each Jest instance.
- Keep parallelization at a higher level using NX, running multiple Jest instances concurrently.

This approach isolates each test instance, preventing memory leaks from spreading, while still leveraging parallel execution for improved overall performance.
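
A minimal sketch of that setup, assuming a shared base Jest config that each package extends (file names and values here are illustrative, not our exact ones):

// jest.config.base.js (sketch) — every package's Jest run stays effectively serial,
// so a leaking worker cannot snowball; parallelism comes from NX running several
// Jest instances at once, e.g. `nx run-many --target=test --parallel=3`.
// (--runInBand on the CLI, as described above, goes further by running tests in the
// main process; maxWorkers: 1 is the nearest config-file approximation.)
module.exports = {
  maxWorkers: 1,
};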