openrewrite / rewrite

Automated mass refactoring of source code.
https://docs.openrewrite.org
Apache License 2.0
2.28k stars 339 forks source link

Poor path performance of JsonPathMatcher for Yaml ChangeKey and ChangeValue on large files #4650

Open jrgleason opened 3 weeks ago

jrgleason commented 3 weeks ago

What problem are you trying to solve?

I have a project with a bunch of YAML files. When I try to add a recipe that runs ChangeKey against a complex oldKeyPath it causes huge lags (in my case a dry run goes from about 6 mins to over 1 hour). To get around this I started creating a simple precondition that scans the whole file for a string (typically the key or another key that is more unique). I understand this has some pitfalls and also doesn't help is a 3rd party recipe uses it and doesn't upgrade so I was hoping we could come up with a better solution.

Describe the solution you'd like

Maybe a parameter to the recipe that allows us to pass a search string as well to run the regex before the matcher. It should be done before the matcher is run here...

https://github.com/openrewrite/rewrite/blob/420f56dd4ab3469db4855a2bf37c443b8e8af9f5/rewrite-yaml/src/main/java/org/openrewrite/yaml/ChangeKey.java#L61

Have you considered any alternatives or workarounds?

My work around, as I said, is to use a precondition to check the file for some text before running the matcher

Additional context

I found this issue through my company and I am working on trying to get approved to contribute. When this happens I should be able to provide an example. Till then I will try to recreate one on my own but in theory this should be true of any project with a lot of yaml files (maybe even just a lot of files).

Are you interested in contributing this feature to OpenRewrite?

Sure but need to finish clearance first

jrgleason commented 3 weeks ago

Problem is also true of ChangeValue

jrgleason commented 3 weeks ago

Ok more info, it wasnt the amount of files but rather 1 file at 1.6mb. I would still recommend the optional regex param as a way to mitigate it as it tears through the file way faster.

timtebeek commented 2 weeks ago

Thanks for the added details here @jrgleason ! Let us know if you're indeed cleared to contribute. Ideally we improve performance of that JsonPathMatcher.matches without any additional arguments, but I'm not sure if that has been profiled and tried before: https://github.com/openrewrite/rewrite/blob/a126fc846e3928467f4d12a7c70fc5df21208027/rewrite-yaml/src/main/java/org/openrewrite/yaml/JsonPathMatcher.java#L84-L103

timtebeek commented 2 weeks ago

For context: did your large yaml file perhaps contain anchors? I see those are expanded, so they can also add to the complexity and run duration.

knutwannheden commented 1 week ago

The support to resolve references to YAML anchors was added way back in #2272. I think this will definitely cause performance issues (for multiple reasons). I am thinking we should maybe just remove support for YAML references for now, because I get the impression that YAML references aren't correctly implemented anyway. Alternatively, we could look into fixing the performance issues it is causing.

knutwannheden commented 1 week ago

@jrgleason I've improved the performance of the JsonPathMatcher in the following commits: https://github.com/openrewrite/rewrite/commit/0f76203bede5f77c0df557f42c7bd5627f9b8c4b, https://github.com/openrewrite/rewrite/commit/f71dc0d429184ba7094715a0cf6dd51e17873e2f, https://github.com/openrewrite/rewrite/commit/ae9083973ca333639ffc4b2231822bbdf7da4e6e, https://github.com/openrewrite/rewrite/commit/0f8e755b55f2c90443e1849e28a0c62c47476d46, https://github.com/openrewrite/rewrite/commit/f1f48c08843bacfa0848bba0298314c4e323652f, and https://github.com/openrewrite/rewrite/commit/f30cd55a8de23ff9823dc1cf2a40343ac4b80edc. These changes should make the matching quite a bit faster. Are you able to check if this improves things for you? There is still more that can be done.