openrewrite / rewrite-maven-plugin

OpenRewrite's Maven plugin.
https://openrewrite.github.io/rewrite-maven-plugin/plugin-info.html
Apache License 2.0
142 stars 74 forks source link

Mistake in detecting encoding of pom.xml #734

Closed leveretconey closed 8 months ago

leveretconey commented 9 months ago

What version of OpenRewrite are you using?

I am using

How are you running OpenRewrite?

pom.xml of the project:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>debug</artifactId>
  <version>1.0-SNAPSHOT</version>
  <properties>
    <!-- UTF-8编码的中文-->
    <!-- ↑ Chinese characters in utf-8-->
    <project.build.sourceEncoding>gbk</project.build.sourceEncoding>
  </properties>
</project>

In a legacy application that java source files are using gbk encoding while pom.xml is using utf-8 encoding, OpenRewrite has recognized the file encoding of pom.xml as gbk. This leads to some problem in our scenerio.

After some debugging, I noticed that OpenRewrite reads project.build.sourceEncoding property from pom and sets it as the encoding of all files in this project. I think this behavior is not correct because this property is only applicable to source code files by Maven's design.

What did you expect to see?

pom.xml should be detected as using utf-8 encoding

What did you see instead?

pom.xml is detected as using gbk encoding

Are you interested in contributing a fix to OpenRewrite?

I won’t contribute to this because the code changes may be huge and involve multiple code repositories.

timtebeek commented 9 months ago

Thanks for calling this out @leveretconey ! Sounds like an oversight; adding a bit of context here already.

We indeed read project.build.sourceEncoding in MavenMojoProjectParser.listSourceFiles and use that to parse source files for the project. https://github.com/openrewrite/rewrite-maven-plugin/blob/2a2d1f625843bd871b20b6ebc2e21d386159a58b/src/main/java/org/openrewrite/maven/MavenMojoProjectParser.java#L162-L178

At the point where we read project.build.sourceEncoding we have already parsed the Maven files just before. I don't see any reference to sourceEncoding in other projects than rewrite-maven-plugin; that makes me think the pom.xml is parsed without taking sourceEncoding into account. What files are affected by the problem you're reporting? Is it the pom.xml files themselves? Files inside src/? Or files outside of src/?

If the problem you're described only affects files outside src/ then this might be a first step towards seeing that resolved:

Would you be able to give that a go and see if that helps you there?

leveretconey commented 9 months ago

Thanks for your reply @timtebeek !

Let me add how we are currently using the MavenMojoProjectParser: Currently we set skipMavenParsing=true to skip the built-in POM parsing (otherwise sometimes it hangs, and we have never figured out the reason, which might be related to the Great Firewall), then we set parseAdditionalResources=true to parse the pom.xml as a regular XML file and then implement own recipes to modify pom.xml. In addition, we have implemented some recipes to modify other configuration files that are outside of the src/ directory. Currently, it appears that these configuration files may also be affected by project.build.sourceEncoding, which could lead to incorrect encoding.

I have tested #735 and that does solve my issue.

timtebeek commented 8 months ago

Hi @leveretconey ; did you you have a chance to try out the latest snapshot already? I'm hoping this is now fixed for you, but would like to be sure.