kit-data-manager / ro-crate-java

Java library to create and modify RO-Crates.
https://kit-data-manager.github.io/webpage/ro-crate-java/
Apache License 2.0
3 stars 3 forks source link

Review identifier encoding #171

Closed Pfeil closed 2 months ago

Pfeil commented 3 months ago

I reviewed the code as I was going through it anyway to check if #5 was completed.

Missing:

coveralls commented 3 months ago

Pull Request Test Coverage Report for Build #223

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
src/main/java/edu/kit/datamanager/ro_crate/entities/AbstractEntity.java 4 5 80.0%
src/main/java/edu/kit/datamanager/ro_crate/special/IdentifierUtils.java 16 17 94.12%
src/main/java/edu/kit/datamanager/ro_crate/entities/data/DataEntity.java 13 15 86.67%
src/main/java/edu/kit/datamanager/ro_crate/RoCrate.java 29 32 90.63%
<!-- Total: 89 96 92.71% -->
Files with Coverage Reduction New Missed Lines %
src/main/java/edu/kit/datamanager/ro_crate/RoCrate.java 1 93.6%
<!-- Total: 1 -->
Totals Coverage Status
Change from base Build #207: 0.5%
Covered Lines: 1562
Relevant Lines: 1728

💛 - Coveralls
Pfeil commented 3 months ago

We should not make our library (and the tests) dependent on the operating system. So I removed OS dependent code. The error had its cause in the Java 17 default build setup, though. More info on this is in the comments below.

Pfeil commented 3 months ago

From the log:

D:\a\ro-crate-java\ro-crate-java\src\test\java\edu\kit\datamanager\ro_crate\special\IdentifierUtilsTest.java:20: error: unmappable character (0x9D) for encoding windows-1252

    static final String FILE_CHINESE = "�?�试.mp4";

and from a windows machine:

expected: <https://example.com/break§ stuff + code<>/> but was: <https://example.com/break§ stuff + code<>/>

and

expected: <https://example.com/break%C2%A7%20stuff%20%2B%20code%3C%3E/> but was: <https://example.com/break%C3%82%C2%A7%20stuff%20%2B%20code%3C%3E/>

Note that the strings are not modified after git checkout, it happens in the build process somewhere!

Pfeil commented 2 months ago

The issue is that Java 17 on Windows will re-encode your code and therefore can cause artifacts in strings before compiling your code. Upgrading to Java 21 should help, or adding

if (JavaVersion.current() == JavaVersion.VERSION_17) {
    println "Setting encoding to UTF-8 manually"
    compileJava.options.encoding = "UTF-8"
    compileTestJava.options.encoding = "UTF-8"
}

to build.gradle. For now, I did both to support both of them.

Also, I added to the CI the following step. On manually triggered runs (not repeating an automatically triggered one!), it will upload the test results as artifacts:

    - name: Upload (test) reports as artifact on GitHub on manual runs
      if: github.event_name == 'workflow_dispatch'
      uses: actions/upload-artifact@v4
      with:
        name: test-report ${{ matrix.os }} JDK ${{ matrix.jdk }}
        path: build/reports
Pfeil commented 2 months ago

(the last "WIP" commit is being considered a stash for later continuation)

Note: the DataEntity builder now had methods which throw exceptions even though this can be handled more quietly in most cases. It is currently unknown if someone really needs it, but if so, we can keep them as extra methods, this is fine.

Still, I would prefer to keep the old behaviour of deriving the ID from the filename if it is not set already. SetID can be used afterwards if a custom name shall be given. This fits better into the builder pattern and makes sure a user does not get confused by first using setId and then overriding it with the filepath call which takes and sets the ID again. It also shrinks the line length.

I'll add some TODOs about this at the top, respectively.

Pfeil commented 2 months ago

I just saw issue #4 has also been worked on. I'd like to take a look this one before I merge, too.

Pfeil commented 2 months ago

I added some setters and changed some constructors in order to make things simpler and more flexible. Otherwise, just some cleanup. License entities now get added by default if more than an ID is provided, so the user will not forget this.