marschall / memoryfilesystem

An in memory implementation of a JSR-203 file system
282 stars 36 forks source link

Glob compilation fails and '**/.gitignore' fails with absolute path #91

Closed glhez closed 6 years ago

glhez commented 6 years ago

After migration from JimFS to MemoryFS (I wanted to use only one memory FS), some of my test failed because of a glob pattern: {,**/}.gitignore. MemoryFS fails to parse the pattern because it first split it by the path separator and does not find the end } in {,**. In that case, an exception is thrown.

On a side note, here are how JimFS, OpenJDK and Bash works:

There is also another problem which arise if we test /a/b/.gitignore against **/.gitignore; MemoryFS (in GlobPathMatcher class) verify if the path is absolute and the glob is absolute: **/.gitignore should work as it is equivalent to .*/.gitignore regexp.

I don't remember (I've read the code last year because I've had problems with glob which worked fine on Windows but not on Linux - or the reverse - because of the path separator \ vs /) how OpenJDK handle the case of absolute path: from the depth of my memory, I think that

If I have time I'll try to work a glob2pattern implementation with some customization (eg: path separator !) - I also need it in my project as a maven independent pom.

marschall commented 6 years ago

Actually, we also transpile to a regex pattern, see https://github.com/marschall/memoryfilesystem/blob/master/src/main/java/com/github/marschall/memoryfilesystem/PathParser.java#L95

marschall commented 6 years ago

When I look at the glob specification in FileSystem#getPathMatcher I have a hard time understanding what is meant by the {,**/} group. Using , as a separator we have an empty string and **/ as a subpattern. I don't know what is meant by this.

marschall commented 6 years ago

Copying from OpenJDK is not OK from from a licence point of view. I don't know if copying from JimFS is OK from from a licence point (APL2 vs MIT) of view so I'd like to avoid this.

glhez commented 6 years ago

The {,**/} is really equivalent to (|.*/): the javadoc says it: the { } characters are a group of subpatterns, where the group matches if any subpattern in the group matches. The "," character is used to separate the subpatterns. Groups cannot be nested.

To complete the analysis, it also work under bash (tried on version bundled in Git for Windows).

$ shopt -s globstar # enable the ** path expansion (eg: make it recursive)
$ echo {,**/}pom.xml | xargs -n1 echo # print all pom.xml
pom.xml
apache/commons-lang/pom.xml
github/assertj-assertions-generator-maven-plugin/pom.xml
github/assertj-assertions-generator/pom.xml
github/assertj-core/pom.xml
github/memoryfilesystem/pom.xml
github/versions-maven-plugin/pom.xml
glhez commented 6 years ago

Actually, we also transpile to a regex pattern, see https://github.com/marschall/memoryfilesystem/blob/master/src/main/java/com/github/marschall/memoryfilesystem/PathParser.java#L95

I've seen it. Not that it is wrong (except for the error :)), but I need an implementation being configurable and independent of the FileSystem implementation.

For example, if you use True VFS, which does not create glob pattern, you need to use the default provider - which is Windows for my case. The glob pattern **/*.java transpile to .*\\[^\\]*.java.

When you use the corresponding path matcher against a TrueVFS Path (or even the default ZipFileSystem), it fails: the Path use / as name separator. Thus, Pattern.compile(.*\\[^\\]*.java, Pattern.CASE_INSENSITIVE).matcher(path.toString()).matches() fails.

glhez commented 6 years ago

After some more in depth test (I've finally had the time to do my implementation of GlobToPattern, except I need to handle the ** correctly), Bash works as I expected it to work, while Java default implementation is not so good.

With the following tree and the Bash option shopt -s globstar:

The following pattern

The Bash rational seems to only accepts ** between a path separator (like you are doing) except it translates it to a * when the use is invalid.

Also, Ant defines the ** as a special directory name.

marschall commented 6 years ago

I released 1.1.0 to Maven Central which now compiles glob to regex. I'm not 100% sure I got the semantics right but it's a start.