Reconcile cache folder names between shell script and Python code

ctrueden commented 5 years ago

The shell script version of jgo keeps the groupId as a single folder in the cache folder structure. E.g., org.scijava:jython-shaded:2.7.1.1 will be cached at ~/.jgo/org.scijava/jython-shaded/2.7.1.1. In Python, by contrast, the dots in the groupId are turned into nested folders, e.g. ~/.jgo/org/scijava/jython-shaded/2.7.1.1. This means that an endpoint built with one implementation cannot be reused by the other.
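To make the mismatch concrete, here is a minimal sketch of the two layouts. The function names are made up for illustration and are not taken from either implementation; only the resulting paths come from the description above.

```python
from pathlib import Path


def shell_style_cache_dir(cache_root, group_id, artifact_id, version):
    """Layout described for jgo.sh: the groupId stays a single folder."""
    return Path(cache_root) / group_id / artifact_id / version


def python_style_cache_dir(cache_root, group_id, artifact_id, version):
    """Layout described for Python jgo: each dot in the groupId becomes a nested folder."""
    return Path(cache_root).joinpath(*group_id.split("."), artifact_id, version)


if __name__ == "__main__":
    coordinate = ("org.scijava", "jython-shaded", "2.7.1.1")
    print(shell_style_cache_dir("~/.jgo", *coordinate).as_posix())
    print(python_style_cache_dir("~/.jgo", *coordinate).as_posix())
```

The two functions return different directories for the same coordinate, which is why a cache built by one implementation is invisible to the other.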

[ ] Python jgo
[ ] jgo.sh

ctrueden commented 5 years ago

Example endpoint:

org.codehaus.groovy:groovy-groovysh:org.codehaus.groovy.tools.shell.Main+commons-cli:commons-cli:1.3.1

The corresponding folder structure will be:

<cache-root>
└── org.codehaus.groovy
    └── groovy-groovysh
        ├── RELEASE+commons-cli-commons-cli-1.3.1
        │   ├── commons-cli-1.3.1.jar -> /Users/curtis/.m2/repository/commons-cli/commons-cli/1.3.1/commons-cli-1.3.1.jar
        │   ├── groovy-3.0.0-beta-1.jar -> /Users/curtis/.m2/repository/org/codehaus/groovy/groovy/3.0.0-beta-1/groovy-3.0.0-beta-1.jar
        ...
        │   ├── org.codehaus.groovy.tools.shell.Main

The org.codehaus.groovy.tools.shell.Main is a file whose contents are the same as the filename.

For cases where a short name is given with an @ prefix, the file will be named e.g. @Main, and its content will be the fully qualified class name.

For cases where no main class is given, a file named mainClass will be written, whose content will be the inferred FQCN.

In a nutshell: the filename always matches what the endpoint string has, whereas the content is always the corresponding FQCN.
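A rough sketch of that rule in Python, assuming the main-class part of the endpoint has already been parsed out. The helper names write_main_class_file and read_main_class are hypothetical, not existing jgo functions.

```python
import tempfile
from pathlib import Path


def _marker_name(endpoint_main):
    # The file is named after what the endpoint string contains;
    # "mainClass" is used when the endpoint gives no main class.
    return endpoint_main if endpoint_main else "mainClass"


def write_main_class_file(workspace, endpoint_main, fqcn):
    """Write the marker file; its content is always the fully qualified class name."""
    path = Path(workspace) / _marker_name(endpoint_main)
    path.write_text(fqcn)
    return path


def read_main_class(workspace, endpoint_main):
    """Look up the FQCN recorded for this endpoint's main-class part."""
    return (Path(workspace) / _marker_name(endpoint_main)).read_text()


if __name__ == "__main__":
    fqcn = "org.codehaus.groovy.tools.shell.Main"
    with tempfile.TemporaryDirectory() as workspace:
        for endpoint_main in (fqcn, "@Main", None):
            marker = write_main_class_file(workspace, endpoint_main, fqcn)
            print(marker.name, "->", read_main_class(workspace, endpoint_main))
```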

In addition, whenever a jgo environment is rebuilt (i.e. from scratch or with -u set), we write out a stub file buildSuccess indicating success at the conclusion of the build.
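One way the stub could be used, sketched under the assumption that a missing buildSuccess file means the environment is incomplete and must be rebuilt. That interpretation, and the helper names, are mine and not confirmed by either implementation.

```python
import tempfile
from pathlib import Path

BUILD_SUCCESS = "buildSuccess"  # stub file name from the description above


def mark_build_success(workspace):
    """Create the empty stub file once the build has finished."""
    (Path(workspace) / BUILD_SUCCESS).touch()


def needs_rebuild(workspace, update=False):
    """Assumed policy: rebuild when an update is requested or the stub is missing."""
    return update or not (Path(workspace) / BUILD_SUCCESS).exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        print(needs_rebuild(workspace))  # no stub yet
        mark_build_success(workspace)
        print(needs_rebuild(workspace))  # stub present
        print(needs_rebuild(workspace, update=True))  # update requested
```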

ctrueden commented 5 years ago

I just ran into a nasty problem on macOS which is probably related to this work:

OSError: [Errno 63] File name too long: '/Users/curtis/.jgo/net.imagej/imagej/2.0.0-rc-71+ch.qos.logback-logback-classic-1.2.3+io.scif-scifio-bf-compat-4.0.0+io.scif-scifio-lifesci-0.9.0+io.scif-scifio-0.38.2+net.imglib-imglib2-imglyb-0.3.0+ome-formats-api-6.3.0+ome-formats-bsd-6.3.0+ome-formats-gpl-6.3.0+org.slf4j-log4j-over-slf4j-1.7.28'

So I guess we need to concoct a directory naming scheme that guarantees the name doesn't grow too long. Maybe hashing the names of the secondary endpoints? It should be something we can also easily implement in bash, for the shell script version...
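A minimal sketch of the hashing idea, assuming the directory name keeps the primary version readable and replaces the secondary endpoints with a truncated SHA-256 digest. The function name, the 16-character truncation, and the "+" join are illustrative choices, not an agreed scheme. The two secondary endpoints in the demo are reconstructed from the failing path above.

```python
import hashlib


def workspace_dir_name(version, secondary_endpoints, digest_length=16):
    """Build a bounded-length directory name: <version>+<hash of secondary endpoints>."""
    if not secondary_endpoints:
        return version
    # Order is preserved on purpose: it may affect the resulting classpath.
    joined = "+".join(secondary_endpoints)
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return f"{version}+{digest[:digest_length]}"


if __name__ == "__main__":
    secondary = [
        "ch.qos.logback:logback-classic:1.2.3",
        "io.scif:scifio:0.38.2",
    ]
    print(workspace_dir_name("2.0.0-rc-71", secondary))
```

The name length no longer depends on how many secondary endpoints are given. A bash version could feed the same joined string to sha256sum (or shasum -a 256 on macOS), provided the input bytes are identical, including the absence of a trailing newline.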

kephale commented 3 years ago

I'm just bumping this issue with directory names being too long. There isn't a proper workaround, and I'm running into this a couple of times per week. Shaving deps is the only option at the moment, but it prevents me from doing as much as I'd like to.

hanslovsky commented 3 years ago

We could change this to just use

.jgo/groupId/artifactId

and then have random/enumerated/hashed sub-directories that contain all the jars etc., plus a config file that specifies the additional dependencies. Whenever jgo is executed, it would then go through these sub-directories and pick the one that has the correct config. In this case it may be best to simply use hashes, as long as we can guarantee that no collisions happen.
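Roughly what I have in mind, as a sketch only. The config file name endpoint.json, its JSON shape, and the function name are all made up here. Because the stored config is compared on lookup, a collision of the truncated hash is handled by falling back to a suffixed directory rather than reusing the wrong one.

```python
import hashlib
import json
import tempfile
from pathlib import Path

CONFIG_NAME = "endpoint.json"  # hypothetical config file name


def find_or_create_workspace(cache_root, group_id, artifact_id, endpoint):
    """Return the sub-directory whose config matches the endpoint, creating it if needed."""
    base = Path(cache_root) / group_id / artifact_id
    base.mkdir(parents=True, exist_ok=True)
    wanted = {"endpoint": endpoint}

    # Go through the existing sub-directories and pick the one with the correct config.
    for sub in sorted(base.iterdir()):
        config = sub / CONFIG_NAME
        if config.is_file() and json.loads(config.read_text()) == wanted:
            return sub

    # No match: create a new hashed sub-directory, adding a suffix if the name is taken.
    digest = hashlib.sha256(endpoint.encode("utf-8")).hexdigest()[:16]
    workspace = base / digest
    suffix = 0
    while workspace.exists():
        suffix += 1
        workspace = base / f"{digest}-{suffix}"
    workspace.mkdir()
    (workspace / CONFIG_NAME).write_text(json.dumps(wanted))
    return workspace


if __name__ == "__main__":
    endpoint = (
        "org.codehaus.groovy:groovy-groovysh:"
        "org.codehaus.groovy.tools.shell.Main+commons-cli:commons-cli:1.3.1"
    )
    with tempfile.TemporaryDirectory() as root:
        first = find_or_create_workspace(root, "org.codehaus.groovy", "groovy-groovysh", endpoint)
        again = find_or_create_workspace(root, "org.codehaus.groovy", "groovy-groovysh", endpoint)
        print(first == again)
```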

kephale commented 3 years ago

That sounds great to me. Hashes seem to make the most sense.

scijava/jgo #31