kite-sdk / kite

Kite SDK
http://kitesdk.org/docs/current/
Apache License 2.0

kite-dataset fails on Mac OS X due to case insensitive filesystem while unpacking the JAR #475

Open ecerulm opened 6 years ago

ecerulm commented 6 years ago

kite-tools-1.1.0-binary.jar fails on Mac OS X because the HFS+ filesystem is case-insensitive and the JAR contains both META-INF/LICENSE and META-INF/license. By default, HFS+ does not allow two filenames that differ only in case: it is case-preserving but case-insensitive.

You can verify that the JAR contains both a license and a LICENSE entry with the command jar tvf kite-tools-1.1.0-binary.jar | grep -i license
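To list every pair of entries that clash on a case-insensitive filesystem, rather than only the license ones, a small filter over the entry names might help. This is a minimal sketch: `find_case_collisions` is a hypothetical helper name, and it assumes one entry path per line on standard input, as printed by `jar tf`. It also checks parent directories, so a file and a directory that differ only in case are reported even if the directory has no entry of its own.

```shell
# Hypothetical helper: read archive entry paths on stdin (one per line) and
# print each pair of paths that are identical when case is ignored.
find_case_collisions() {
  awk '
    {
      sub(/\/$/, "")                      # drop the trailing slash of directory entries
      n = split($0, parts, "/")
      path = ""
      for (i = 1; i <= n; i++) {          # visit every prefix: a, a/b, a/b/c, ...
        path = (i == 1 ? parts[i] : path "/" parts[i])
        key = tolower(path)
        if (!(key in seen)) {
          seen[key] = path                # first spelling seen for this path
        } else if (seen[key] != path && !(key in reported)) {
          reported[key] = 1
          print seen[key] " <-> " path    # same path, different case
        }
      }
    }'
}

# Usage (not run here):
#   jar tf kite-tools-1.1.0-binary.jar | find_case_collisions
```

An empty result means no two entries differ only in case.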

This filename clash makes the JAR unusable: when Hadoop tries to unpack it, it throws an IOException: Mkdirs failed to create <tmpdir>.../hadoop-unjar/.../META-INF/license:

kite-dataset csv-schema movies.csv --record-name Movie                                                                                                                     
/Users/ecerulm/bin/kite-dataset debug: Using HADOOP_COMMON_HOME=/Users/ecerulm/.local/stow/hadoop-2.8.1/
/Users/ecerulm/bin/kite-dataset debug: Using HADOOP_MAPRED_HOME=/Users/ecerulm/.local/stow/hadoop-2.8.1//../hadoop-mapreduce
/Users/ecerulm/bin/kite-dataset debug: Using HBASE_HOME=/Users/ecerulm/.local/stow/hadoop-2.8.1//../hbase
/Users/ecerulm/bin/kite-dataset debug: Using HIVE_HOME=/Users/ecerulm/.local/stow/hadoop-2.8.1//../hive
/Users/ecerulm/bin/kite-dataset debug: Using HIVE_CONF_DIR=/Users/ecerulm/.local/stow/hadoop-2.8.1//../hive/conf
/Users/ecerulm/bin/kite-dataset debug: Using HADOOP_CLASSPATH=/Users/ecerulm/bin/kite-dataset::
Exception in thread "main" java.io.IOException: Mkdirs failed to create /var/folders/j5/8yjty44917v3_ydfjyy0gz0c0000gn/T/hadoop-unjar7609709732056315890/META-INF/license
    at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:140)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:109)
    at org.apache.hadoop.util.RunJar.unJar(RunJar.java:85)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:222)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

Would it be possible to change the JAR build process to rename the META-INF/license directory to META-INF/licenses? Googling around, I found that the Maven [ApacheLicenseResourceTransformer](https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ApacheLicenseResourceTransformer) may solve the problem.

Alternatively, META-INF/LICENSE (the Jackson JSON processor license) could be moved or renamed.

Is this possible? Otherwise, as far as I understand, kite-dataset cannot be used on Mac OS X.

ecerulm commented 6 years ago

As a workaround that may interest people having the same problem, it is possible to remove the META-INF/LICENSE file from kite-dataset with the following commands:

curl -O  http://central.maven.org/maven2/org/kitesdk/kite-tools/1.1.0/kite-tools-1.1.0-binary.jar
md5 kite-tools-1.1.0-binary.jar # MD5 (kite-tools-1.1.0-binary.jar) = 3327af98b339725070962f7391187fc2
dd if=kite-tools-1.1.0-binary.jar bs=4114 count=1 > script.sh # first 4114 bytes of .jar to script.sh file
dd if=kite-tools-1.1.0-binary.jar bs=4114 skip=1 > jarcontent.zip # rest of jar goes to jarcontent.zip
zip -d jarcontent.zip META-INF/LICENSE
cat script.sh jarcontent.zip >~/bin/kite-dataset

This generates a ~/bin/kite-dataset with no case-conflicting filenames.
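The 4114-byte split point is specific to the exact file whose MD5 is shown in the commands, so it may be worth checking the checksum before running dd. A minimal sketch of such a guard follows; `check_md5` is a hypothetical helper name, and it assumes either the BSD/macOS `md5` command or GNU `md5sum` is installed.

```shell
# Hypothetical helper: succeed only if FILE has the expected MD5 checksum.
# Usage: check_md5 EXPECTED_HEX FILE
check_md5() {
  expected=$1
  file=$2
  if command -v md5 >/dev/null 2>&1; then
    actual=$(md5 -q "$file")                    # macOS / BSD: -q prints only the hash
  else
    actual=$(md5sum "$file" | cut -d' ' -f1)    # GNU coreutils: hash is the first field
  fi
  [ "$actual" = "$expected" ]
}

# Usage (not run here):
#   check_md5 3327af98b339725070962f7391187fc2 kite-tools-1.1.0-binary.jar \
#     || echo "unexpected file; the 4114-byte offset may not apply" >&2
```

If the check fails, the download differs from the one used above and the script/ZIP boundary would need to be located again.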