jpmml / jpmml-evaluator-hive

PMML evaluator library for the Apache Hive data warehouse software (http://hive.apache.org/)
GNU Affero General Public License v3.0

Saving the UDF JAR file to a file in HDFS filesystem #1

Open youzp opened 6 years ago

youzp commented 6 years ago

SELECT BuildArchive('com.mycompany.DecisionTreeIris', '/path/to/DecisionTreeIris.pmml', '/path/to/DecisionTreeIris.jar');

  1. Can the path of the PMML file or the UDF JAR be set to an HDFS path? If not, will they still work?

vruusmann commented 6 years ago

Can the path of the PMML file or the UDF JAR be set to an HDFS path?

The UDF JAR is generated in memory and then saved to a file in the local filesystem: https://github.com/jpmml/jpmml-evaluator-hive/blob/master/src/main/java/org/jpmml/evaluator/hive/ArchiveBuilderUDF.java#L42

The following two options should both be technically feasible:

  1. Instead of saving the in-memory UDF JAR to a file in the local filesystem, save it to a file in the HDFS filesystem right from the beginning.
  2. Save the UDF JAR to a local file as before, then move that local file to HDFS.

The first option seems more elegant. Perhaps the third argument of the CodeModelUtil#build(String, File, File) method should simply be a java.io.OutputStream (and the second argument a java.io.InputStream), so that the method no longer cares where the bytes come from or go to: https://github.com/jpmml/jpmml-evaluator-hive/blob/master/src/main/java/org/jpmml/evaluator/hive/CodeModelUtil.java#L47
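A minimal sketch of what a stream-based signature could look like. This is not the actual CodeModelUtil code: the class name StreamArchiveSketch, the method body and the helper entryNames are hypothetical, and the sketch only packages the raw PMML bytes into a JAR instead of generating and compiling model classes. It uses in-memory streams so that it runs without any Hadoop dependency. With Hadoop on the classpath, the same method could presumably be handed the streams returned by FileSystem#open(Path) and FileSystem#create(Path), but that has not been tested here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

class StreamArchiveSketch {

    // Hypothetical stand-in for CodeModelUtil#build(String, File, File):
    // the source and the destination are plain streams, so the caller decides
    // whether they are backed by local files, HDFS files or memory buffers.
    static void build(String name, InputStream pmmlIs, OutputStream jarOs) throws IOException {
        // Closing the JarOutputStream also closes the underlying jarOs.
        try (JarOutputStream jos = new JarOutputStream(jarOs)) {
            // Assumption: store the PMML under a path derived from the class name.
            jos.putNextEntry(new JarEntry(name.replace('.', '/') + ".pmml"));
            byte[] buffer = new byte[8192];
            int count;
            while ((count = pmmlIs.read(buffer)) != -1) {
                jos.write(buffer, 0, count);
            }
            jos.closeEntry();
        }
    }

    // Helper for inspecting the result: lists the entry names of a JAR held in memory.
    static List<String> entryNames(byte[] jarBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream jis = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = jis.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] pmml = "<PMML/>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream jar = new ByteArrayOutputStream();
        build("com.mycompany.DecisionTreeIris", new ByteArrayInputStream(pmml), jar);
        System.out.println(entryNames(jar.toByteArray()));
    }
}
```

The existing File-based entry point could then remain as a thin wrapper that opens a FileInputStream and a FileOutputStream and delegates to the stream-based method, which would keep current callers working.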

I don't have access to a proper HDFS backend at the moment. It would be appreciated if you could investigate and report back your findings.

youzp commented 6 years ago

Thanks for your help. The second option works fine, I think it's enough to use.

vruusmann commented 6 years ago

The second option works fine, I think it's enough to use.

Reopening this issue - the current assumption that both the incoming PMML file and the outgoing UDF JAR file must reside in the local filesystem is unnecessarily restrictive.