jpmml / jpmml-evaluator-python

PMML evaluator library for Python
GNU Affero General Public License v3.0

Atomic data exchange between Python and Java #11

Closed: vruusmann closed this issue 3 years ago

vruusmann commented 3 years ago

Current data flow:

  1. Python arguments -> Java arguments (converting a Python dict to a Java map). See the JavaGateway.dict2map(dict) abstract method: https://github.com/jpmml/jpmml-evaluator-python/blob/0.4.2/jpmml_evaluator/__init__.py#L15-L16
  2. Java arguments -> Java results (evaluating a Java map to another Java map)
  3. Java results -> Python results (converting a Java map to a Python dict). See the JavaGateway.map2dict(map) abstract method: https://github.com/jpmml/jpmml-evaluator-python/blob/0.4.2/jpmml_evaluator/__init__.py#L18-L19

It appears that steps 1 and 3 are rate-limiting when dealing with larger data batches. A possible solution would be to avoid dict/map conversions in the Python layer altogether.
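To illustrate why per-record conversion can dominate, here is a minimal sketch that counts boundary crossings. The `CountingGateway` class and its `new_map`/`put` methods are hypothetical stand-ins, not the actual JavaGateway API; the sketch only assumes that a dict is converted entry by entry, with one remote call per entry.

```python
class CountingGateway:
    """Hypothetical stand-in for a Python-to-Java bridge; counts boundary crossings."""

    def __init__(self):
        self.calls = 0

    def new_map(self):
        self.calls += 1  # one remote call to create the Java map
        return {}

    def put(self, java_map, key, value):
        self.calls += 1  # one remote call per map entry
        java_map[key] = value


def dict2map(gateway, arguments):
    # Entry-by-entry conversion: the cost grows with the number of fields
    java_map = gateway.new_map()
    for key, value in arguments.items():
        gateway.put(java_map, key, value)
    return java_map


gateway = CountingGateway()
record = {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}
batch = [record] * 1000

for row in batch:
    dict2map(gateway, row)

# 1000 records * (1 map creation + 4 puts)
print(gateway.calls)  # 5000
```

Under this assumption the number of crossings scales with records times fields, and the same cost is paid again when converting results back.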

Refactored data flow:

  1. The Python dict is passed to the inner Java layer in the Pickle data format.
  2. The Java application unpickles the Python dict, performs the evaluation, and pickles the results into a Python dict.
  3. The Python dict is passed back to the outer Python layer in the Pickle data format.

This approach could be used for passing single data records (a single dict) or batches of data records (a list of dicts, a pandas DataFrame).
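The refactored flow could look roughly like this from the Python side. This is a sketch only: `java_evaluate` is a hypothetical pure-Python stand-in for the Java layer, the summing "model" is a placeholder, and pickle protocol 2 is chosen as a conservative assumption about what the Java side can read.

```python
import pickle


def java_evaluate(payload: bytes) -> bytes:
    """Hypothetical stand-in for the Java side: unpickle, evaluate, re-pickle."""
    data = pickle.loads(payload)
    is_batch = isinstance(data, list)
    records = data if is_batch else [data]
    # Placeholder "model" that sums the inputs; a real evaluator would apply the PMML model
    results = [{"y": sum(record.values())} for record in records]
    return pickle.dumps(results if is_batch else results[0], protocol=2)


def evaluate(arguments):
    # One payload in, one payload out, regardless of the number of records or fields
    payload = pickle.dumps(arguments, protocol=2)
    return pickle.loads(java_evaluate(payload))


single = evaluate({"x1": 1.0, "x2": 2.0})
batch = evaluate([{"x1": 1.0, "x2": 2.0}, {"x1": 3.0, "x2": 4.0}])

print(single)  # {'y': 3.0}
print(batch)   # [{'y': 3.0}, {'y': 7.0}]
```

A pandas DataFrame could be passed the same way after converting it to a list of dicts, for example with `DataFrame.to_dict(orient="records")`.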

On the Java side, the Pickle data format can be read and written using the awesome Pickle library (the net.razorvine.pickle package).

vruusmann commented 3 years ago

Right now the signature of the main Java evaluation method is this: public java.util.Map<FieldName, ?> evaluate(java.util.Map<FieldName, ?> arguments);

The API should be extended to make alternative "custom dict/map type"-oriented method signatures possible.

For example, when passing Python dicts using the Pickle data format: public net.razorvine.pickle.objects.ClassDict evaluate(net.razorvine.pickle.objects.ClassDict arguments);

The interconversion between ClassDict and Map<FieldName, ?> should happen atomically within the Java library code.
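The idea of keeping the interconversion inside the library can be sketched as follows. This is a Python analogue of the proposed Java design, not the actual API: `FieldName`, `evaluate_core` and `evaluate_dict` are illustrative names, and the doubling "model" is a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class FieldName:
    """Illustrative analogue of the Java FieldName key type."""
    value: str


def evaluate_core(arguments: Dict[FieldName, Any]) -> Dict[FieldName, Any]:
    # Placeholder for the existing map-to-map evaluation: doubles every input
    return {FieldName("2*" + name.value): 2 * value for name, value in arguments.items()}


def evaluate_dict(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Alternative signature: plain dict in, plain dict out.

    Both conversions happen inside this one call, so the caller never
    handles FieldName keys or the internal map type.
    """
    internal = {FieldName(key): value for key, value in arguments.items()}
    results = evaluate_core(internal)
    return {name.value: value for name, value in results.items()}


print(evaluate_dict({"x": 1.5}))  # {'2*x': 3.0}
```

The caller sees a single call with its own dict type on both ends, which is the property the ClassDict-based signature is meant to provide.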