Closed vruusmann closed 5 years ago
Short form: Can we please parametrize the max number of input fields, so users can change it ?
Long form: We have a legacy model, whose only remaining representation is in PMML format. Due to the input fields number limit of 1000, there is no way to score this legacy model. If parametrization is too time taking, I would suggest a more lenient upper limit (e.g. 5000)
The limit was introduced here: 060082c8a2af855e4ebaae873bbd5ea63cb6d3a2
@a1singh See https://stackoverflow.com/questions/54952581/how-to-avoid-the-max-amount-of-input-fields-for-jpmml
If anything, then this limit should be lower (around 100) in order to promote intelligent model design.
@vruusmann Thanks for sharing. Agreed on the conceptual need for fewer features. Smallest feature set that gives good performance is what we should strive for.
The problem is legacy models already in production, especially those developed by other teams. Due to this hard coded limit, people might have to redo weeks, if not months, worth of work, for a model that was deployed by someone else.
Can we introduce a warning instead, "You are using a lot of features. Don't complain if the model evaluation is slow. Consider reducing the number of features."
Can we introduce a warning instead
Warnings don't work. It takes a big fat exception to drive the message home.
Anyway, this schema check is a ModelEvaluatorBuilder
-level functionality. The returned ModelEvaluator
instance doesn't implement any limits.
Thanks Villu for the SO ticket. We were able to override the method, and make JPMML backwards compatible to legacy models.
People are working with models that specify tens to hundreds of THOUSANDS input fields:
For example: http://stats.stackexchange.com/questions/152891/bad-performance-of-pmml-evaluator and http://stackoverflow.com/questions/42074491/evaluate-method-takes-long-time-pmml-models-using-jpmml
Understandably, such "structurally valid but conceptually/functionally invalid" models cannot be made to perform, not by the JPMML-Evaluator library, or any other PMML scoring engine.
By default, the JPMML-Evaluator library should simply refuse to deal with them:
However, the limit should be programmatically customizable. If people want to do stupid things, then they should have technical means to do so.