redhat-gpe / rhte2018_operational_intelligence


Mod Lab 2: Generate PMML #39

Open Pkrish15 opened 5 years ago

Pkrish15 commented 5 years ago

Use the toPMML() method in Spark to generate the PMML file which gets saved in a PVC.
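As a rough illustration of what such a PMML file contains, the sketch below hand-builds a minimal clustering document in Python using only the standard library. This is not how the lab does it: there, Spark's toPMML() produces the file. The k-means-style model, the `build_clustering_pmml` helper, the field names, the sample coordinates, and the PMML 4.2 namespace are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed PMML version/namespace, for illustration only.
PMML_NS = "http://www.dmg.org/PMML-4_2"


def build_clustering_pmml(centers):
    """Return a minimal PMML ClusteringModel document (as a string)
    describing the given cluster centers. Hypothetical helper."""
    n_fields = len(centers[0])
    fields = [f"field_{i}" for i in range(n_fields)]

    root = ET.Element("PMML", {"version": "4.2", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "k-means clustering"})

    # Declare every input dimension as a continuous double field.
    dictionary = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(n_fields)}
    )
    for name in fields:
        ET.SubElement(
            dictionary,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(
        root,
        "ClusteringModel",
        {
            "modelName": "k-means",
            "functionName": "clustering",
            "modelClass": "centerBased",
            "numberOfClusters": str(len(centers)),
        },
    )
    schema = ET.SubElement(model, "MiningSchema")
    for name in fields:
        ET.SubElement(schema, "MiningField", {"name": name, "usageType": "active"})

    # Points are assigned to the nearest center by squared Euclidean distance.
    measure = ET.SubElement(model, "ComparisonMeasure", {"kind": "distance"})
    ET.SubElement(measure, "squaredEuclidean")
    for name in fields:
        ET.SubElement(
            model, "ClusteringField", {"field": name, "compareFunction": "absDiff"}
        )

    # One <Cluster> element per center, holding its coordinates.
    for idx, center in enumerate(centers):
        cluster = ET.SubElement(model, "Cluster", {"name": f"cluster_{idx}"})
        array = ET.SubElement(cluster, "Array", {"n": str(n_fields), "type": "real"})
        array.text = " ".join(str(value) for value in center)

    return ET.tostring(root, encoding="unicode")


# Made-up (latitude, longitude) centers; not taken from the lab data.
SAMPLE_CENTERS = [(40.75, -73.99), (40.64, -73.78)]
sample_pmml = build_clustering_pmml(SAMPLE_CENTERS)
```

In the lab the resulting document would be written to the PVC mount path rather than kept in memory; that path is not given in this thread.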

https://mojo.redhat.com/docs/DOC-1172130

jbride commented 5 years ago

In module 4 of the lab, the rules engine is used to identify a surge price multiplier based on various conditions, including the historical density of pick-ups in a traffic cluster center (a.k.a. the traffic cluster center "ranking"). The traffic cluster center ranking, along with the traffic cluster centers themselves, is identified by Spark in module 3.
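To make the shape of that decision concrete, here is a minimal Python sketch of a ranking-based surge rule. The real rules live in the rules engine, not in Python; the `RideRequest` type, the `is_peak_hour` condition, and every threshold and increment below are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RideRequest:
    # Hypothetical input; 1 means the historically busiest cluster center.
    cluster_ranking: int
    # Hypothetical extra condition standing in for "various conditions".
    is_peak_hour: bool


def surge_multiplier(request: RideRequest) -> float:
    """Return a price multiplier; all thresholds are made-up examples."""
    multiplier = 1.0
    if request.cluster_ranking <= 3:
        multiplier += 0.5      # top-ranked centers get the largest bump
    elif request.cluster_ranking <= 10:
        multiplier += 0.25     # mid-ranked centers get a smaller bump
    if request.is_peak_hour:
        multiplier += 0.25     # additional bump during peak hours
    return multiplier
```

For example, `surge_multiplier(RideRequest(cluster_ranking=1, is_peak_hour=True))` evaluates both conditions and returns 1.75, while a request from a low-ranked center outside peak hours keeps the base multiplier of 1.0.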

At this time we have not been able to identify a use case that highlights the unique capabilities of executing PMML in the rules engine above and beyond what could already be done in Apache Spark. We'll keep brainstorming.

diego-torres commented 5 years ago

We have the option to replace the calculation that Spark performs in lab 3 with a calculation closer to the response stream (the stream of information that is enriched with pricing multipliers), using RHDM and PMML. This type of solution could also imply the use of CEP to calculate the most popular traffic centers during a given period, although Spark is considered the right tool for this type of operation. The evaluations performed in lab 3 are where PMML support in RHDM and Spark overlap; we would need to justify why we think RHDM provides better value than Spark for that overlap. We have not yet found an answer to that, so we are using Spark as the most popular technology for that operation, and RHDM as the technology that merges human knowledge (pricing rules) with the already enriched and calculated cluster data.
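The CEP idea mentioned above, ranking traffic centers by pick-ups over a given period, can be sketched as a sliding time window. The Python below is only an illustration of the concept, not RHDM or Drools code; the `PopularCenters` class, the center identifiers, and the window length are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class PopularCenters:
    """Count pick-ups per traffic center over a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, center_id), oldest first
        self._counts = Counter()  # pick-ups per center inside the window

    def record_pickup(self, timestamp, center_id):
        """Register a pick-up and drop events that left the window."""
        self._events.append((timestamp, center_id))
        self._counts[center_id] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # An event is kept only while it is younger than window_seconds.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_center = self._events.popleft()
            self._counts[old_center] -= 1
            if self._counts[old_center] == 0:
                del self._counts[old_center]

    def top(self, n):
        """Return up to n (center_id, count) pairs, busiest first."""
        return self._counts.most_common(n)
```

With a 60-second window, two pick-ups at center "A" at t=0 and t=10 make it the busiest center until they age out; once a later pick-up arrives at t=70, only events newer than t=10 still count toward the ranking.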

diego-torres commented 5 years ago

[Image: operational_intelligence_stage1]