russellmcdonell / pyDMNrules

An implementation of DMN (Decision Model Notation) in Python
Other
40 stars 9 forks source link
dmn dmn-engine dmn-evaluator dmn-examples rules-engine

pyDMNrules

An implementation of DMN (Decision Model Notation) in Python, using the pySFeel, openpyxl and pandas modules.

DMN rules are read from an Excel workbook. Then data, matching the input variables in the DMN rules is passed to the decide() function. The returned data contains the decision.

The passed Excel workbook must contain a 'Decision' tab and may contain a 'Glossary' tab. Other tabs contain as many DMN rules tables as necessary.

Glossary - optional

The 'Glossary' tab, if it exists, must contain a table headed 'Glossary'. A 'Glossary' table lets the user add additional documentation/clarification to the input and output data. Data items can be grouped in to Business Concepts to better document where inputs come from and where outputs will be going to. The 'Glossary' table must have exactly three columns with the headings 'Variable', 'Business Concept' and 'Attribute'. The data in these three columns describe the inputs and outputs associated with the 'Decision'.

'Variable' can be any text and is used to pass data to and from pyDMNrules.
NOTE: the name 'Execute' is a reserved name and cannot be used as a Variable name. The name 'Execute' is reserved as a heading for output columns, output rows or the output variable of a Cross Tabs decision table. The output value(s) in an 'Execute' output column, 'Execute' output row or 'Execute' Cross Tabs decision table must be the name of a Decision Table which will be invoked when the matching rule is triggered. If the output value in an 'Execute' output column, an 'Execute' output or an output cell in an 'Execute' Cross Tabs decision table is missing (and empty cell), then no Decision Table will be invoked whe the matching rule is triggered.

'Business Concept' and 'Attribute' must be valid FEEL names as the 'Variable' will be mapped to a Glossary item created by joining these two FEEL names with a period character, as in 'Order.orderSize'.
Vertically merged cells in the 'Business Concept' column are used to group multiple 'Variable'/'Attribute' pairs together into a single 'Businss Concept'.
The 'Glossary' can have multiple columns of Annotations to describe things like data types, valid values etc.

Items in the Glossary can be outputs of one decision table and inputs in the next decision table, which makes it easy to support complex business models.

Input decision cells and Output cells can refer to input and output values either by their 'Variable' name, or by their Glossary item (FEEL) name, of 'BusinessConcept.Attribute'

Glossary
Variable Business Concept Attribute
Customer Customer sector
OrderSize Order orderSize
Delivery delivery
Discount Discount discount

If a 'Glossary' tab is not present - if no Glossary is supplied, then pyDMNrules will construct one using the input and output headings. There will be a single 'Business Concept' called 'Data'. Attributes will be created by replacing any space characters (' ') with an underscore character ('_') and then removing all invalid FEEL name characters. This can cause conflicts. For instance ':' is a valid character in a Variable, but not in a FEEL name. If there are two Varibles, being 'C' and 'C:' then both will be transformed to an Attribute of 'C' and a Glossary item of 'Data.C', which will be ambiguous. This ambiguity can be avoided by supplying a 'Glossary' and explicitely mapping 'C:' to a valid FEEL name, such as 'Ccolon'. The function getGlossary() will return the Glossary (either supplied or constructed).

Decision - optional

In order to make a decision, pyDMNrules needs to know where to start, what to do, when and why. The 'Decision' tab, if it exists, will contain a table headed 'Decision' which will provide the required details. The heading for the 'Desion' table will be none or more Variable names, a heading of 'Decisions', a heading of 'Execute Decision Table' and none or more headings for annotations. The 'Decisions' column is documentation and contain a brief description of the DMN rules table which is named in 'Execute Decision Table' column which is in effect just a list of DMN rules tables to be run in the specified order. If there are any 'Variable' columns then these columns must contain valid DMN test as if they were DMN rules tables input columns. pyDMNrules now knows where to start (with the first DMN rules table), what to do (each DMN rules table in turn), when (when all the 'Variable' tests are 'true') and why (because all the 'Variable' tests are 'true')

Decision
Variable(s) Decisions Execute Decision Table Annotation(s)
input test Determine Discount Discount Compute discount

The 'Variable' cells are evaluated on the fly, which means that a preceeding DMN rules table can set or clear a 'Variable' which can effect which DMN rules tables are included, and which DMN rules tables are excluded, in the final decision.

The Decision tab, and associated 'Decision' table are optional because in many instances, what to do is obvious. If the workbook does not contain a 'Decision' tab, then pyDMNrules will construct a 'Decision' table. If there is only one DMN rules table pyDMNrules will create a 'Decision' table with just one row. There can be several DMN rules tables that are independant and hence the order doesn't matter. If there's more than one DMN rules table, pyDMNrules looks for dependancies and works out an acceptable order. To do this, pyDMNrules looks at the inputs of each DMN rules table and check to see if that input is also an outputs from another DMN rules table. If DMN rules table 'A' has an input 'X' and DMN rules table 'B' has an output of 'X', then pyDMNrules will ensure that DMN rules table 'B' is is placed in the constructed 'Decision' table before DMN rules table 'A'. The default order for the remaining table will be the order in which pyDMNrules found them in the workbook. The function getDecision() will return the 'Decision' table (either supplied or constructed).

Decision - recommended

A 'Decision' table tells pyDMNrules exacty what to do. Only the DMN rules tables named in the 'Decision' table will be run (plus any DMN rules tables 'Execute'd as a result of running the named DMN rules tables). Without a 'Decision' table, pyDMNrules will run everything, even old, obsolete or test DMN rules tables that you left in the workbook by mistake. The exact DMN rules tables which were run, and the order in which they were run is returned be the 'decide()' and 'decidePandas()' functions.

It may not matter the order in which the DMN rules tables are run. None the less, it may make you decision logic easier for others to understand if they can imagine them being executed in a specific order.

Decision - required

In certain circumstances a 'Decision' table may be required, because pyDMNrules is unable to determine the correct decision order.

All of these examples require a user created 'Decision' table in order to ensure the desired decision logic. There may be other scenario requiring a user created 'Decision' table. Hence the recommendation to provide one for anything other than the most trivial decisions.

DMN rules tables

The DMN rules table(s), listed under Execute Decision Table, in the 'Decision' table (either supplied or constructed) must exist on other spreadsheets in the Excel workbook and can be 'Decisions as Rows', 'Decisions as Columns' or 'Decision as Crosstab' rules tables.

Decision as Rows rules tables

Decisions as Rows DMN rules tables have columns of inputs and columns of outputs, with a double line vertical border delineating where input columns stop and output columns begin.

ExampleRows.xlsx is an example fo a Decision as Rows DMN table.

Decision as Columns rules tables

Decisions as Columns DMN rules tables have rows of inputs and rows of outputs, with a double line horizontal border delineating where input rows stop and output rows begin.

ExampleColumns.xlsx is an example fo a Decision as Columns DMN table.

Decision as Crosstab rules tables

Decision as Crosstab DMN rules tables have one output heading at the top left of the table.

ExampleCrosstab.xlsx is an example fo a Decision as Crosstab DMN table.

The function getSheets() returns all the DMN rules tables. Each DMN rules table is returned as an XHTML rendering, with all Variables converted to their FEEL name equivalents (BusinessConcept.Attribute).

Strings:

Decision Model Notation specifies that strings must be enclosed in double quotes ("), as in "this is a string", or be italicized. pyDMNrules does not look for italicized text, but it does try to interpret unequivocal strings as string. So, HIGH,DECLINE are both strings; they are not numbers, nor dates, nor time periods.

pyDMNrules also recognizes entries from the Glossary, so Patient.age is not a string if 'Patient' is a 'Business Concept' in the Glossary and 'age' is an attribute associated with the 'Patient' Business Concept in the Glossary. When any expression containing Patient.age is evaluated, Patient.age will be replaced with the current value associated with that Glossary entry.

However, strings that may be ambiguous, must be enclosed in double quotes ("). Any string that contains spaces or a comma is ambiguous and must enclosed in double quotes ("). All strings can be enclosed in double quotes, so HIGH,DECLINE and "HIGH","DECLINE" are equivalent.

Ambigous strings:

If you don't enclose codes in double quotes the pyDMNrules will try and interpret them. So codes which are all digits will be interpreted as numbers. Codes that start with a digit, such as 9A42, will cause FEEL errors and must be enclosed in double quotes. FEEL has a syntax for durations. Any code that looks like a duration, such as P42D (42 days), will need to be enclosed in double quotes.

USAGE:

import pyDMNrules
dmnRules = pyDMNrules.DMN()
status = dmnRules.load('ExampleHPV.xlsx')
if 'errors' in status:
    print('ExampleHPV.xlsx has errors', status['errors'])
    sys.exit(0)
else:
    print('ExampleHPV.xlsx loaded')

data = {}
data['Participant Age'] = 36
data['In Test of Cure'] = True
data['Hysterectomy Flag'] = False
data['Cancer Flag'] = False
data['HPV-V'] = 'V0'
data['Current Participant Risk Category'] = 'low'
print('Testing',repr(data))
(status, newData) = dmnRules.decide(data)
print('Decision',repr(newData))
if 'errors' in status:
    print('With errors', status['errors'])

For a Single Hit Policy DMN rules table, newData will be a decision dictionary of the decision.
For a Multi Hit Policy DMN rules tables newData will be a list of decison dictionaries; one for each matched rule.
For a compound decision (a Decision table with multiple rows of DMN rules tables, where more than one Decision table is run) newData will be a list of decison dictionaries; one for each DMN rules table that is run. However, if only one of the DMN rules tables is run, and it is a Single Hit Policy DMN rules table, then newData will be a simple decision dictionary of the decision.
The keys to each decision dictionary are

If the Decision table contains multiple rows (multiple DMN rules tables run sequentially in order to make the decision) then the returned 'newData' will be a list of decision dictionaries, with each containing the keys 'Result', 'Excuted Rule', 'DecisionAnnotations'(optional) and 'RuleAnnotations'(optional), being one list entry for each DMN rules table used from the Decision table. The final enty in this list is the final decision. All other entries are the intermediate states involved in making the final decision.

$ python3 HPV.py
ExampleHPV.xlsx loaded
Testing {'Participant Age': 36, 'In Test of Cure': True, 'Hysterectomy Flag': False, 'Cancer Flag': False, 'HPV-V': 'V0', 'Current Participant Risk Category': 'low'}
Decision(newData) {'Result': {'Immune Deficient Flag': None, 'Hysterectomy Flag': False, 'Cancer Flag': False, 'In Test of Cure': True, 'Participant Age': 36.0, 'Current Participant Risk Category': 'low', 'HPV-V': 'V0', 'Cyto-S': None, 'Cyto-E': None, 'Cyto-O': None, 'Collection Method': None, 'Test Risk Code': 'L', 'New Participant Risk Category': 'low', 'Participant Care Pathway': 'toBeDetermined', 'Next Rule': 'CervicalRisk2'}, 'Executed Rule': ('Determine CervicalRisk', 'FirstTestOfCervicalRisk', '20'), 'DecisionAnnotations': [('Decides', 'Test risk')], 'RuleAnnotations': [('Test meaning', 'need more info')]}

Pandas functionality

pyDMNrules can pass a Pandas DataFrame of value through the decide() function and return a Pandas DataFrame of decisions. The status of each decision is returned as a Pandas Series and a Pandas DataFrame of information about each decision is also returned.

(dfStatus, dfResults, dfDecision) = dmnRules.decidePandas(dfInput)

dfInput is a Pandas DataFrame of rows of data about which decisions need to be made. Column names are mapped to decision Variables.

dfResults is a Pandas DataFrame with one row for each decision. Column names are mapped from all the Variables in the Glossary.

dfStatus is a Pandas Series of one value each decision. Values are the error message, or 'no errors' of there were no errors.

dfDecision is a Pandas DataFrame with columns of 'DecisionName', 'TableName', 'RuleId', 'DecisionAnnotation' and 'RuleAnnotation'

USAGE:

import pyDMNrules
import pandas as pd
import sys
dmnRules = pyDMNrules.DMN()
status = dmnRules.load('AN-SNAP rules (DMN).xlsx')
if 'errors' in status:
  print('AN-SNAP rules (DMN).xlsx has errors', status['errors'])
  sys.exit(0)
else:
    print('AN-SNAP rules (DMN).xlsx loaded')
dataTypes = {'Patient_Age':int, 'Episode_Length_of_stay':int,     'Phase_Length_of_stay':int,
    'Phase_FIM_Motor_Score':int, 'Phase_FIM_Cognition_Score':int, 'Phase_RUG_ADL_Score':int,
    'Phase_w01':float, 'Phase_NWAU21':float,
    'Indigenous_Status':str, 'Care_Type':str, 'Funding_Source':str, 'Phase_Impairment_Code':str,
    'AROC_code':str, 'Phase_AN_SNAP_V4_0_code':str,
    'Same_day_addmitted_care':bool, 'Delerium_or_Dimentia':bool, 'RadioTherapy_Flag':bool, 'Dialysis_Flag':bool}
dates = ['BirthDate', 'Admission_Date', 'Discharge_Date', 'Phase_Start_Date', 'Phase_End_Date']
dfInput = pd.read_csv('subAcuteExtract.csv', dtype=dataTypes, parse_dates=dates)
dfInput['Multidisciplinary'] = True
dfInput['Admitted_Flag'] = True
dfInput['Length_of_Stay'] = dfInput['Phase_Length_of_Stay']
dfInput['Long_term_care'] = False
dfInput.loc[dfInput['Length_of_Stay'] > 92, 'Long_term_care'] = True
dfInput['Same_day_admitted_care'] = False
dfInput['GEM_clinic'] = None
dfInput['Patient_Age_Type'] = None
dfInput['First_Phase'] = False
grouped = dfInput.groupby(['Patient_UR', 'Admission_Date'])
for index in grouped.head(1).index:
    if dfInput.loc[index]['Phase_Type'] == 'Unstable':
     dfInput.loc[index, 'First_Phase'] = True
columns = {'Admitted_Flag':'Admitted Flag', 'Care_Type':'Care Type', 'Length_of_Stay':'Length of Stay', 'Long_term_care':'Long term care',
    'Same_day_admitted_care':'Same-day admitted care', 'GEM_clinic':'GEM clinic', 'Patient_Age':'Patient Age', 'Patient_Age_Type':'Patient Age Type',
    'AROC_code':'AROC code', 'Delirium_of_Dimentia':'Delirium or Dimentia', 'Phase_Type':'Phase Type', 'First_Phase':'First Phase', 'Phase_FIM_Motor_Score':'FIM Motor score',
    'Phase_FIM_Cognition_Score':'FIM Cognition score', 'Phase_RUG_ADL_Score':'RUG-ADL', 'Delirium_or_Dimentia':'Delirium or Dimentia',
    'Problem_Severity_Scrore':'Problem Severity Score'}
(dfStatus, dfResults, dfDecision) = dmnRules.decidePandas(dfInput, headings=columns)
if dfStatus.where(dfStatus != 'no errors').count() > 0:
    print('has errors', dfStatus.loc['status' != 'no errors'])
    sys.exit(0)
for index in dfResults.index:
    print(index, dfResults.loc[index, 'AN_SNAP_V4_code'])

Testing

pyDMNrules reserves the spreadsheet name 'Test' which can be used for storing test data. The function test() reads the test data, passes it through the decide() function and assembles the results which are returned to the caller.

Unit Test data table(s)

The 'Test' spreadsheet must contain Unit Test data tables.

DMNrulesTest - the test that will be run

pyDMNrules also searches the 'Test' worksheet for a special table named 'DMNrulesTests' which must exist.

The test() function will process each row in the 'DMNrulesTests' table, getting data from the Unit Test data tables and building a data{} dictionary. The test() function then passes this data{} dictionary to the decide() function. The returned newData{} is then compared to the output data for the matching test in the 'DMNrulesTests' table. The data{}, newData{} and a list of mismatches, if any, is then appended to the list of results, which is eventually passed back to the caller of the test() function.

The returned list is a list of dictionaries, one for each test in the 'DMNrulesTests' table. The keys to this dictionary are

Note

If an output value should be the value of an input, then you can use the 'Variable' from the Glossary. However, if the output is a manuipulation of an input, then you will need to use the internal name for that variable, being the 'Business Concept' and the 'Attibute' concateneted with the period character. That is, 'Patient Age' is valid, but 'Patient Age + 5' is not. Instead you will need to use the syntax 'Patient.age + 5'

USAGE:

import pyDMNrules
dmnRules = pyDMNrules.DMN()
status = dmnRules.load('Therapy.xlsx')
if 'errors' in status:
    print('Therapy.xlsx has errors', status['errors'])
    sys.exit(0)
else:
    print('Therapy.xlsx loaded')
(testStatus, results) = dmnRules.test()
for test in range(len(results)):
    if 'Mismatches' not in results[test]:
        print('Test ID', results[test]['Test ID'], 'passed')
    else:
        print('Test ID', results[test]['Test ID'], 'failed')

$ python3 Therapy.py
Test ID 1 passed
Test ID 2 passed
Test ID 3 passed
Decisions(results) [{'Test ID': 1, 'data': {'Encounter Diagnosis': 'Acute Sinusitis', 'Patient Age': 58, 'Patient Allergies': ['Penicillin', 'Streptomycin'], 'Patient Creatinine Level': 2, 'Patient Creatinine Clearance': 44.42, 'Patient Weight': 78, 'Patient Active Medication': 'Coumadin'}, 'newData': {'Result': {'Encounter Diagnosis': 'Acute Sinusitis', 'Recommended Medication': 'Levofloxacin', 'Recommended Dose': '250mg every 24 hours for 14 days', 'Warning': 'Coumadin and Levofloxacin can result in reduced effectiveness of Coumadin.', 'Error Message': None, 'Patient Age': 58.0, 'Patient Weight': 78.0, 'Patient Allergies': ['Penicillin', 'Streptomycin'], 'Patient Creatinine Level': 2.0, 'Patient Creatinine Clearance': 44.416667, 'Patient Active Medication': 'Coumadin'}, 'Executed Rule': ('Check Drug Interaction', 'WarnAboutDrugInteraction', 'Interaction-1')}, 'status': {}}, {'Test ID': 2, 'data': {'Encounter Diagnosis': 'Acute Sinusitis', 'Patient Age': 65, 'Patient Creatinine Level': 1.8, 'Patient Creatinine Clearance': 48.03, 'Patient Weight': 83}, 'newData': {'Result': {'Encounter Diagnosis': 'Acute Sinusitis', 'Recommended Medication': 'Amoxicillin', 'Recommended Dose': '250mg every 24 hours for 14 days', 'Warning': None, 'Error Message': None, 'Patient Age': 65.0, 'Patient Weight': 83.0, 'Patient Allergies': None, 'Patient Creatinine Level': 1.8, 'Patient Creatinine Clearance': 48.032407, 'Patient Active Medication': None}, 'Executed Rule': ('Check Drug Interaction', 'WarnAboutDrugInteraction', 'Interaction-2')}, 'status': {}}, {'Test ID': 3, 'data': {'Encounter Diagnosis': 'Diabetes', 'Patient Age': 27, 'Patient Creatinine Level': 1.88, 'Patient Weight': 110}, 'newData': {'Result': {'Encounter Diagnosis': 'Diabetes', 'Recommended Medication': None, 'Recommended Dose': None, 'Warning': None, 'Error Message': 'Sorry, this decision service can handle only Acute Sinusitis', 'Patient Age': 27.0, 'Patient Weight': 110.0, 'Patient Allergies': None, 'Patient Creatinine Level': 1.88, 'Patient Creatinine Clearance': None, 'Patient Active Medication': None}, 'Executed Rule': ('Create Message', 'ErrorMessage', 'Error-1')}, 'status': {}}]

NOTE

You can keep the Test worksheet in a separate workbook and load it after you have loaded the rules

status = dmnRules.loadTest('TherapyTests.xlsx')

Documentation: More details can be found at readthedocs