hashgraph / guardian

The Guardian is an innovative open-source platform that streamlines the creation, management, and verification of digital environmental assets. It leverages a customizable Policy Workflow Engine and Web3 technology to ensure transparent and fraud-proof operations, making it a key tool for transforming sustainability practices and carbon markets.
Apache License 2.0

Data Parameterization and Conditional Review Logic #4025

Open dubgeis opened 2 months ago

dubgeis commented 2 months ago

Problem description

Currently, Guardian data entered in the form of Verifiable Credentials requires review by independent auditors or VVBs. When structuring a policy or methodology, it should be possible to define the normal values, or allowable variances, for answers, as determined by Standards bodies and/or auditors. In other settings this would typically be treated as anomaly detection.

It is unclear whether this requires a conditional workflow, which is possible in the Guardian today via a policy, or conditional review logic based on a specific answer. Ideally, machine learning models would be able to parse the data, accepting answers within the normal range and flagging the rest for additional review or rejection.

Requirements

The ability to set parameters, which may not be public, on Verifiable Credential-based answers within a schema.

Definition of done

An auditor or standards body can enable conditional logic; this should be adjustable even after a policy is published, without requiring migrations.

Acceptance criteria

Auditors, VVBs, and Standards bodies can submit ranges, accepted responses, and data formats that would be allowable for an answer.
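As an illustrative sketch (the interface and function names here are hypothetical, not an existing Guardian API), a per-field parameter spec covering ranges, accepted responses, and data formats might look like:

```typescript
// Hypothetical per-field parameter spec (illustrative, not Guardian code).
interface FieldParameters {
  field: string;                        // schema field key
  range?: { min: number; max: number }; // allowable numeric range
  acceptedValues?: string[];            // enumerated allowable responses
  format?: RegExp;                      // expected data format
  isPublic?: boolean;                   // whether submitters can see the parameters
}

// Check a single Verifiable Credential answer against its parameters.
function isAllowable(params: FieldParameters, value: unknown): boolean {
  if (params.range !== undefined && typeof value === "number") {
    if (value < params.range.min || value > params.range.max) return false;
  }
  if (params.acceptedValues !== undefined && typeof value === "string") {
    if (!params.acceptedValues.includes(value)) return false;
  }
  if (params.format !== undefined && typeof value === "string") {
    if (!params.format.test(value)) return false;
  }
  return true;
}
```

A non-public spec (`isPublic: false`) would let auditors validate answers without revealing the acceptance bounds to submitters.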

rdwelle commented 2 months ago

Incorporating conditional review logic and data parameterization in the Hedera Guardian is a positive step, as it allows for flexibility in emissions tracking and reporting. Given the diverse and evolving nature of industry and regulatory standards, it’s essential that auditors have the ability to set specific parameters for auditing. While the system should assist in flagging data that falls outside these parameters, the final validation should rest with the auditors to ensure accuracy and compliance.

gautamp8 commented 2 months ago

This would certainly be helpful. Some inputs, drawing on my work digitising GS' metered energy methodology:

  1. Parameter Range Setting: Allow setting of expected ranges and acceptable variances. For instance, 'EG_p_d_y' (daily electricity consumption in kWh) might have an expected range of 0.5-5 kWh/day with a ±20% variance. For 'fNRB', consider a range based on regional studies, e.g. 0.3-0.7, with stricter variance limits due to its significant impact on ER calculations.
  2. Conditional Logic: Implement checks based on the set parameters. For instance: flag if 'EG_p_d_y' exceeds the expected range, if 'EF_el_y' (grid emission factor) changes beyond an acceptable threshold, or if 'fNRB' values deviate significantly from established regional baselines.
  3. Anomaly Detection: Use statistical methods to detect anomalies, such as sudden spikes in electricity consumption or seasonal variations (e.g. higher consumption in winter).
  4. Verification Workflow: We could have a tiered process: automatic verification for data within ranges, flagging for manual review when outside ranges but within variance, and automatic rejection for significant outliers.
  5. Adjustable Parameters: Allow authorized users to adjust parameter ranges without policy migrations.
  6. Audit Trail: We must maintain a comprehensive trail of parameter changes, flagged data points, and verification decisions, along with rationale.
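The tiered process in point 4 can be sketched as a single check, using the 'EG_p_d_y' example from point 1 (0.5-5 kWh/day with ±20% variance; the function and verdict labels are illustrative, not Guardian code):

```typescript
type Verdict = "auto-verify" | "manual-review" | "auto-reject";

// Tiered verification: inside the expected range -> auto-verify;
// outside the range but within the variance band -> manual review;
// beyond the variance band -> auto-reject.
function tieredVerdict(
  value: number,
  min: number,
  max: number,
  variancePct: number
): Verdict {
  if (value >= min && value <= max) return "auto-verify";
  const lo = min * (1 - variancePct / 100);
  const hi = max * (1 + variancePct / 100);
  if (value >= lo && value <= hi) return "manual-review";
  return "auto-reject";
}

// Example: a daily consumption of 5.5 kWh is outside 0.5-5 kWh/day
// but within the +20% band (up to 6 kWh), so it goes to manual review.
```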

Future work could include ML models that predict consumption patterns and automatically identify more complex anomalies.
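Before any ML, the statistical detection in point 3 could start with a simple z-score flag over a meter's historical readings (a sketch; the threshold and data are illustrative):

```typescript
// Flag readings whose z-score against the series mean exceeds a threshold.
function zScoreAnomalies(readings: number[], threshold: number = 3): number[] {
  const n = readings.length;
  if (n === 0) return [];
  const mean = readings.reduce((a, b) => a + b, 0) / n;
  const variance = readings.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const sd = Math.sqrt(variance);
  if (sd === 0) return []; // constant series: nothing to flag
  return readings.filter((v) => Math.abs(v - mean) / sd > threshold);
}
```

Seasonal patterns (e.g. higher winter consumption) would need a seasonally adjusted baseline rather than a flat mean, which is where the ML models mentioned above would come in.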

FWIW, here's the static & dynamic MRV schema I'm planning to use for the metered energy policy.

https://gist.github.com/gautamp8/fc5fc512183f71094a990c4a5d6b41bf?permalink_comment_id=5148664#gistcomment-5148664