hpcc-systems / DataPatterns

HPCC Systems ECL bundle that provides some basic data profiling and research tools to an ECL programmer

Request: Easy method for analyzing different profiling results #11

Open dancamper opened 6 years ago

dancamper commented 6 years ago

Satisfy this scenario: Profiling is used to analyze new data that will be ingested. Profiling results are saved as a logical file. Then, a new batch of data arrives and is profiled. The new method should compare the new profiling results with the old and output a summary of any differences.

The end goal is to highlight significant differences between the two profiles, which could indicate a significant or unexpected change in the incoming data stream.
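One possible reading of "significant" is a relative change in a numeric profile statistic that exceeds some tolerance. Below is a minimal Python sketch of that idea, for illustration only: it is not ECL and not part of DataPatterns. The function name `significant_changes`, the statistic names `fill_rate` and `cardinality`, and the 10% default threshold are all assumptions.

```python
def significant_changes(old_profile, new_profile, threshold=0.10):
    """Return (attribute, stat, old, new) tuples whose relative change exceeds threshold.

    Each profile is a dict mapping an attribute name to a dict of numeric stats.
    Only attributes and stats present in both profiles are compared.
    """
    changes = []
    for attribute in sorted(old_profile.keys() & new_profile.keys()):
        old_stats = old_profile[attribute]
        new_stats = new_profile[attribute]
        for stat in sorted(old_stats.keys() & new_stats.keys()):
            old_value, new_value = old_stats[stat], new_stats[stat]
            if old_value == 0:
                # Relative change is undefined at zero; flag any move away from zero.
                changed = new_value != 0
            else:
                changed = abs(new_value - old_value) / abs(old_value) > threshold
            if changed:
                changes.append((attribute, stat, old_value, new_value))
    return changes


# Hypothetical profiling results for two batches of data.
old = {"age": {"fill_rate": 100.0, "cardinality": 80},
       "zip": {"fill_rate": 95.0, "cardinality": 500}}
new = {"age": {"fill_rate": 60.0, "cardinality": 82},
       "zip": {"fill_rate": 96.0, "cardinality": 500}}

for change in significant_changes(old, new):
    print(change)
```

With these made-up numbers only the drop in `age` fill rate is reported; the small shifts in cardinality and `zip` fill rate stay under the threshold. What counts as significant would likely need to be configurable per statistic.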

RichardTaylorHPCC commented 6 years ago

A JOIN of the two, using ROWDIFF in the TRANSFORM, would be a start :)
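The shape of that suggestion can be sketched outside ECL: match the old and new profile rows on the attribute name, then list which fields differ for each matched pair. The Python below is only an illustration of the join-and-diff idea, not ECL and not a DataPatterns API. The helper names `row_diff` and `compare_profiles` and the field names `attribute`, `fill_rate`, and `best_type` are hypothetical. It also assumes both profiles share the same record layout, and it uses a full outer match so that attributes that appear or disappear are reported too.

```python
def row_diff(left, right):
    """Comma-separated names of fields whose values differ between two rows."""
    return ",".join(name for name in left if left[name] != right.get(name))


def compare_profiles(old_rows, new_rows, key="attribute"):
    """Match rows on `key` and summarize differences as (key value, description)."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}
    summary = []
    # Union of keys, so rows present on only one side are not silently dropped.
    for k in sorted(old_by_key.keys() | new_by_key.keys()):
        if k not in new_by_key:
            summary.append((k, "missing from new profile"))
        elif k not in old_by_key:
            summary.append((k, "added in new profile"))
        else:
            diff = row_diff(old_by_key[k], new_by_key[k])
            if diff:
                summary.append((k, diff))
    return summary


# Hypothetical saved and fresh profiling results.
old_rows = [
    {"attribute": "age", "fill_rate": 100, "best_type": "unsigned2"},
    {"attribute": "zip", "fill_rate": 95, "best_type": "string5"},
    {"attribute": "fax", "fill_rate": 10, "best_type": "string10"},
]
new_rows = [
    {"attribute": "age", "fill_rate": 100, "best_type": "string3"},
    {"attribute": "zip", "fill_rate": 95, "best_type": "string5"},
    {"attribute": "email", "fill_rate": 70, "best_type": "string40"},
]

for entry in compare_profiles(old_rows, new_rows):
    print(entry)
```

An exact field-by-field comparison like this flags every change, however small, so it would probably need to be paired with some notion of tolerance before it highlights only significant differences.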

dcamper commented 5 years ago

This is probably more appropriate as a stand-alone function rather than incorporated into Profile(), as it is file-based rather than field-based.