opendp / smartnoise-sdk

Tools and service for differentially private processing of tabular and relational data

Add Input Checks for pre_aggregated #421

Closed: joshua-oss closed this issue 2 years ago

joshua-oss commented 2 years ago

Spark DataFrames and RDDs are evaluated lazily, so the underlying query may not actually run until the caller requests rows from the result of execute(). Pulling the first row of data to check types, and then re-running the query in the map() that produces output, would cause the query behind pre_aggregated to execute twice, which could be very expensive. One way to avoid this double execution is to do the type and column checking inside the row map itself, as in the sketch below.
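A minimal sketch of what pushing the checks into the single Spark pass might look like. This is not the smartnoise-sdk implementation; the names `make_checked_map`, `expected_types`, and `add_noise` are hypothetical, and the checks are positional because that is how the private reader consumes the subquery result:

```python
# Hypothetical sketch: validate each row inside the map so a lazy Spark
# RDD/DataFrame is only evaluated once, instead of pulling a first row
# up front and then re-running the query for the real map().

def make_checked_map(expected_types):
    def check_row(row):
        # Runs as part of the single lazy pass that produces output,
        # so no extra execution of the pre_aggregated query is needed.
        if len(row) != len(expected_types):
            raise ValueError(
                f"Expected {len(expected_types)} columns, got {len(row)}"
            )
        for idx, (val, t) in enumerate(zip(row, expected_types)):
            if val is not None and not isinstance(val, t):
                raise TypeError(
                    f"Column {idx}: expected {t.__name__}, "
                    f"got {type(val).__name__}"
                )
        return row
    return check_row

# Usage: compose the check with the existing row map, so the RDD is
# traversed only once when the caller finally collects results.
# checked = pre_aggregated_rdd.map(make_checked_map([int, int]))
# result = checked.map(add_noise)  # hypothetical downstream map
```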

The implementation currently ignores column names on pre_aggregated, because in the typical case those names are generated by the private reader (including names containing random strings), and all values are extracted from the subquery result set positionally. This can cause errors if the caller passes pre-computed aggregates in a different order than the private reader expects (e.g. the correct number of columns, both integers, but swapped), which would be hard for the caller to debug. And since the values being passed in are already aggregated, we have no way to inspect the expression that was used to compute each column. However, we can use a heuristic to compare the passed-in column names with the names that would have been generated in the typical case, and raise an error or warning when they don't match, as sketched below.
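A heuristic name comparison could look roughly like the following sketch. It is not the actual check in smartnoise-sdk; `check_pre_aggregated_columns` is a hypothetical helper, and the assumption that generated names end in a random hex suffix is illustrative only:

```python
import re
import warnings

def check_pre_aggregated_columns(passed_names, expected_names):
    """Hypothetical heuristic: compare caller-supplied column names
    against the names the private reader would have generated, and warn
    on a mismatch. Assumes generated names like 'count_age_7f3a' carry a
    random suffix, so a trailing '_<hex>' token is stripped first.
    """
    def normalize(name):
        # Drop a random-suffix token if present; compare case-insensitively.
        return re.sub(r"_[0-9a-f]{4,}$", "", name.strip().lower())

    if len(passed_names) != len(expected_names):
        raise ValueError(
            f"pre_aggregated has {len(passed_names)} columns; "
            f"expected {len(expected_names)}"
        )
    for pos, (got, want) in enumerate(zip(passed_names, expected_names)):
        if normalize(got) != normalize(want):
            warnings.warn(
                f"pre_aggregated column {pos} is named '{got}' but the "
                f"private reader expected '{want}'; values are consumed "
                "positionally, so swapped columns will silently corrupt "
                "results."
            )
```

A warning (rather than a hard error) may be the safer default here, since the heuristic can misfire on legitimately renamed columns.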

joshua-oss commented 2 years ago

Done