traindb-project / traindb-ml

Remote ML Model Serving Component for TrainDB
Apache License 2.0

AQP Error: AssertionError: Found multiple or no matching columns #40

Closed kihyuk-nam closed 2 years ago

kihyuk-nam commented 2 years ago

INPUT SELECT COUNT(*) FROM orders where order_id >= 2

RESULT

2022-08-24 14:25:37,310 [INFO ]  ESTIMATE Aggregations: SELECT COUNT(*) FROM orders where order_id >= 2
2022-08-24 14:25:37,311 [INFO ]   - Query: SELECT COUNT(*) FROM orders where order_id >= 2, Model: ['model/instances/ensemble_single_instacart_10000000.pkl']
2022-08-24 14:25:37,311 [INFO ]   - Show Confidence Intervals: True
2022-08-24 14:25:37,311 [INFO ]   Read the ensemble in ['model/instances/ensemble_single_instacart_10000000.pkl']
2022-08-24 14:25:37,313 [DEBUG]   - Including SPN with table_set {'orders'} with sampling ratio(3421084 / 9853060.949424399)
2022-08-24 14:25:37,314 [DEBUG]  Stats: (---Structure Statistics---
# nodes             63
    # sum nodes     0
    # prod nodes    19
    # leaf nodes    42
# params            42
# edges             43
# layers            5)
2022-08-24 14:25:37,314 [INFO ]   : <train.ensemble_compilation.spn_ensemble.SPNEnsemble object at 0x7f165b5f00d0>
2022-08-24 14:25:37,317 [INFO ]   Evaluate 'SELECT COUNT(*) FROM orders where order_id >= 2:<train.ensemble_compilation.graph_representation.Query object at 0x7f165b5f0580>'
2022-08-24 14:25:37,317 [DEBUG]   In evaluate_query
2022-08-24 14:25:37,317 [DEBUG]   query.query_type = QueryType.CARDINALITY
INFO:     127.0.0.1:54028 - "GET /estimate/?query=SELECT%20COUNT%28%2A%29%20FROM%20orders%20where%20order_id%20%3E%3D%202&dataset=instacart&ensemble_location=model%2Finstances%2Fensemble_single_instacart_10000000.pkl&show_confidence_intervals=true HTTP/1.1" 500 Internal Server Error

...

  File "main.py", line 53, in aqp_read
    value = estimate(schema, dataset, query, ensemble_location, show_confidence_intervals)
  File "main.py", line 211, in estimate
    result = evaluate_an_aqp_query(ensemble_location, query, schema, show_confidence_intervals)
  File "/home/nam/Projects/traindb/traindb-ml/evaluation/aqp_evaluation.py", line 111, in evaluate_an_aqp_query
    spn_ensemble.evaluate_query(query,
  File "/home/nam/Projects/traindb/traindb-ml/train/ensemble_compilation/spn_ensemble.py", line 768, in evaluate_query
    _, factors, cardinalities, factor_values = self.cardinality(prototype_query,
  File "/home/nam/Projects/traindb/traindb-ml/train/ensemble_compilation/spn_ensemble.py", line 1007, in cardinality
    results.append(self._cardinality_with_injected_start(query, first_spn, next_mergeable_relationships,
  File "/home/nam/Projects/traindb/traindb-ml/train/ensemble_compilation/spn_ensemble.py", line 1219, in _cardinality_with_injected_start
    values, cardinality, formula = evaluate_factors(dry_run, factors, self.cached_expecation_vals,
  File "/home/nam/Projects/traindb/traindb-ml/train/ensemble_compilation/spn_ensemble.py", line 506, in evaluate_factors
    _, exp = factor.spn.evaluate_indicator_expectation(factor, gen_code_stats=gen_code_stats,
  File "/home/nam/Projects/traindb/traindb-ml/train/aqp_spn/aqp_spn.py", line 122, in evaluate_indicator_expectation
    return self.evaluate_indicator_expectation_batch(indicator_expectation, None, None,
  File "/home/nam/Projects/traindb/traindb-ml/train/aqp_spn/aqp_spn.py", line 229, in evaluate_indicator_expectation_batch
    range_conditions = self._parse_conditions(indicator_expectation.conditions, group_by_columns=group_bys,
  File "/home/nam/Projects/traindb/traindb-ml/train/aqp_spn/aqp_spn.py", line 409, in _parse_conditions
    assert len(matching_cols) == 1 or len(matching_fd_cols) == 1, "Found multiple or no matching columns"
AssertionError: Found multiple or no matching columns
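
The traceback ends in an assertion that expects the column named in the predicate to resolve to exactly one column known to the model. The sketch below is a hypothetical reduction of that check, not the traindb-ml code: the function name `match_condition_column`, the qualified-name matching rule, and the `learned_columns` list are all assumptions made for illustration. It shows how a predicate on a column that is absent from the model's column list would trip the same assertion message.

```python
from typing import Sequence


def match_condition_column(
    table: str,
    attribute: str,
    column_names: Sequence[str],
    fd_column_names: Sequence[str] = (),
) -> int:
    """Return the index of the single column that matches `table.attribute`.

    Hypothetical stand-in for the lookup inside `_parse_conditions`;
    the exact-name matching rule is an assumption.
    """
    qualified = f"{table}.{attribute}"
    matching_cols = [i for i, name in enumerate(column_names) if name == qualified]
    matching_fd_cols = [i for i, name in enumerate(fd_column_names) if name == qualified]
    # Same condition and message as the assertion shown in the traceback.
    assert len(matching_cols) == 1 or len(matching_fd_cols) == 1, \
        "Found multiple or no matching columns"
    return matching_cols[0] if len(matching_cols) == 1 else matching_fd_cols[0]


# Hypothetical column list: the real contents of the pickled ensemble are unknown.
learned_columns = ["orders.order_dow", "orders.order_hour_of_day"]

ok_index = match_condition_column("orders", "order_dow", learned_columns)

try:
    match_condition_column("orders", "order_id", learned_columns)
    failure_message = None
except AssertionError as exc:
    failure_message = str(exc)
```

Under these assumptions, a model that was trained without `order_id` would accept a predicate on `order_dow` and reject one on `order_id`. Whether that is what happens in this ensemble would need to be confirmed by inspecting the column names stored in the model.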

SUSPECTED CAUSE

CF. Normal case (same query, filtering on order_dow instead of order_id)

2022-08-24 14:25:53,811 [INFO ]  ESTIMATE Aggregations: SELECT COUNT(*) FROM orders where order_dow >= 2
2022-08-24 14:25:53,812 [INFO ]   - Query: SELECT COUNT(*) FROM orders where order_dow >= 2, Model: ['model/instances/ensemble_single_instacart_10000000.pkl']
2022-08-24 14:25:53,812 [INFO ]   - Show Confidence Intervals: True
2022-08-24 14:25:53,812 [INFO ]   Read the ensemble in ['model/instances/ensemble_single_instacart_10000000.pkl']
2022-08-24 14:25:53,814 [DEBUG]   - Including SPN with table_set {'orders'} with sampling ratio(3421084 / 9853060.949424399)
2022-08-24 14:25:53,815 [DEBUG]  Stats: (---Structure Statistics---
# nodes             63
    # sum nodes     0
    # prod nodes    19
    # leaf nodes    42
# params            42
# edges             43
# layers            5)
2022-08-24 14:25:53,815 [INFO ]   : <train.ensemble_compilation.spn_ensemble.SPNEnsemble object at 0x7f165b663fd0>
2022-08-24 14:25:53,817 [INFO ]   Evaluate 'SELECT COUNT(*) FROM orders where order_dow >= 2:<train.ensemble_compilation.graph_representation.Query object at 0x7f165b663670>'
2022-08-24 14:25:53,818 [DEBUG]   In evaluate_query
2022-08-24 14:25:53,818 [DEBUG]   query.query_type = QueryType.CARDINALITY
2022-08-24 14:25:53,819 [DEBUG]         predicted cardinality: 1345755.105466
2022-08-24 14:25:53,819 [DEBUG]         computed prototypical cardinality in 0.0012866021133959293 secs.
2022-08-24 14:25:53,820 [DEBUG]   if len(query.group_bys) == 0 and confidence_intervals
2022-08-24 14:25:53,820 [DEBUG]   lower_bound: 1343995.131280017, upper_bound: 1347515.079651983
2022-08-24 14:25:53,821 [INFO ]  Result: ((1343995.131280017, 1347515.079651983), 1345755.105466)
INFO:     127.0.0.1:45282 - "GET /estimate/?query=SELECT%20COUNT%28%2A%29%20FROM%20orders%20where%20order_dow%20%3E%3D%202&dataset=instacart&ensemble_location=model%2Finstances%2Fensemble_single_instacart_10000000.pkl&show_confidence_intervals=true HTTP/1.1" 200 OK
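
Both access-log lines show the estimate being requested with `GET /estimate/` and four query parameters. As a minimal sketch for reproducing the two cases, the helper below rebuilds that path with the standard library. The helper name `build_estimate_path` is hypothetical, and the server's host and port are not shown in the logs, so only the path and query string are constructed.

```python
from urllib.parse import quote, urlencode


def build_estimate_path(
    query: str,
    dataset: str,
    ensemble_location: str,
    show_confidence_intervals: bool = True,
) -> str:
    """Build the /estimate/ path and query string seen in the access log."""
    params = {
        "query": query,
        "dataset": dataset,
        "ensemble_location": ensemble_location,
        "show_confidence_intervals": "true" if show_confidence_intervals else "false",
    }
    # quote_via=quote encodes spaces as %20, as in the log;
    # the default (quote_plus) would emit '+' instead.
    return "/estimate/?" + urlencode(params, quote_via=quote)


ENSEMBLE = "model/instances/ensemble_single_instacart_10000000.pkl"

failing_path = build_estimate_path(
    "SELECT COUNT(*) FROM orders where order_id >= 2", "instacart", ENSEMBLE
)
working_path = build_estimate_path(
    "SELECT COUNT(*) FROM orders where order_dow >= 2", "instacart", ENSEMBLE
)
```

Prepending the base URL of the running service to `failing_path` and `working_path` should reproduce the 500 and 200 responses respectively, assuming the same model file is loaded.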

kihyuk-nam commented 2 years ago

Moved to traindb-model.