Open tedyu opened 3 years ago
I included a third id value in the IN clause. Here is a log snippet showing the end of the rows for the second value and the beginning of the rows for the third value:
2021-06-19 21:45:18,594 (Time-limited test) [INFO - org.yb.loadtest.TestJsonUpsert.existingKeysUpdate(TestJsonUpsert.java:259)] json str 8:[3,ABC,USA,{"Module":"FM","code":"55","phone":"1400"}]
2021-06-19 21:45:18,594 (Time-limited test) [INFO - org.yb.loadtest.TestJsonUpsert.existingKeysUpdate(TestJsonUpsert.java:259)] json str 9:[3,ABC,USA,{"Module":"FM","code":"55","phone":"1400"}]
2021-06-19 21:45:18,594 (Time-limited test) [INFO - org.yb.loadtest.TestJsonUpsert.existingKeysUpdate(TestJsonUpsert.java:259)] json str 10:[5,SAM,INDIA,{"call":"75675655","code":"91"}]
2021-06-19 21:45:18,594 (Time-limited test) [INFO - org.yb.loadtest.TestJsonUpsert.existingKeysUpdate(TestJsonUpsert.java:259)] json str 11:[5,SAM,INDIA,{"call":"75675655","code":"91"}]
That is, only the first value returns a single row; each subsequent value returns duplicate rows.
When there is an IN predicate, Executor::WhereClauseToPB() turns on is_multi_partition (because the IN predicate has multiple values).
Executor::ExecPTNode() in turn handles the unread partitions. At the end of AdvanceToNextPartition():
req->clear_hash_code();
req->clear_max_hash_code();
So when subsequent partitions are handled, neither the hash code nor the max hash code is set (there is no corresponding token range).
This could explain why the token range specified by the SELECT statement has no effect.
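To make the suspected mechanism concrete, here is a minimal toy model of it. Everything in it is an assumption for illustration: `ToyReadRequest`, `InTokenRange`, and `RowsPerKey` are hypothetical names, not YugabyteDB code, and the model only mimics the has/set/clear style of the request fields. It issues one request per token range, walks the IN values in order, and optionally clears the bounds after each value, the way the two `clear_*` calls above do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a read request. Hypothetical type, not the real protobuf.
struct ToyReadRequest {
  bool has_hash_code = false;
  bool has_max_hash_code = false;
  uint32_t hash_code = 0;      // inclusive lower bound of the token range
  uint32_t max_hash_code = 0;  // inclusive upper bound of the token range

  void set_hash_code(uint32_t v) { hash_code = v; has_hash_code = true; }
  void set_max_hash_code(uint32_t v) { max_hash_code = v; has_max_hash_code = true; }
  void clear_hash_code() { hash_code = 0; has_hash_code = false; }
  void clear_max_hash_code() { max_hash_code = 0; has_max_hash_code = false; }
};

// A row passes if it lies inside whatever bounds are still set on the request.
// An unset bound filters nothing.
bool InTokenRange(const ToyReadRequest& req, uint32_t row_hash) {
  if (req.has_hash_code && row_hash < req.hash_code) return false;
  if (req.has_max_hash_code && row_hash > req.max_hash_code) return false;
  return true;
}

// Sends one toy request per token range and visits the IN values in order.
// Returns, for each IN value, how many requests returned its row.
std::vector<int> RowsPerKey(
    const std::vector<std::pair<uint32_t, uint32_t> >& token_ranges,
    const std::vector<uint32_t>& key_hashes,
    bool clear_bounds_on_advance) {
  std::vector<int> rows(key_hashes.size(), 0);
  for (std::size_t r = 0; r < token_ranges.size(); ++r) {
    assert(token_ranges[r].first <= token_ranges[r].second);
    ToyReadRequest req;
    req.set_hash_code(token_ranges[r].first);
    req.set_max_hash_code(token_ranges[r].second);
    for (std::size_t i = 0; i < key_hashes.size(); ++i) {
      if (InTokenRange(req, key_hashes[i])) ++rows[i];
      if (clear_bounds_on_advance) {
        // Mirrors the two clear calls quoted above: after the first value,
        // the request no longer carries a token range.
        req.clear_hash_code();
        req.clear_max_hash_code();
      }
    }
  }
  return rows;
}
```

With nine disjoint token ranges and three IN values that each fall in a different range, calling `RowsPerKey` with `clear_bounds_on_advance = true` yields counts of 1, 9, 9, while `false` yields 1, 1, 1. The first pattern matches what I see in the test (one row for the first id, nine identical rows for each later id), though this toy model does not prove that the real executor behaves the same way.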
I was debugging TestJsonUpsert (involving Spark connector) where the table is defined as:
Here is the query:
There are 9 partitions for the table. The Spark connector sends read requests to the partitions, each specifying a token range. What I observed is that I get one row for id=2. For id=3, however, 9 rows still come back (all with the same content).
There are 18 lines of log with 'Fetching data for range'.
Below is a log snippet beginning with the log from the Spark connector, including the additional log from eval_where.cc. This is the 10th occurrence among the 18 (which should be the first for id=3). I looked at the data fetching log lines before and after this one, and the partition range is present.