Closed tanujnay112 closed 2 years ago
It appears that this is due to this line of formatting code, which is meant to replace Unicode control characters at the YCQLSH level through expensive regex operations. However, the list of Unicode control characters here seems too liberal. It says that characters such as `0` and `.` need to be replaced, even though in reality they can be cleanly displayed in YCQLSH. According to this, real control characters would seem to range from 0x00 to 0x1F. After fixing this locally, the issue seems to go away.
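To illustrate the point above, here is a minimal sketch of how an overly broad character class ends up matching printable characters. Both patterns are hypothetical and written only for illustration; in particular, the `\x31` upper bound is an assumption about how such a range could go wrong (0x31 is the character `1`, while 0x1F is decimal 31), not a quote from the YCQLSH source.

```python
import re

# Hypothetical patterns for illustration; not copied from the YCQLSH source.
# Assumption: the upper bound was written as \x31 (the character "1")
# where \x1f (the last C0 control character, decimal 31) was intended.
TOO_BROAD = re.compile(r'[\x00-\x31]')
C0_CONTROLS = re.compile(r'[\x00-\x1f]')


def matched_chars(pattern, text):
    """Return the distinct characters in text that pattern matches, sorted."""
    return sorted(set(pattern.findall(text)))


sample = "0.2\t"
broad_hits = matched_chars(TOO_BROAD, sample)     # tab, ".", and "0"
strict_hits = matched_chars(C0_CONTROLS, sample)  # tab only
```

Under this assumption, `2` (0x32) falls just outside the broad range while `0` (0x30) and `.` (0x2E) fall inside it, which would be consistent with a value made of zeros triggering a replacement on every character while a value made of twos triggers none.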
excellent find/debugging @tanujnay112
I am observing the same timing disparity with the main CQLSH repo.
Description
Consider the following schema on YCQLSH
`testzeros.csv` is a CSV whose first column holds whole numbers and whose second column holds a JSON document with a single key-value pair. The value is very large (up to 10KB). Both the key and the value contain only zeros. `testtwos.csv` is identical except that it contains twos instead of zeros.
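For anyone wanting to reproduce the data, the following is a rough sketch of how CSV content of the shape described above could be generated. The function names, the row count, and the key length are assumptions made for illustration; only the overall shape (an integer column plus a single-pair JSON document built from one repeated digit, with a value of about 10KB) comes from the description.

```python
import csv
import io
import json


def make_rows(digit, n_rows=3, key_len=8, value_len=10_000):
    """Build (id, json_document) rows whose key and value repeat one digit.

    n_rows, key_len and value_len are illustrative defaults, not values
    taken from the original test files.
    """
    doc = json.dumps({digit * key_len: digit * value_len})
    return [(i, doc) for i in range(n_rows)]


def to_csv(rows):
    """Serialize rows to CSV text using the standard csv module."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


zeros_csv = to_csv(make_rows("0"))  # content in the style of testzeros.csv
twos_csv = to_csv(make_rows("2"))   # content in the style of testtwos.csv
```

The two outputs differ only in the repeated digit, so any timing gap between them points at how that digit is handled rather than at the data size.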
Now note that selecting from `samplezeros` takes much longer than selecting from `sampletwos` through `ycqlsh`. This difference does not appear when we use this Python script to access the data instead: cqltest.zip
This suggests that there is some slowdown when certain characters are involved in a column value at the YCQLSH level.