[YCQL] YCQLSH noticeably slows down output when outputting large jsonb column values that contain certain characters

tanujnay112 commented 2 years ago

Description

Consider the following schema on YCQLSH

CREATE KEYSPACE sample_key;
USE sample_key;
CREATE table samplezeros(h int primary key, j jsonb);
CREATE table sampletwos(h int primary key, j jsonb);

COPY samplezeros(h, j) from '/tmp/yb/testzeros.csv';
COPY sampletwos(h, j) from '/tmp/yb/testtwos.csv';

testzeros.csv is a CSV with the first column being whole numbers and the second column is a json document with one key value pair. The value of this is very large (up to 10KB). Both the key and values only contain zeros. testtwos.csv is the exact same except it contains twos instead of zeros.

Now note that selecting from samplezeros takes a much longer time than from sampletwos through ycqlsh.

time ./bin/ycqlsh -e "paging off; use sample_key; select * from sampletwos"
<<data>>
real    0m1.146s
user    0m0.469s
sys 0m0.260s

time ./bin/ycqlsh -e "paging off; use sample_key; select * from samplezeros"
<<data>>
real    0m7.737s
user    0m6.956s
sys 0m1.442s

This difference does not appear when we use this python script to access the data instead: cqltest.zip

time python2.7 ./cqltester.py --twos
<<data>>
real    0m0.630s
user    0m0.157s
sys 0m0.141s

time python2.7 ./cqltester.py --zeros
<<data>>
real    0m0.619s
user    0m0.153s
sys 0m0.137s

This suggests that there is some slowdown when certain characters are involved in a column value at the YCQLSH level.

tanujnay112 commented 2 years ago

It appears that this is due to this line of formatting code that is meant to replace unicode control characters at the YCQLSH level through expensive regex operations. However, the list of unicode control characters here seems too liberal. This says that characters such as 0 and . need to be replaced even though in reality they can be cleanly displayed in YCQLSH. It would seem that real control characters would start from 0x00 to 0x1F according to this. After fixing this locally, the issue seems to go away.

kmuthukk commented 2 years ago

excellent find/debugging @tanujnay112

tanujnay112 commented 2 years ago

I am observing the same timing disparity with the main CQLSH repo.

yugabyte / cqlsh

[YCQL] YCQLSH noticeably slows down output when outputting large jsonb column values that contain certain characters #9

Description