Limit the batch size to 8192 in both the fulltext_index_scan() and fulltext_tokenize() functions.
In the fulltext_index_scan() function, spawn a new thread to evaluate scores in mini-batches of 8192 documents instead of waiting for all results from SQL. This speeds up the function and avoids OOM. However, scores are now calculated per mini-batch rather than over the complete result set; I think this doesn't matter as long as we get the correct answer.
Support the json_value parser.
Pre-allocate memory in the fulltext_tokenize() function to avoid repeated malloc calls.
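The pre-allocation pattern can be sketched as follows. The `tokenizer` type and its whitespace splitting are hypothetical; the point is only that the hot tokenize path reuses one buffer instead of allocating per call.

```go
package main

import "fmt"

// tokenizer reuses a preallocated token buffer across calls so the hot
// tokenize path does not allocate per input.
type tokenizer struct {
	tokens []string // reused between calls; grown only when needed
}

func newTokenizer(capHint int) *tokenizer {
	return &tokenizer{tokens: make([]string, 0, capHint)}
}

// tokenize splits on spaces into the reused buffer instead of building
// a fresh slice for every document.
func (t *tokenizer) tokenize(s string) []string {
	t.tokens = t.tokens[:0] // reset length, keep capacity
	start := -1
	for i := 0; i < len(s); i++ {
		if s[i] == ' ' {
			if start >= 0 {
				t.tokens = append(t.tokens, s[start:i])
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		t.tokens = append(t.tokens, s[start:])
	}
	return t.tokens
}

func main() {
	tk := newTokenizer(1024)
	fmt.Println(tk.tokenize("hello full text index"))
}
```

The caller must copy any tokens it wants to keep past the next tokenize() call, since the buffer is overwritten; that trade-off is what makes the path allocation-free.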
Add the monpl tokenizer repo to matrixone.
Bug fix: the JSON tokenizer now truncates over-long values, and the token limit is increased to 127 bytes.
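A minimal sketch of the truncation behavior, assuming the 127-byte limit above; the function name is hypothetical and the real tokenizer differs. Backing up to a rune boundary avoids emitting a token that ends mid-way through a multi-byte UTF-8 character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxTokenLen mirrors the 127-byte token limit described above.
const maxTokenLen = 127

// truncateToken clips a JSON value to at most maxTokenLen bytes instead
// of rejecting it, backing up so a multi-byte UTF-8 rune is never split.
func truncateToken(s string) string {
	if len(s) <= maxTokenLen {
		return s
	}
	end := maxTokenLen
	// step back until we land on the start of a rune
	for end > 0 && !utf8.RuneStart(s[end]) {
		end--
	}
	return s[:end]
}

func main() {
	long := make([]byte, 200)
	for i := range long {
		long[i] = 'a'
	}
	fmt.Println(len(truncateToken(string(long))))
}
```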
What type of PR is this?
Which issue(s) this PR fixes:
Fixes #20217, #20213, #20175, #20149
What this PR does / why we need it:
Bug fixes for #20217, #20213, and #20175.