singlestore-labs / singlestoredb-laravel-driver

The official SingleStore Laravel driver.
https://github.com/singlestore-labs/singlestore-laravel-driver
Apache License 2.0
224 stars 22 forks source link

Add performance support for JSON_MATCH_ANY on S2 8.0+ #58

Open harisenbon opened 1 year ago

harisenbon commented 1 year ago

Hey all,

I'm working with the S2 team, and we've run into huge performance issues when querying on multiple JSON fields in S2 (see Support ticket #27009)

Example:

WHERE
JSON_EXTRACT_STRING(`value`, 'lccontent') = 'paid' OR
JSON_EXTRACT_STRING(`value`, 'lccontent') = 'cpm' OR
JSON_EXTRACT_STRING(`value`, 'lccontent') = 'cps' OR
JSON_EXTRACT_STRING(`value`, 'lccontent') = 'cpc'

With the JSON improvements in version 8, we can replace that with something like this, which runs at 1.25s without load:

JSON_MATCH_ANY(MATCH_PARAM_STRING_STRICT() LIKE '%paid%', `value`, 'lccontent') OR
JSON_MATCH_ANY(MATCH_PARAM_STRING_STRICT() LIKE '%cpm%', `value`, 'lccontent') OR
JSON_MATCH_ANY(MATCH_PARAM_STRING_STRICT() LIKE '%cps%', `value`, 'lccontent') OR
JSON_MATCH_ANY(MATCH_PARAM_STRING_STRICT() LIKE '%cpc%', `value`, 'lccontent')

However, the Laravel driver doesn't currently support JSON_MATCH_ANY out of the box.

I've gotten it working on my system, using the following changes to SingleStore\Laravel\Query\Grammar

Functions to determine if we should use this new code:

/**
 * Get the SingleStore engine we're running on
 *
 * @param Builder $query
 * @return string
 */
private function getVersion(Builder $query)
{
    return Str::after($query->getConnection()->getPdo()->getAttribute(PDO::ATTR_CLIENT_VERSION), ' ');
}

/**
 * Whether we should do the JSON override or not (isJsonSelector & v > 8)
 * 8.0+ has a number of improved JSON functions to speed up queries
 *
 * @param Builder $query
 * @param         $where
 * @return bool
 */
private function doJsonOverride(Builder $query, $where)
{
    return $this->isJsonSelector($where['column']) &&
        version_compare($this->getVersion($query), '8.0') >= 0;
}

We then create a function to wrap the default query with JSON_MATCH_ANY, swapping in MATCH_PARAM_* for the colum:

/**
 * Wrap the returned data in a JSON_MATCH_ANY query for S2 performance
 *
 * @param Builder $query
 * @param         $where
 * @param         $originalCallable
 * @return string
 */
protected function wrapJsonMatchAny(Builder $query, $where, $originalCallable)
{
    // Swap out our column, and call the base query
    $column = $where['column'];
    $where['column'] = DB::raw('SINGLESTORE_JSON_MATCH');
    $whereSQL = call_user_func($originalCallable, $query, $where);

    // Set up our JSON Query
    // -------------------------------------------

    // Break apart the column name from the JSON keypath.
    [$field, $path] = $this->wrapJsonFieldAndPath($column);

    // TODO: Get the matching type from the value
    $matchType = 'MATCH_PARAM_STRING_STRICT()';

    // Re-assemble and return
    // -------------------------------------------
    $whereSQL = str_replace('SINGLESTORE_JSON_MATCH', $matchType, $whereSQL);
    return "JSON_MATCH_ANY($whereSQL, {$field}{$path})";
}

Finally, we apply the override to each where* function that we want support for:

protected function whereBasic(Builder $query, $where)
{
    return $this->doJsonOverride($query, $where) ?
        $this->wrapJsonMatchAny($query, $where, 'parent::whereBasic') :
        parent::whereBasic($query, $where);
}

Let me know if a PR would be preferred.

AdalbertMemSQL commented 1 year ago

Hi @harisenbon Thanks for your request. As for me, it looks a little bit weird that JSON_EXTRACT_STRING(value, 'lccontent') = 'paid' works much slower than JSON_MATCH_ANY(MATCH_PARAM_STRING_STRICT() LIKE '%paid%',value, 'lccontent'). I would do some investigations before adding this feature.

harisenbon commented 1 year ago

@AdalbertMemSQL

As for me, it looks a little bit weird that it works much slower

Yeah, I thought so too ;)

I would love to use , JSON_EXTRACT_STRING (as it gives some type casting benefits) but I have a month of tickets with support about how I should be using JSON_MATCH_ANY instead.

AdalbertMemSQL commented 1 year ago

I just had a chance to test this thing. Used 8.0.12 version of the SingleStore.

MySQL [db]> select * from t where JSON_EXTRACT_STRING(`value`, 'lccontent') = 'asdas';
Empty set (0.163 sec)

MySQL [db]> select * from t where JSON_MATCH_ANY(MATCH_PARAM_STRING_STRICT() LIKE 'asdas', `value`, 'lccontent');
Empty set (0.479 sec)

Looks like JSON_EXTRACT_STRING worked ~3 times faster. Then I tried bigger JSON values and the difference was not so significant, but even with very big JSON values, JSON_EXTRACT_STRING is slightly faster.

MySQL [db]> select * from t where JSON_EXTRACT_STRING(`value`, 'lccontent') = 'asdas';
Empty set (1.525 sec)
MySQL [db]> select * from t where JSON_MATCH_ANY(MATCH_PARAM_STRING_STRICT() LIKE 'asdas', `value`, 'lccontent');
Empty set (1.873 sec)

@harisenbon Can you please clarify, what version of SingleStore are you using and what the schema/size of your JSON data is?

Probably, this performance degradation happens only in some specific cases.