Closed Tiihott closed 11 months ago
pth_06 uses wildcards in query strings. Cassandra does not support wildcards natively; using them properly requires a full-text search engine such as Solr, which DataStax Enterprise (DSE) runs alongside Cassandra.
LIKE queries (and therefore wildcards) can be supported in Cassandra with SASI (SSTable Attached Secondary Index). References: https://docs.datastax.com/en/developer/java-driver/4.17/manual/query_builder/schema/index/ https://docs.datastax.com/en/developer/java-driver/4.3/manual/query_builder/relation/ https://cassandra.apache.org/doc/latest/cassandra/cql/SASI.html http://www.doanduyhai.com/blog/?p=2058#sasi_perf_benchmarks
After testing SASI in the pth_06 Cassandra branch, it looks like a viable way to add wildcard search to the Cassandra queries, in almost the same way as wildcards are implemented in the pth_06 MariaDB and S3 queries. Because of the SASI and Cassandra limitations described in the sasi_perf_benchmarks post linked above, SASI indexing in CONTAINS mode is not recommended on columns with long strings, such as the payload column, due to its performance and disk space costs. Substring search should also be avoided in Cassandra queries in general.
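To make the mapping concrete, here is a minimal sketch of how a wildcard pattern could be translated into a SASI-backed LIKE relation. It assumes pth_06 patterns use `*` as the wildcard and that only leading and trailing wildcards need to be supported. The function names, the index naming scheme, and the keyspace, table and column names are hypothetical and are not taken from the pth_06 code; only the SASI index class name and the PREFIX and CONTAINS modes come from the Cassandra documentation.

```python
from typing import Optional, Tuple

# Index implementation class named in the Cassandra SASI documentation.
SASI_CLASS = "org.apache.cassandra.index.sasi.SASIIndex"


def sasi_index_cql(keyspace: str, table: str, column: str, mode: str = "PREFIX") -> str:
    """Build a CQL statement that creates a SASI index on one column.

    The index name (<table>_<column>_sasi) is a hypothetical convention.
    """
    if mode not in ("PREFIX", "CONTAINS"):
        raise ValueError(f"unsupported SASI mode for LIKE: {mode}")
    return (
        f"CREATE CUSTOM INDEX IF NOT EXISTS {table}_{column}_sasi "
        f"ON {keyspace}.{table} ({column}) "
        f"USING '{SASI_CLASS}' "
        f"WITH OPTIONS = {{'mode': '{mode}'}}"
    )


def wildcard_to_like(pattern: str) -> Tuple[str, Optional[str]]:
    """Translate a '*' wildcard pattern into a LIKE pattern.

    Returns (like_pattern, required_mode). required_mode is None when the
    pattern has no wildcard and a plain equality relation is enough.
    Literal '%' characters in the value are not handled in this sketch.
    """
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    core = pattern.strip("*")
    if not core or "*" in core:
        # Only leading/trailing wildcards are translated here.
        raise ValueError(f"unsupported wildcard pattern: {pattern!r}")
    if not leading and not trailing:
        return core, None
    like = ("%" if leading else "") + core + ("%" if trailing else "")
    # A leading wildcard (suffix or substring match) needs CONTAINS mode;
    # a trailing wildcard alone works with the cheaper PREFIX mode.
    return like, ("CONTAINS" if leading else "PREFIX")
```

With this split, a column that only ever receives trailing wildcards can stay on a PREFIX index, which avoids the CONTAINS mode costs mentioned above.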
If SASI is not an option, for example because of the Cassandra cluster configuration, the Cassandra condition walker has to leave wildcard conditions out when constructing the query condition (a list of CQL relations that can be appended to the WHERE clause of the CQL query).
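A rough sketch of that exclusion logic, assuming conditions arrive as (column, value) pairs and `*` is the wildcard character. `build_relations`, the `sasi_available` flag and the column names in the usage note are hypothetical; the real condition walker in pth_06 may be structured quite differently.

```python
from typing import List, Tuple

Condition = Tuple[str, str]  # (column, value), hypothetical shape


def build_relations(
    conditions: List[Condition], sasi_available: bool
) -> Tuple[List[str], List[str], List[Condition]]:
    """Turn conditions into CQL relations with bind markers.

    Returns (relations, bind_values, skipped). Wildcard conditions end up
    in `skipped` when SASI is not available, so the caller knows which
    filters were left out of the WHERE clause.
    """
    relations: List[str] = []
    bind_values: List[str] = []
    skipped: List[Condition] = []
    for column, value in conditions:
        if "*" in value:
            if not sasi_available:
                # Excluded from the query; the result set gets broader.
                skipped.append((column, value))
                continue
            relations.append(f"{column} LIKE ?")
            # Assumes wildcards appear only at the start or end of the value.
            bind_values.append(value.replace("*", "%"))
        else:
            relations.append(f"{column} = ?")
            bind_values.append(value)
    return relations, bind_values, skipped
```

For example, `build_relations([("host", "web*"), ("stream", "audit")], False)` keeps only `stream = ?` and reports the host condition as skipped. Since dropping a condition widens the result set, the skipped conditions would still have to be applied to the returned rows somewhere else.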
Features that Cassandra lacks and that need to be implemented on the software side:
This may change in future releases of Cassandra, but for now OR and '!=' are not supported: https://cassandra.apache.org/doc/stable/cassandra/cql/SASI.html#limitations-and-caveats
The SASI documentation states: "Not Equals and OR support have been removed in this release while changes are made to Cassandra itself to support them."
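One possible way to cover both gaps on the software side is sketched below: OR is emulated by running one query per branch and merging the rows, and '!=' is applied as a filter on the returned rows. This is only an illustration of the idea, not how pth_06 does it. `run_query` stands in for whatever executes a single Cassandra query, and `key_columns` is assumed to identify a row uniquely (for example the primary key columns); both are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Sequence, TypeVar

Row = Dict[str, object]
Branch = TypeVar("Branch")


def query_or(
    run_query: Callable[[Branch], Iterable[Row]],
    branches: Sequence[Branch],
    key_columns: Sequence[str],
) -> List[Row]:
    """Emulate `a OR b` by running each branch separately and merging.

    Rows matched by more than one branch are returned once, identified by
    the values of key_columns.
    """
    seen = set()
    merged: List[Row] = []
    for branch in branches:
        for row in run_query(branch):
            key = tuple(row[c] for c in key_columns)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


def filter_not_equals(rows: Iterable[Row], column: str, value: object) -> List[Row]:
    """Emulate `column != value` by filtering rows after they are fetched."""
    return [row for row in rows if row.get(column) != value]
```

Both helpers move work from the cluster to the client, so the queries behind them should already be as selective as possible.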
Looking into Apache Druid as an alternative to Apache Cassandra because of the Cassandra limitations described above. Druid should offer much better filtering and search functionality than Cassandra.
First create a proof of technology for Cassandra querying in pth_06. If it meets all the requirements of the pth_06 functions that will use Cassandra querying, continue with implementing Cassandra querying in pth_06.