Stinger911 closed this issue 10 years ago
I meditated on this query for a long time and found that it is totally incorrect. In my case there are more than 1000 records per partition key, and my primary key is a compound key. As a result the query produces data loss, because it returns only the first 1000 records per partition key. It's a bug.
Filtering on the partition key is currently not supported. This is a limitation of Cassandra's CqlPagingInputFormat; see Cassandra JIRA #6151.
It should be fixed when Cassandra issue #6311 lands.
Regarding your second comment: this is the CQL3 paging approach. If your partition key has more than 1000 CQL rows, the driver will page through them using the next part of the composite key. You can read more about it in the "CQL3 pagination" section here
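To illustrate the paging behaviour described above, here is a small self-contained Scala sketch (names and types are illustrative, not Calliope's actual internals): within one partition, rows are ordered by the clustering key, and each page resumes after the last clustering-key value seen on the previous page, so no rows beyond the first 1000 are lost.

```scala
object PagingSketch {
  final case class Row(stamp: Int, payload: String) // stamp stands in for the clustering key

  // One "page": rows of the partition with clustering key greater than `after`,
  // capped at pageSize -- analogous to
  //   SELECT ... WHERE pk = :pk AND stamp > :after LIMIT :pageSize
  def fetchPage(partition: Seq[Row], after: Option[Int], pageSize: Int): Seq[Row] =
    partition.filter(r => after.forall(r.stamp > _)).take(pageSize)

  // Drain the whole partition by repeatedly resuming after the last
  // clustering-key value of the previous page.
  def readAll(partition: Seq[Row], pageSize: Int): Seq[Row] =
    Iterator
      .iterate(fetchPage(partition, None, pageSize)) { page =>
        fetchPage(partition, Some(page.last.stamp), pageSize)
      }
      .takeWhile(_.nonEmpty)
      .flatten
      .toSeq

  def main(args: Array[String]): Unit = {
    // 2500 rows in one partition with pageSize 1000: all 2500 come back,
    // not just the first page.
    val rows = (1 to 2500).map(i => Row(i, s"event-$i"))
    assert(readAll(rows, 1000).size == 2500)
  }
}
```

The key point is the cursor: each page's last clustering-key value becomes the lower bound for the next page, which is how a wide row larger than the page size is read in full.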
Thank you for the reply.
I'll be waiting for the updates.
About pagination: I can't receive more than a limited number of rows through either Cql3CasBuilder or ThriftCasBuilder. With Cql3CasBuilder I can set the pageSize parameter; with ThriftCasBuilder I can't receive more than 100 columns per row key. I understand that I may have used Calliope improperly. Can you point me to documentation that would help me fix my mistakes?
Any news on this? It seems Cassandra issue #6311 is already integrated in version 2.0.7. However, the above still does not work: single-partition WHERE clauses are translated into invalid CQL queries.
@igorper Unfortunately this was not solved by the new reader, as the query approach remains the same as before, so we are still unable to restrict on the partition key. We are working out an alternate approach.
Until then there are two options -
I am playing with Calliope against an existing log table in Cassandra 2.0.3. I like the simplicity and the results of the library, but I found that I can't always create a query with a WHERE clause via CasBuilder. For example, if I supply:

```scala
val cas = CasBuilder.cql3.withColumnFamily("testing", "event").where("hour = '2014030315'")
```
then I get a runtime exception when running count on the RDD. With Spark's debug messages turned on, the generated CQL looks like the following:
```
SELECT * FROM "event" WHERE token("hour") > ? AND token("hour") <= ? AND hour = '2014030315' LIMIT 1000 ALLOW FILTERING
```
My schema (from desc table testing.event) is:

```
CREATE TABLE event (
    hour text,
    stamp timeuuid,
    taxonomy map<text, text>,
    PRIMARY KEY (hour, stamp)
) WITH CLUSTERING ORDER BY (stamp DESC);
```
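For reference, a direct single-partition read against this schema in cqlsh needs only a plain equality predicate; the generated query above appears to fail because hour is restricted twice, once by the token() range and once by the equality (a sketch for illustration, not Calliope output):

```
-- Valid single-partition query: equality on the partition key alone.
SELECT hour, stamp, taxonomy
FROM testing.event
WHERE hour = '2014030315';

-- Invalid: the partition key cannot be restricted by both a token()
-- range and an equality in the same statement.
-- SELECT * FROM "event" WHERE token("hour") > ? AND token("hour") <= ? AND hour = '2014030315'
```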
Is it possible to filter query by primary key with Calliope?