Stinger911 closed this issue 10 years ago
I meditated on this query for a long time and found that it is totally incorrect. In my case there are more than 1000 records per partition key, and my primary key is a compound key. As a result the query produces data loss, because it returns only the first 1000 records per partition key. It's a bug.
Filtering on the partition key is currently not supported. This is a limitation of Cassandra's CqlPagingInputFormat; see Cassandra JIRA #6151.
It should be fixed when Cassandra issue #6311 lands.
Regarding your second comment: this is the CQL3 paging approach. If your partition key has more than 1000 CQL rows, the driver will page through them using the next part of the composite key. You can read more about it in the "CQL3 pagination" section here
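To illustrate the paging behaviour described above, here is a small self-contained Scala sketch (names and types are illustrative, not Calliope's actual internals): within one partition, rows are ordered by the clustering key, and each page resumes after the last clustering-key value seen on the previous page, so no rows beyond the first 1000 are lost.

```scala
object PagingSketch {
  final case class Row(stamp: Int, payload: String) // stamp stands in for the clustering key

  // One "page": rows of the partition with clustering key greater than `after`,
  // capped at pageSize -- analogous to
  //   SELECT ... WHERE pk = :pk AND stamp > :after LIMIT :pageSize
  def fetchPage(partition: Seq[Row], after: Option[Int], pageSize: Int): Seq[Row] =
    partition.filter(r => after.forall(r.stamp > _)).take(pageSize)

  // Drain the whole partition by repeatedly resuming after the last
  // clustering-key value of the previous page.
  def readAll(partition: Seq[Row], pageSize: Int): Seq[Row] =
    Iterator
      .iterate(fetchPage(partition, None, pageSize)) { page =>
        fetchPage(partition, Some(page.last.stamp), pageSize)
      }
      .takeWhile(_.nonEmpty)
      .flatten
      .toSeq

  def main(args: Array[String]): Unit = {
    // 2500 rows in one partition with pageSize 1000: all 2500 come back,
    // not just the first page.
    val rows = (1 to 2500).map(i => Row(i, s"event-$i"))
    assert(readAll(rows, 1000).size == 2500)
  }
}
```

The key point is the cursor: each page's last clustering-key value becomes the lower bound for the next page, which is how a wide row larger than the page size is read in full.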
Thank you for the reply.
I'll be waiting for the updates.
About pagination: I can't receive more than a limited number of rows through either Cql3CasBuilder or ThriftCasBuilder. With Cql3CasBuilder I can set the pageSize parameter; with ThriftCasBuilder I can't receive more than 100 columns per row key. I understand that I may have used Calliope improperly. Can you point me to documentation that would help me fix my mistakes?
Any news on this? It seems Cassandra issue #6311 is already integrated in version 2.0.7. However, the above still does not work: single-partition WHERE clauses are translated into invalid CQL queries.
@igorper Unfortunately this was not solved by the new reader, as the query approach remains the same as before, so we are still unable to restrict on the partition key. We are working out an alternate approach.
Until then there are two options -
I am playing with Calliope against an existing log table in Cassandra 2.0.3. I like the simplicity and the results of the library, but I found that I can't always create a query with a WHERE clause via CasBuilder. For example, if I supply:

```scala
val cas = CasBuilder.cql3.withColumnFamily("testing", "event").where("hour = '2014030315'")
```
then I get a runtime exception when running count on the RDD. With Spark's debug messages turned on, the generated CQL looks like the following:
```
SELECT * FROM "event" WHERE token("hour") > ? AND token("hour") <= ? AND hour = '2014030315' LIMIT 1000 ALLOW FILTERING
```
My schema (from desc table testing.event) is:

```
CREATE TABLE event (
    hour text,
    stamp timeuuid,
    taxonomy map<text, text>,
    PRIMARY KEY (hour, stamp)
) WITH CLUSTERING ORDER BY (stamp DESC);
```
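For reference, a direct single-partition read against this schema in cqlsh needs only a plain equality predicate; the generated query above appears to fail because hour is restricted twice, once by the token() range and once by the equality (a sketch for illustration, not Calliope output):

```
-- Valid single-partition query: equality on the partition key alone.
SELECT hour, stamp, taxonomy
FROM testing.event
WHERE hour = '2014030315';

-- Invalid: the partition key cannot be restricted by both a token()
-- range and an equality in the same statement.
-- SELECT * FROM "event" WHERE token("hour") > ? AND token("hour") <= ? AND hour = '2014030315'
```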
Is it possible to filter query by primary key with Calliope?