scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0

Expose toppartitions as CQL table #5727

Open tzach opened 4 years ago

tzach commented 4 years ago

https://github.com/scylladb/scylla/issues/2811 exposes top (hot) partitions via the REST API and a nodetool command. Other similar functions, like large rows (#3988) and large partitions (#4234), are exposed via CQL tables. It would be useful to align the top partitions info with the same method.

tzach commented 4 years ago

Similar to #5726

slivne commented 4 years ago

Since we are providing multiple interfaces in scylla thrift/cql/alternator/redis - it is not clear if selecting an interface that is unique in capability to expose more information is the correct approach. At the end we will want users (no matter which protocol they are using) to be able to detect hot partitions or large partitions.

An alternate approach would be to expose all of these over REST and not make them avail over a CQL table (Alternator/Redis will not have a way to access this information).

@tzach ^^

tzach commented 4 years ago

Good point, both REST and CQL have advantages:

CQL

REST

Easier to consume by mgmt applications

amnonh commented 4 years ago

I believe that it should be done in a systematic way, i.e. create a mapping between the API (based on its Swagger definition) and virtual table(s), so it will be available through all the protocols.

Alternator, Redis, and potentially other protocols should all have access to tables, so we should implement it in a way that lets an Alternator/Redis/CQL client access this data.

In general, I think that users should be able to see and do everything using the protocol they use to connect to the database.
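
A minimal sketch of the mapping idea, assuming a simplified Swagger-style model with a flat `properties` dictionary. The type mapping, the `columns_for_model` helper, and the example model are all hypothetical and are not taken from Scylla's API definitions:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from Swagger primitive types to CQL column types.
SWAGGER_TO_CQL: Dict[str, str] = {
    "string": "text",
    "integer": "bigint",
    "number": "double",
    "boolean": "boolean",
}


def columns_for_model(model: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Derive (column name, CQL type) pairs from a Swagger-style model.

    Only flat models with primitive property types are handled; anything
    else raises ValueError so unsupported shapes are not silently dropped.
    """
    columns: List[Tuple[str, str]] = []
    for name, spec in model.get("properties", {}).items():
        swagger_type = spec.get("type")
        if swagger_type not in SWAGGER_TO_CQL:
            raise ValueError(f"unsupported type for {name!r}: {swagger_type!r}")
        columns.append((name, SWAGGER_TO_CQL[swagger_type]))
    return columns


# Hypothetical model describing one row of a top-partitions response.
example_model = {
    "properties": {
        "partition": {"type": "string"},
        "count": {"type": "integer"},
        "error": {"type": "integer"},
    }
}

if __name__ == "__main__":
    for column, cql_type in columns_for_model(example_model):
        print(column, cql_type)
```

A generic mapping like this only covers the shape of the returned data. Request parameters of an API call would still need a separate convention.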

slivne commented 4 years ago

Amnon's suggestion is not possible.

Alternator users will not be able to query those tables; they can only query specific tables with a specific structure.

The same probably goes for Redis (though I am not an expert on that).

The solution could be REST + CQL.


tzach commented 4 years ago

This is even more of an issue where nodetool is not available, for example Scylla Cloud. A virtual table will solve this issue.

tzach commented 3 years ago

For DynamoDB, a similar function is available via CloudWatch: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/contributorinsights_HowItWorks.html

tzach commented 3 years ago

Ping

slivne commented 3 years ago

@tzach it's not clear what the ping is for.

tzach commented 3 years ago

Exposing top partitions info in a virtual table, similar to https://github.com/scylladb/scylla/commit/7a3930f7cfca1fd01180e5e83dea9d097558ae22

slivne commented 3 years ago

The virtual tables that have been exposed so far are ones that do not require an operation on the user's end.

The ones requiring an operation are not yet exposed. The milestone for this feature is 4.x, i.e. it is not scheduled. If we want to schedule it, we also need to take something out.


avikivity commented 3 years ago

These should be done using the CQL EXEC or CALL statement, since we need to provide parameters. A table is good for selecting data that doesn't depend on anything, except perhaps the current time.

tzach commented 3 years ago

Can the DB build an ad-hoc "table" as a result of SELECT? (I agree it might be misused)

avikivity commented 3 years ago

That's what virtual tables do. But virtual tables don't have parameters.

amnonh commented 3 years ago

Theoretically, it can be taken from the WHERE part

avikivity commented 3 years ago

It's a terrible interface.

amnonh commented 3 years ago

I prefer a command in this case, but it could be something like: SELECT rank, partition, table, keyspace FROM system.toppartition WHERE keyspace='keyspace1' AND table='table1' ORDER BY rank DESC LIMIT 10;

avikivity commented 3 years ago

What about the time-to-sample? And memory allowance to allocate?

amnonh commented 3 years ago

What about the time-to-sample? And memory allowance to allocate?

That's why a command is better. This is very artificial, but you can force that to be part of the condition.
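
To illustrate why the operation reads more naturally as a parameterized call than as a table scan, here is a minimal sketch of a top-k sampler using the Space-Saving counting scheme. It is not Scylla's implementation. The function name `top_partitions` and the parameters `capacity` (the memory allowance, as a number of counters) and `list_size` are assumptions for illustration, and the sampling window is modeled simply as the `samples` iterable passed in:

```python
from typing import Dict, Iterable, List, Tuple


def top_partitions(
    samples: Iterable[str], capacity: int, list_size: int
) -> List[Tuple[str, int]]:
    """Approximate the hottest partition keys seen in `samples`.

    `capacity` bounds the number of counters kept in memory and
    `list_size` bounds the number of results returned. Counts can be
    overestimated once more than `capacity` distinct keys are seen.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    counters: Dict[str, int] = {}
    for key in samples:
        if key in counters:
            counters[key] += 1
        elif len(counters) < capacity:
            counters[key] = 1
        else:
            # Space-Saving step: evict the smallest counter and let the
            # new key inherit its count plus one.
            victim = min(counters, key=lambda k: counters[k])
            counters[key] = counters.pop(victim) + 1
    # Highest count first; ties broken by key for a stable order.
    ranked = sorted(counters.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:list_size]


if __name__ == "__main__":
    observed = ["pk1"] * 5 + ["pk2"] * 3 + ["pk3"]
    print(top_partitions(observed, capacity=3, list_size=2))
```

Both the result and its accuracy depend on `capacity` and on how long samples are collected, which is why these inputs are awkward to express as WHERE restrictions on a table.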

haaawk commented 3 years ago

Since we are providing multiple interfaces in scylla thrift/cql/alternator/redis - it is not clear if selecting an interface that is unique in capability to expose more information is the correct approach. At the end we will want users (no matter which protocol they are using) to be able to detect hot partitions or large partitions.

An alternate approach would be to expose all of these over REST and not make them avail over a CQL table (Alternator/Redis will not have a way to access this information).

@tzach ^^

@StarostaGit is working on this

haaawk commented 3 years ago

@StarostaGit has been working on this for some time and my impression was that all the details are already settled. @StarostaGit what were the design decisions made previously?

StarostaGit commented 3 years ago

Yes, there is #8854 addressing CQL CALL statements and this issue. As @avikivity mentioned above, it was meant to be done through the CALL statements as opposed to VTs - it seems to fit our case very well since toppartitions are, after all, a function call.

The PR was not picked up when I was gone for the summer, hence the delay, but I'm working on it again - it is mostly done, along with the toppartitions, all that's left is some more tests and some feedback left under the PR.

tzach commented 1 year ago

@amnonh FYI