tolbertam / sstable-tools

Tools for parsing, creating and doing other fun stuff with sstables
Apache License 2.0

Add select command for #22 #25

Closed clohfink closed 8 years ago

tolbertam commented 8 years ago

Wow, this is pretty amazing! I'll give it a look today. I can already think of some feature enhancements, like the ability to point at a directory and have it scan all sstables, although maybe that is better left to unix command line tools (e.g. find . -name "*Data.db" -exec java -jar sstable-tools.jar select * from {}..\;).

This could also lead down the path of using these tools to restore deleted data that hasn't been compacted away yet (e.g. run this tool to extract the missing data by query, then run another to import it back into a live cluster, or alternatively use a standalone tool that creates a new sstable with the restored data and newer timestamps). If we decide to pursue that, though, we'll probably have to firm up and standardize the output format.
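A minimal sketch of the directory-scan idea using nothing but find. The helper name for_each_sstable is hypothetical, and it assumes the command takes the file path as its last argument. It does not address whether the resulting path parses as a CQL identifier.

```shell
#!/bin/sh
# Hypothetical helper: run COMMAND once for every sstable data file under DIR.
# The matched file path is appended as the final argument of COMMAND.
for_each_sstable() {
    if [ "$#" -lt 2 ]; then
        echo "usage: for_each_sstable DIR COMMAND [ARGS...]" >&2
        return 2
    fi
    dir=$1
    shift
    # The -name pattern is quoted so the shell hands it to find unexpanded.
    find "$dir" -type f -name '*Data.db' -exec "$@" {} \;
}
```

As a dry run, `for_each_sstable . echo java -jar sstable-tools.jar select '*' from` prints one command line per data file. The `*` is quoted there because an unquoted `*` is expanded by the shell into file names before the command ever runs.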

tolbertam commented 8 years ago

First cursory glance looks good. Will play with it this afternoon and provide feedback :+1:. Think I may add some tests too.

tolbertam commented 8 years ago

I can't seem to get 'select' to work as it doesn't seem to like that my filename has dashes in it:

java -Dsstabletools.schema=./schema.cql -jar target/sstable-tools-3.0.0-SNAPSHOT.jar select adj_close from ma-1-big-Data.db where ticker = 'YHOO'
org.apache.cassandra.exceptions.SyntaxException: line 1:24 no viable alternative at input '-1' (select adj_close from [ma]-1...)
    at org.apache.cassandra.cql3.ErrorCollector.throwFirstSyntaxError(ErrorCollector.java:101)
    at org.apache.cassandra.cql3.CQLFragmentParser.parseAnyUnhandled(CQLFragmentParser.java:80)
    at org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:512)
    at com.csforge.sstable.Query.main(Query.java:165)
    at com.csforge.sstable.Driver.main(Driver.java:21)

The CQL spec doesn't seem to like dashes in identifiers (quoted or otherwise).

EDIT: Looks like dashes do work in a quoted identifier. However, I had to escape the quotes at the command line, so it wasn't obvious that I needed to do this. I can imagine others falling into the same trap. Did you find a way around this?
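The quoting trap can be reproduced without running the tool at all. The sketch below uses a hypothetical show_args helper that prints each argument exactly as a program would receive it from the shell. It makes no assumption about how sstable-tools joins its arguments into a query.

```shell
#!/bin/sh
# Hypothetical helper: print every argument on its own line, in brackets,
# exactly as the shell hands it to a program.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Plain double quotes are consumed by the shell, so the program sees a bare
# name with dashes in it.
show_args from "ma-1-big-Data.db"

# Wrapping the double quotes in single quotes, or escaping them with
# backslashes, lets them reach the program as part of the argument.
show_args from '"ma-1-big-Data.db"'
show_args from \"ma-1-big-Data.db\"

# The same applies to string literals: the single quotes around YHOO only
# survive if they are themselves quoted.
show_args "'YHOO'"
```

Only the second and third calls deliver the double quotes to the program, which is what a quoted CQL identifier needs.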

tolbertam commented 8 years ago

Hope you don't mind, but I added a commit that resolves the issues from my comments.

tolbertam commented 8 years ago

Some overall usability comments:

I think that is manageable as long as we document these two things well, but it would also be nice to have an alternative command that behaves in the following manner:

  1. A command line mode that accepts the path to the schema, sstable(s) and the query all as files (kind of lame to make the query a file, but don't see a way around that).
  2. An interactive shell that takes the schema and sstable file as an input. The user can then make queries like 'select * from table where blah' in the interactive shell.

This could behave like a limited version of cqlsh:

Usage: cqlsh sstable [sstable...] [-s schema] [-f file]

Options:
  -s, --schema=SCHEMA        The cql schema to use for the given sstable.  If not provided,
                             query criteria is limited to select * with no where clause.
  -f, --file=FILE            Execute commands from FILE, then exit

I think this could use the ascii table transformer you propose in #26.
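To make the proposed usage concrete, here is a rough sketch of how a wrapper script might parse those options. The function name parse_cqlsh_args and its printed summary are hypothetical. It only collects the arguments and does not invoke sstable-tools or implement the interactive shell.

```shell
#!/bin/sh
# Hypothetical parser for the proposed usage:
#   cqlsh sstable [sstable...] [-s schema] [-f file]
# Prints one "sstable=" line per positional argument, then the two options.
parse_cqlsh_args() {
    schema=
    file=
    count=0
    while [ "$#" -gt 0 ]; do
        case $1 in
            -s)
                [ "$#" -ge 2 ] || { echo "-s needs a value" >&2; return 2; }
                schema=$2
                shift 2
                ;;
            --schema=*)
                schema=${1#--schema=}
                shift
                ;;
            -f)
                [ "$#" -ge 2 ] || { echo "-f needs a value" >&2; return 2; }
                file=$2
                shift 2
                ;;
            --file=*)
                file=${1#--file=}
                shift
                ;;
            -*)
                echo "unknown option: $1" >&2
                return 2
                ;;
            *)
                # Anything that is not an option is treated as an sstable path.
                count=$((count + 1))
                printf 'sstable=%s\n' "$1"
                shift
                ;;
        esac
    done
    if [ "$count" -eq 0 ]; then
        echo "at least one sstable is required" >&2
        return 2
    fi
    printf 'schema=%s\nfile=%s\n' "$schema" "$file"
}
```

An empty schema= line in the output would correspond to the case described in the usage text, where queries are limited to select * with no where clause.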