verdict-project / verdict

Interactive-Speed Analytics: 200x Faster, 200x Fewer Cluster Resources, Approximate Query Processing
http://verdictdb.org
Apache License 2.0
248 stars 66 forks source link

Sqlite error #363

Closed solangepaz closed 5 years ago

solangepaz commented 5 years ago

Hello, I'm trying to use an in-memory database to access the samples created by verdictdb. But when I try to use it I have this error: Exception in thread "main" java.lang.NullPointerException at org.verdictdb.commons.StringSplitter.splitOnSemicolon(StringSplitter.java:34) at org.verdictdb.connection.JdbcConnection.execute(JdbcConnection.java:167) at org.verdictdb.connection.JdbcConnection.executeQuery(JdbcConnection.java:219) at org.verdictdb.connection.JdbcConnection.getSchemas(JdbcConnection.java:235) at org.verdictdb.metastore.ScrambleMetaStore.retrieve(ScrambleMetaStore.java:386) at org.verdictdb.metastore.ScrambleMetaStore.retrieve(ScrambleMetaStore.java:372) at org.verdictdb.metastore.CachedScrambleMetaStore.refreshCache(CachedScrambleMetaStore.java:25) at org.verdictdb.VerdictContext.getCachedMetaStore(VerdictContext.java:86) at org.verdictdb.VerdictContext.<init>(VerdictContext.java:71) at Main.main(Main.java:23)

pyongjoo commented 5 years ago

@Beastjoe Would you have time to take a look into this issue?

Beastjoe commented 5 years ago

Sure. let me try to fix this.

pyongjoo commented 5 years ago

Thanks!

solangepaz commented 5 years ago

How is this?

solangepaz commented 5 years ago

I needed to use this as soon as possible ...

Beastjoe commented 5 years ago

Sorry for the delay. I am still investigating this issue. I will try to fix today.

solangepaz commented 5 years ago

ok, thank you

Beastjoe commented 5 years ago

Hi, this bug is because we haven't supported Sqlite yet. One of the reasons is Sqlite doesn't support schema, which VerdictDB is used to store temporary scrambling meta data. If you wish to use in-memory database, I recommend to use H2 database instead.

Also, I am trying to add support for Sqlite. One quick is to let Sqlite creates temporary databases instead of schemas. You can checkout to the branch joezhong-fix363. Currently, I think create scrambles and run simple aggregate queries should be fine. You can take a look.

solangepaz commented 5 years ago

Thank you. The error has disappeared, but I still can not see the sample data. I have this code:

public static void main(String[] args) throws SQLException, VerdictDBException {
        Connection conn = DriverManager.getConnection("jdbc:sqlite:C:/Users/paz_s/Desktop/tpchdb/TPC-H.db");
        String sql="SELECT * FROM lineitem";
        JdbcConnection jdbcConnection = new JdbcConnection(conn, new SqliteSyntax());
        VerdictContext verdictContext = new VerdictContext(jdbcConnection);
        verdictContext.getConnection().execute(sql);

    }

And I wanted to have access to the data that results from sampling.

Beastjoe commented 5 years ago

I am not sure what you mean by "sample data". If you want to know the data of scramble table, you need to create it first. For instance, verdictContext.getConnection().execute("create scramble lineitem_scramble from lineitem"); That will create a scramble table from lineitem. Then you can issue select * from lineitem_scramble to check the content inside the scramble table.

solangepaz commented 5 years ago

I have this error when I try to create a scramble table:

Exception in thread "main" java.lang.RuntimeException: syntax error occurred:mismatched input 'PRAGMA' expecting {SHOW, STREAM, GET, APPEND, CREATE, DESCRIBE, DROP, INSERT, SELECT, SET, USE, WITH, EXACT, REFRESH, '('}
    at org.verdictdb.sqlreader.VerdictDBErrorListener.syntaxError(VerdictDBErrorListener.java:35)
    at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:65)
    at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:564)
    at org.antlr.v4.runtime.DefaultErrorStrategy.reportInputMismatch(DefaultErrorStrategy.java:325)
    at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:148)
    at org.verdictdb.parser.VerdictSQLParser.verdict_statement(VerdictSQLParser.java:674)
    at org.verdictdb.coordinator.ExecutionContext.identifyQueryType(ExecutionContext.java:801)
    at org.verdictdb.coordinator.ExecutionContext.sql(ExecutionContext.java:156)
    at org.verdictdb.jdbc41.VerdictStatement.execute(VerdictStatement.java:109)
    at org.verdictdb.connection.JdbcConnection.executeSingle(JdbcConnection.java:195)
    at org.verdictdb.connection.JdbcConnection.execute(JdbcConnection.java:170)
    at org.verdictdb.connection.JdbcConnection.executeQuery(JdbcConnection.java:219)
    at org.verdictdb.connection.JdbcConnection.getSchemas(JdbcConnection.java:235)
    at org.verdictdb.metastore.ScrambleMetaStore.retrieve(ScrambleMetaStore.java:386)
    at org.verdictdb.metastore.ScrambleMetaStore.retrieve(ScrambleMetaStore.java:372)
    at org.verdictdb.metastore.CachedScrambleMetaStore.refreshCache(CachedScrambleMetaStore.java:25)
    at org.verdictdb.VerdictContext.getCachedMetaStore(VerdictContext.java:87)
    at org.verdictdb.VerdictContext.<init>(VerdictContext.java:72)
    at Main.main(Main.java:28)
Beastjoe commented 5 years ago

On my computer, create scramble seems all right. You can try this

Connection verdictConn = DriverManager.getConnection("jdbc:verdict:sqlite:C:/Users/paz_s/Desktop/tpchdb/TPC-H.db");
verdictConn.createStatement().execute("create scramble lineitem_scramble from lineitem");

It will return a Connection object of VerdictDB and you can issue the query through this just like other databases. It is a more formal way to use VerdictDB.

solangepaz commented 5 years ago

It works, thank you. One question, why does verdictdb provide answers with a very big error when the scramble table is created only with part of the original data and not with 100% of the data?

Beastjoe commented 5 years ago

The reason VerdictDB uses only part of the original data is VerdictDB divides original data by block and processes data block by block. It will return the answer once the answer is 'converged'. I am not sure why the error is so big, maybe VerdictDB only processes one block and then returns the answer. Can I take a look how you issue the query?

pyongjoo commented 5 years ago

@solangepaz The error depends on the size of a scramble table (not the ratio):

  1. In the extreme case, if the original table has only 10 rows, creating a scramble table with 10% of the 10 rows (which is a single row) will lead to poor accuracy.
  2. In typical cases (for which VerdictDB is designed), if the original table has 1 billion rows, creating a scramble table with 0.1% of the 1 billion rows (which are 1 million rows) will lead to high accuracy.