jhamrick / dbtools

Tools for interfacing with SQLite databases
Other
34 stars 4 forks source link

Import a SQL VIEW from SQLite to Pandas #14

Open JimCallahanOrlando opened 10 years ago

JimCallahanOrlando commented 10 years ago

Import a SQL VIEW from SQLite to Pandas The "SQL As Understood By SQLite" documentation clearly states:

"The CREATE VIEW command assigns a name to a pre-packaged SELECT statement. 
Once the view is created, it can be used in the FROM clause of another SELECT 
in place of a table name."
http://www.sqlite.org/lang_createview.html

Most first level Python/SQLite packages (SQLite and APSW) support SQL VIEWS, but SQAlchemy sometimes chokes when it attempts to parse a VIEW in Python rather than simply passing the entire SQL SELECT statement to the SQLite C API interface (see the SQLite mailing list thread "Views as Virtual Tables -- Command line vs. Called Interface" both the initial email and the code example by Keith Medcalf where both the SQLite and APSW libraries are tested).

The SQLite command line shell interface ".tables" dot-command returns both TABLEs and VIEWs:

The ".tables" command is similar to setting list mode then 
executing the following query:

    SELECT name FROM sqlite_master 
    WHERE type IN ('table','view') AND name NOT LIKE 'sqlite_%'
    UNION ALL 
    SELECT name FROM sqlite_temp_master 
    WHERE type IN ('table','view') 
    ORDER BY 1

http://sqlite.org/cli.html

The code example, above is similar to the Python dbtools line:

cmd = "SELECT name FROM sqlite_master WHERE type='table'"

so, this suggests that, if this line were changed to:

cmd = "SELECT name FROM sqlite_master WHERE type IN ('table','view')"

dbtools would be able to import VIEWS as well as TABLES.

Although the SQL standard calls for VIEWS to be update-able, in reality few implementations fully implement this complex and demanding feature and SQLite3 documentation flatly states that, "Views are read-only in SQLite."

The CREATE VIEW command assigns a name to a pre-packaged SELECT statement. 
Once the view is created, it can be used in the FROM clause of another SELECT 
in place of a table name.   
... You cannot DELETE, INSERT, or UPDATE a view.
Views are read-only in SQLite.  
... Views are removed with the DROP VIEW command.
http://www.sqlite.org/lang_createview.html

Because, the SQLite3 documentation clearly states that, "You cannot DELETE, INSERT, or UPDATE a view" we will need IF statements preventing a VIEW from being passed to DELETE, INSERT, or UPDATE methods in dbtools.

The IF statement test for view could be implemented as:

# if attempting to change a (read only) view, throw an error
# views are read-only in SQLite.
    if type = 'view'
        raise ValueError(
            "Not a TABLE: %s\n\n"
            "**  SQLite3 VIEWS are read-only\n"
            "**  can only update base tables")
JimCallahanOrlando commented 10 years ago

Workaround for SQLite VIEW to Panda Dataframe

If one views pages 175-176 in Wes McKinney's "Python for Data Analysis" as two methods a long method, followed by a shortcut -- the long method should still work for SQL VIEWs.

The shortcut using the pandas.io.sql() module is broken for SQL VIEWS by the (internal) switch to Python SQLAlchemy from Python sqlite. On the other hand, the long method, just using the Python sqlite and Python pandas Dataframe() should still work -- even with views.

JimCallahanOrlando commented 10 years ago

HINT: To get the DataFrame() function to work it has to be imported into the top level namespace. Follow the naming convention Wes McKinney recommends on pages 111-112 (paper edition).

from Pandas import Series, DataFrame "...Series and DataFrame are used so much that i find it easier to import them into the local namespace." page 112, "Python For Data Analysis" by Wes McKinney(O'Reilly) (c) 2013 That is in addition to the: import pandas as pd

JimCallahanOrlando commented 10 years ago

For importing a SQL VIEW I had better luck with the Python apsw ("another Python SQL Wrapper") library than the sqlite3 library. My code now works, I can import a VIEW with apsw into a DataFrame as described on page 175 (using a TABLE and the sqlite3 library) and run a crosstab on the DataFrame -- as described on page 236. Unfortunately, my code is just a proof of concept and is not ready for prime time.

JimCallahanOrlando commented 10 years ago

Clarification, the code example on page 175 of "Python For Data Analysis" by Wes McKinney(O'Reilly) (c) 2013 shows how to import a SQL TABLE to a Panda DataFrame using the Python sqlite3 library. I modified this example to import a SQL VIEW to a Panda DataFrame using the Python apsw library. Then I ran a crosstab (aka contingency or frequency table) using tips example on page 236. I ran this in IPython --Pylab; I haven't tried it yet in IPython Notebook as shown on page 73, Fig 3-4 which should produce a pretty table.