CONCLUSION
So, in good conscience, I cannot recommend methods (API functions or PRAGMAS)
that are not subject to the rigorous SQLite3 testing process and not even
guaranteed to be this release, let alone, future releases.
QUESTIONS/CONCERNS
To Scrape or Not to Scrape TABLE/VIEW Metadata?
Re-parsing SQLite SQL considered harmful.
Is there a reliable API alternative?
ANALYSIS
Currently dbtools obtains SQL TABLE metadata (column names, column types,
primary key and whether the primary key auto-increments) by querying and scraping
the internal sqlite_Master table for the "sql" string that created the table.
# SQL "cmd" string to obtain "sql" creation string from the sqlite_master table
cmd = ("SELECT sql FROM sqlite_master "
"WHERE tbl_name='%s' and type='table'" % name)
# Execute the SQL "cmd" string
info = sql_execute(self.db, cmd, fetchall=True, verbose=verbose)
# parse the response -- it will look like
# 'CREATE TABLE name(col1 TYPE, col2 TYPE, ...)'
args = re.match("([^\(]*)\((.*)\)", info[0][0]).groups()[1]
If dbtools is modified to accept VIEWs in place of TABLEs, the "sql"
string for creating a VIEW can be much more convoluted, in fact
the "sql" creation string for a VIEW is likely to be a CREATE VIEW AS SELECT ...
statement that spells out a lot of ugly stuff specifically so it can be hidden
from that point forward.
The four table meta data items are:
column names
column types
primary key
whether the primary key auto-increments
Scraping for #1, column names and #2, column types
from the parsed version of the "sql" creation string (args =) above:
# get the column names [and types]
cols = [a.strip() for a in args.split(",")]
self.columns = tuple([x.split(" ")[0] for x in cols])
Scraping for #3, the primary key:
# parse primary key, if any
pk = [bool(re.search(r"PRIMARY KEY", x)) for x in cols]
primary_key = np.nonzero(pk)[0]
if len(primary_key) > 1:
raise ValueError("more than one primary key: %s" % primary_key)
elif len(primary_key) == 1:
self.primary_key = self.columns[primary_key[0]]
else:
self.primary_key = None
Scraping for #4 does the primary key auto-increment?
parse autoincrement, if applicable
ai = [bool(re.search(r"AUTOINCREMENT", x)) for x in cols]
autoincrement = np.nonzero(ai)[0]
if len(autoincrement) > 1:
raise ValueError("more than one autoincrementing "
"column: %s" % autoincrement)
elif self.primary_key is not None and len(autoincrement) == 1:
if self.primary_key != self.columns[autoincrement[0]]:
raise ValueError("autoincrement is different from primary key")
self.autoincrement = True
else:
self.autoincrement = False
The Python packages sqlite, apsw and sqlalchemy expose a portion of the
SQLite3 C API functions. If the metadata is not exposed in the C API
then the wrapper functions must be doing their own scraping and blindly
trusting wrapper scraping is not an improvement over our own scraping.
On the other hand the SQLite3 C API functions should go through the
rigorous SQLite3 testing process.
http://www.sqlite.org/testing.html
The SQLite3 C API Interface has functions for #1 column names and #2 column types.
Column Names In A Result Set
const char *sqlite3_column_name(sqlite3_stmt*, int N);
const void *sqlite3_column_name16(sqlite3_stmt*, int N);
These routines return the name assigned to a particular column
in the result set of a SELECT statement.
...The name of a result column is the value of the "AS" clause for that column,
if there is an AS clause. If there is no AS clause then the name of the column
is unspecified and may change from one release of SQLite to the next.
http://www.sqlite.org/c3ref/column_name.html
Result Values From A Query
...
int sqlite3_column_type(sqlite3_stmt*, int iCol);
...
The sqlite3_column_type() routine returns the datatype code for the initial
data type of the result column. The returned value is one of SQLITE_INTEGER,
SQLITE_FLOAT, SQLITE_TEXT, SQLITE_BLOB, or SQLITE_NULL. The value returned
by sqlite3_column_type() is only meaningful if no type conversions have
occurred as described below. After a type conversion, the value returned
by sqlite3_column_type() is undefined. Future versions of SQLite may change
the behavior of sqlite3_column_type() following a type conversion.
http://www.sqlite.org/c3ref/column_blob.html
This assumes we want the final aliased "AS" names in a result set; there is
also in the SQLite3 C API:
"a means to determine the database, table, and table column
that is the origin of a particular result column in SELECT statement."
http://www.sqlite.org/c3ref/column_database_name.html
The primary key and whether it is autoincremented is a bit more of a challenge.
The SQLite3 C API has a function sqlite3_table_column_metadata() which for a
specific database, table and column name combination returns data about that column:
Metadata is returned by writing to the memory locations passed
as the 5th and subsequent parameters to this function. Any of these
arguments may be NULL, in which case the corresponding element of
metadata is omitted.
Parameter Output
Type Description
--- ----------- -----------
5th const char* Data type
6th const char* Name of default collation sequence
7th int True if column has a NOT NULL constraint
8th int True if column is part of the PRIMARY KEY
9th int True if column is AUTOINCREMENT
So, this function would be terrific, if it was available, but it is not even
guaranteed to be in the SQLite3 C API library, let alone in the Python apsw!
"This API is only available if the [SQLite3 C API] library was compiled
with the SQLITE_ENABLE_COLUMN_METADATA C-preprocessor symbol defined."
Keith Medcalf suggested (in the SQLite3 mailing) using PRAGMAS.
There is an ideal PRAGMA for our purpose:
PRAGMA table_info(table-name);
This pragma returns one row for each column in the named table.
Columns in the result set include the
column name,
data type,
whether or not the column can be NULL,
and the default value for the column.
The "pk" column in the result set is zero
for columns that are not part of the primary key,
and is the index of the column in the primary key
for columns that are part of the primary key.
http://www.sqlite.org/pragma.html#pragma_table_info
This PRAGMA has everything we need except whether the primary key
is autoincrementing.
Unfortunately, "Specific pragma statements may be removed and others added
in future releases of SQLite. There is no guarantee of backwards compatibility."
http://www.sqlite.org/pragma.html
SQLite3 is a great system, but there is no standard method of obtaining
column metadata in the base SQLite3 C API.
So, in good conscience, I cannot recommend methods (API functions or PRAGMAS)
that are not subject to the rigorous SQLite3 testing process and not even
guaranteed to be this release, let alone, future releases.
CONCLUSION So, in good conscience, I cannot recommend methods (API functions or PRAGMAS) that are not subject to the rigorous SQLite3 testing process and not even guaranteed to be this release, let alone, future releases.
QUESTIONS/CONCERNS To Scrape or Not to Scrape TABLE/VIEW Metadata? Re-parsing SQLite SQL considered harmful. Is there a reliable API alternative?
ANALYSIS Currently dbtools obtains SQL TABLE metadata (column names, column types, primary key and whether the primary key auto-increments) by querying and scraping the internal sqlite_Master table for the "sql" string that created the table.
If dbtools is modified to accept VIEWs in place of TABLEs, the "sql" string for creating a VIEW can be much more convoluted, in fact the "sql" creation string for a VIEW is likely to be a CREATE VIEW AS SELECT ... statement that spells out a lot of ugly stuff specifically so it can be hidden from that point forward.
The four table meta data items are:
Scraping for #1, column names and #2, column types from the parsed version of the "sql" creation string (args =) above:
Scraping for #3, the primary key:
Scraping for #4 does the primary key auto-increment?
parse autoincrement, if applicable
The Python packages sqlite, apsw and sqlalchemy expose a portion of the SQLite3 C API functions. If the metadata is not exposed in the C API then the wrapper functions must be doing their own scraping and blindly trusting wrapper scraping is not an improvement over our own scraping. On the other hand the SQLite3 C API functions should go through the rigorous SQLite3 testing process. http://www.sqlite.org/testing.html
The SQLite3 C API Interface has functions for #1 column names and #2 column types.
This assumes we want the final aliased "AS" names in a result set; there is also in the SQLite3 C API: "a means to determine the database, table, and table column that is the origin of a particular result column in SELECT statement." http://www.sqlite.org/c3ref/column_database_name.html
The primary key and whether it is autoincremented is a bit more of a challenge. The SQLite3 C API has a function sqlite3_table_column_metadata() which for a specific database, table and column name combination returns data about that column:
So, this function would be terrific, if it was available, but it is not even guaranteed to be in the SQLite3 C API library, let alone in the Python apsw!
Keith Medcalf suggested (in the SQLite3 mailing) using PRAGMAS.
There is an ideal PRAGMA for our purpose:
This PRAGMA has everything we need except whether the primary key is autoincrementing.
Unfortunately, "Specific pragma statements may be removed and others added in future releases of SQLite. There is no guarantee of backwards compatibility." http://www.sqlite.org/pragma.html
SQLite3 is a great system, but there is no standard method of obtaining column metadata in the base SQLite3 C API.
So, in good conscience, I cannot recommend methods (API functions or PRAGMAS) that are not subject to the rigorous SQLite3 testing process and not even guaranteed to be this release, let alone, future releases.