Using sqlite_fdw to access databases stored in Postgresql large objects.

yairlenga commented 1 year ago

I have large number of SQLITE databases, each of them is self contained, containing multiple tables. Each of them of 1MB to 100MB, most around 10MB. Each of the databases capture simulation result.

I would like to load all of those databases into single table in Postgresql, and use FDW to extract data (READ ONLY access), without having to dump them into files. From what I read, it should be possible by extending the sqlite_fdw to use the sqlite virtual storage manager.

For my use case, The SQLITE databases are created by outside application, and will be loaded as large objects into Postgresql.

For my use case, I need to access fixed number at a time, so it be enough to have something like:

Assuming: create table sim_table (id varchar(255) primary key, data blob)

CREATE SERVER sqlite_sim1 FOREIGN DATA WRAPPER sqlite_fdw OPTIONS ( table 'sim_table' key 'foo' );

Will make the SQLITE database available.

In theory, this can be achieved by saving the data into a server side file, and using the existing FDW to open it. However, this is not practical as I need to access large number of SQLITE objects.

t-kataym commented 1 year ago

I would like to confirm your expectation.

From what I read, it should be possible by extending the sqlite_fdw to use the sqlite virtual storage manager.

What do you mean “sqlite virtual storage manager”?

For my use case, I need to access fixed number at a time, so it be enough to have something like:

Assuming: create table sim_table (id varchar(255) primary key, data blob) CREATE SERVER sqlite_sim1 FOREIGN DATA WRAPPER sqlite_fdw OPTIONS ( table 'sim_table' key 'foo' ); Will make the SQLITE database available.

This query does not give the information about file path of sqlite database at all. How should sqlite_fdw determine target sqlite databases?

mkgrgis commented 1 year ago

This query does not give the information about file path of sqlite database at all. How should sqlite_fdw determine target sqlite databases?

@t-kataym , I know what is "sqlite virtual storage manager": @yairlenga means there is no path to file, there is relational bytea in PostgreSQL represents SQLite database readonly image. This exotic location represented by such database option as

table 'sim_table',
key 'foo'

mkgrgis commented 1 year ago

son-phamngoc commented 1 year ago

@mkgrgis Thank you for your support. @yairlenga I would like to confirm your expectation.

In my understanding, you want to load all binary data of SQLite databases into sim_table of Postgres. Each record of sim_table contains id of the database and a bytea data which represents binary data of corresponding database. The id can be specified as an option of FOREIGN TABLE when creating it. CREATE SERVER sqlite_sim1 FOREIGN DATA WRAPPER sqlite_fdwOPTIONS (table 'sim_table’, key 'foo'); CREATE FOREIGN TABLE sqlite_table1 (c1 int, c2 text) SERVER sqlite_sim1 OPTIONS (id ‘SQLite 1’, table ‘table1’);

When selecting data, basically, SQLite FDW follows this workflow:

Firstly, SQLite FDW queries to sim_table to get corresponding data, based on the specified id. SELECT data from sim_table WHERE id = ‘SQLite 1’;
SQLite FDW loads binary data into 'somewhere' (RAM, in-memory database of SQLite ...). I think we need to consider some specific processing for binary data. For example, SQLite provides API sqlite3_deserialize to deserialize binary data and load into in-memory database. However, it requires the 1st parameter as an opening connection, which we do not have in this case. Opening an "in-memory connection" and pass into it can be an option. _sqlite3open(":memory:", &db);
SQLite FDW connects to virtual database and query to target table.
SQLite FDW returns result.

Does this workflow match with your expectation? Currently, we have no plan to develop this feature.

yairlenga commented 1 year ago

Workflow as above, with one minor change to step 2 above:

Avoid deserializing the database into memory
Instead, just read required/minimal meta data (e.g., schema) into memory
When running queries, use the Postgresql BLOB API - read only what's needed

Motivation is performance - the SQLITE database can be big relative the size of needed data. Having to deserialize sqlite database with few MB of data to retrieve one row can kill performance. Hoping to have a solution that is effective for extracting small amount of data

Other than than - exactly what I'm looking for.

On stack overflow, users have mentions using the substring function to efficiently extract partial data from a blob.

son-phamngoc commented 1 year ago

Thanks for your explanation. We understood your expectation. However, for now, we do not have any plan to support this feature. We are not sure whether it is feasible, and whether this approach can really improve the performance.

pgspider / sqlite_fdw

Using sqlite_fdw to access databases stored in Postgresql large objects. #72