Support SQLite3 as a data source for scanning in-memory data

vitormussa commented 2 years ago

Is your feature request related to a problem? Please describe.
In ELT-oriented projects, it is important to test data at every step of the pipeline. So if we have a pure Python step that extracts data from a data source, we should be able to test that data before loading it into a database.

Describe the solution you'd like
A possible solution is to support SQLite3, since it can be instantiated as an in-memory database from Python. This would work better than an engine like Pandas, which doesn't support SQL. SQLite3 is also a widely used database engine with a built-in Python API. Although it is not recommended for production, it could serve as storage for small datasets.

Additional context
Instantiating a SQLite3 database in memory with Python is a simple task:

import sqlite3
conn = sqlite3.connect(':memory:')

Then we can run soda-sql scans against it to test the data before sending it downstream.
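
As a rough illustration of that workflow, here is a minimal sketch that loads extracted rows into the in-memory database and runs plain-SQL checks of the kind a scan would compute. It does not use soda-sql itself, since SQLite is not a supported data source yet; the `extracted` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical rows, standing in for the output of a pure-Python extract step.
rows = [(1, "alice", 30.5), (2, "bob", None), (3, "carol", 12.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extracted (id INTEGER, name TEXT, amount REAL)")
conn.executemany("INSERT INTO extracted VALUES (?, ?, ?)", rows)

# Plain-SQL stand-ins for scan metrics: total rows and rows missing `amount`.
# COUNT(amount) skips NULLs, so the difference is the number of missing values.
row_count, missing_amount = conn.execute(
    "SELECT COUNT(*), COUNT(*) - COUNT(amount) FROM extracted"
).fetchone()

print(row_count, missing_amount)
```

With a SQLite dialect in place, the same connection would presumably be handed to a scan instead of being queried by hand.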

vijaykiran commented 2 years ago

Hey @vitormussa!

Thank you for opening the issue - I did start on a simple SQLite implementation some time ago. There are some limitations, though, because by default SQLite doesn’t come with math functions unless you change the compilation settings.

Feel free to check this branch https://github.com/sodadata/soda-sql/tree/sqlite-dialect and if you want to take a stab at it - I’ll be happy to help you with making it complete :)
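
On the math-function limitation mentioned above, one possible workaround (an assumption for illustration, not necessarily what the branch does) is to register the missing functions from Python. The standard `sqlite3` module exposes `create_function` and `create_aggregate` for this. The `SampleStddev` class, the `stddev` SQL name, and the table `t` below are hypothetical.

```python
import math
import sqlite3


class SampleStddev:
    """Sample standard deviation as a user-defined aggregate (hypothetical helper)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row; NULLs are ignored, as SQL aggregates usually do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        n = len(self.values)
        if n < 2:
            return None  # undefined for fewer than two values
        mean = sum(self.values) / n
        return math.sqrt(sum((v - mean) ** 2 for v in self.values) / (n - 1))


conn = sqlite3.connect(":memory:")
# Register a scalar sqrt and an aggregate stddev on this connection only.
conn.create_function("sqrt", 1, lambda x: None if x is None else math.sqrt(x))
conn.create_aggregate("stddev", 1, SampleStddev)

conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(2.0,), (4.0,), (4.0,), (6.0,)])
root, spread = conn.execute("SELECT sqrt(16.0), stddev(x) FROM t").fetchone()
```

Functions registered this way live on the connection, so they avoid recompiling SQLite, at the cost of running Python code for every row.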

vitormussa commented 2 years ago

Cool, @vijaykiran! I started looking at the other dialects and was wondering if I could start developing the SQLite one. I'll look at this branch and see what I can do then. Thanks for the fast answer :)

Support SQLite3 as a data source for scanning in-memory data #197