Open drkennetz opened 4 years ago
this is something that seems to be quite desirable with Pandas users. This could be a big help!
Serious question: Is it really necessary to convert SQL queries to pandas syntax (or vice versa)? Since the sqlite3 library is included in base Python (reference), you can spin up an in-memory sqlite database on command (conn = sqlite3.connect(':memory:')
), and you can pretty seamlessly pull database tables into a dataframe (pandas.read_sql()
) or write them to a dataframe (df.to_sql()
), it seems like we wouldn't need to re-invent the wheel for this.
I wrote up a POC implementation to read data from a .csv file into an in-memory (or on file) sqlite database and select it back out again. Theoretically, you could read in a number of tables, do your selects/inserts/joins, etc, then write your results back out the the .csv files. It introduces a bit of overhead, but you do get the option to use an on-disk sqlite file (for larger datasets) and you don't need the pandas dependency.
I like the idea of leveraging sqlite3 here. Good thinking @ericwburden
If that's as easy as you make it sound then we may be able to get that in during October for the first release.
Should I PR the POC so you can check it out?
I'd like some thoughts from @drkennetz on that
I think it would be good to do a PR for the POC. I checked out the repo and it looks solid. We can always review
@ericwburden could we please open a PR for this to go into development. If this is going to be an implementation we use then I cannot begin future work until this PR is in place.
@drkennetz Just did. Thanks for the reminder.
@ericwburden could you pull into dev?
@xanderyzwich @ericwburden began having some fun with this this morning and realized that I do believe we can utilize sqlite3 to create tables, but you cannot create a schema with sqlite3 https://stackoverflow.com/questions/30897377/python-sqlite3-create-a-schema-without-having-to-use-a-second-database.
However, the idea of a schema in our context is simply a top level directory, and I do think it is possible to make our own schema object.
@drkennetz that sounds good to me.
@xanderyzwich @ericwburden to comment further, I can see how these will play together. I think @ericwburden's code will be very beneficial once we have already created a schema and data in tables as it is accessing the data, but those need to be in place first. I'll be working on a schema class and a table class which inherits the schema class.
Create an SQL to Pandas converter so that SQL statements can be directly applied to pandas dataframes.
A simple example would be:
SQL:
Pandas: