groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development
MIT License
6.76k stars 699 forks source link

Would you consider a pull request for TabularData DataFrame read/write support? #1126

Closed jackpal closed 2 years ago

jackpal commented 2 years ago

I was playing around with extending iOS 15's TabularData Framework DataFrame class to read/write from SQL.

The idea would be to capture the tabular data output of a "select" statement into a DataFrame, and also to use a DataFrame as a binding to an "insert", or similar statement.

import GRDB
import TabularData

... create db connection...

let dataFrame = try DataFrame(connection: db, statement:"select rowid, description, done, date from tasks order by rowid;")

   // Print the dataFrame, just for fun
   print(dataFrame)

   // Prints:
   //   ┏━━━┳━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
   //   ┃   ┃ rowid ┃ description ┃ done   ┃ date                      ┃
   //   ┃   ┃ <Int> ┃ <String>    ┃ <Bool> ┃ <Date>                    ┃
   //   ┡━━━╇━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
   //   │ 0 │     1 │ Walk dog    │ false  │ 2022-01-04 15:30:12 +0000 │
   //   │ 1 │     2 │ Drink milk  │ false  │ 2022-01-04 15:30:12 +0000 │
   //   │ 2 │     3 │ Write code  │ false  │ 2022-01-04 15:30:12 +0000 │
   //   └───┴───────┴─────────────┴────────┴───────────────────────────┘

I was originally doing this using raw sqlite3 APIs, but I ran into the "how to convert custom types to/from sql types" issue, and I see that GRDB has a well developed system for converting objects.

FWIW there's a second almost-identical type in https://developer.apple.com/documentation/createml/mldatatable -- these are both Apple-based copies of the Python ecosystem's pandas data frames.)

groue commented 2 years ago

Hello again @jackpal!

I'm not familiar at all with the TabularData framework, but this looks like a quite interesting project!

Do you really need to embed such support right into GRDB? Can't it live in a separate library? Did you meet any issue in your initial attempts?

jackpal commented 2 years ago

I wrote it as a SPM library, using my own sql type encoding. But I haven't been able to add a dependency on GRDB. I ran into the issues discussed in #1125

My options for a stand-alone library seem to be:

  1. Don't support SPM. Instead, support legacy package managers.
  2. Give up on depending on GRDB and continue to implement my own sql type encoding/decoding.
  3. Give up on depending on GRDB and vendor GRDB's sql type encoding/decoding into my library.
  4. Depend on a forked SPM-compatible version of GRDB.

None of those seem like perfect options, so I'm not sure what to do.

jackpal commented 2 years ago

Also even is making a separate library worked, it seems to me that GRDB strives to be a "Batteries included" type of library, that supports all the standard Apple APIs out of the box. So I wanted to check with you first before continuing with a separate library.

JayToltTech commented 2 years ago

@jackpal lol, we seem to be working on much the same thing. Great minds and all that :-). Please let me know if you have a sharable version of this.

@groue I'm really enjoying using SQLite for my structured data repository for ML data, and Apple's ML toolchain is so much cleaner than my previous work in a Python/Pytorch environment. I enthusiastically endorse a GRDB/TablularData bridge.

jackpal commented 2 years ago

@JayToltTech Yeah, it's great fun, even if it's probably not very efficient. I have to get permission from my company to release this code, so it may take a few weeks before I can do it. It's currently just a few hundred lines, but it only supports standard column types. It would grow larger if it had a full GRDB-style sql encoding / decoding type support.

groue commented 2 years ago

Also even is making a separate library worked, it seems to me that GRDB strives to be a "Batteries included" type of library, that supports all the standard Apple APIs out of the box. So I wanted to check with you first before continuing with a separate library.

I understand this point of view, but it is not entirely accurate. Are embedded features that (1) can't be built from other public apis, and (2) general-purpose features that are not trivial to build correctly.

In the context of TabularData, we do not meet the second criteria. GRDB Combine support used to be separated (until it was clear Combine was a general-purpose feature). SwiftUI support currently lives in another library, as well as RxSwift support.

So I do recommend that you work on TabularData support in a separated library. If you meet difficulties, I'll help you. If you meet missing features, we'll discuss the best way to make progress.

lol, we seem to be working on much the same thing

That's funny indeed :-)

jackpal commented 2 years ago

OK, sounds good!

As a separate library author, I also am considering if there's any way to make the library reusable from multiple Swift SQL libraries. (i.e. support both SQLite and GRDB). Unfortunately the lack of a standardized Swift SQL interface for custom sql type encoding/decoding makes that difficult.

groue commented 2 years ago

All right! So let's close this issue. Just open a new one when you feel like it :-)

Happy GRDB!

jackpal commented 2 years ago

As a follow up, I wanted to mention I published the first version of my library at https://github.com/jackpal/SQLiteDataFrame

I haven't had a chance to figure out how best to hook it up to GRDB yet. It may be interoperable at the SQLite database connection level, but I'm not sure about that.

groue commented 2 years ago

GRDB exposes Raw SQLite Pointers, maybe that's just what you need.