jupyterlite / pyodide-kernel

Python kernel for JupyterLite powered by Pyodide
https://jupyterlite-pyodide-kernel.readthedocs.io/en/latest/_static/
BSD 3-Clause "New" or "Revised" License
45 stars 23 forks source link

Error when trying to use sqlite3 #35

Open psychemedia opened 1 year ago

psychemedia commented 1 year ago

Following the sqlite3 docs at https://docs.python.org/3/library/sqlite3.html, an error is raised when trying to query a SQLite database:

# Using the example code from
#https://docs.python.org/3/library/sqlite3.html

import sqlite3
con = sqlite3.connect("tutorial.db")

cur = con.cursor()

cur.execute("CREATE TABLE movie(title, year, score)")

res = cur.execute("SELECT name FROM sqlite_master")

Throws error:

---------------------------------------------------------------------------
DatabaseError                             Traceback (most recent call last)
Cell In[2], line 11
      7 cur = con.cursor()
      9 cur.execute("CREATE TABLE movie(title, year, score)")
---> 11 res = cur.execute("SELECT name FROM sqlite_master")

DatabaseError: database disk image is malformed
image
rth commented 1 year ago

Interesting. I think it because the the current directory (/drive) is not a MEMFS (unlike the rest of the file system), but rather a DriveFS used, from what I understand, for notebook persistance. Probably it's even less POSIX compliant than MEMFS, which becomes an issue for sqlite.

If you write in any other folder say /tmp/ the above example would work.

psychemedia commented 1 year ago

If I create a folder manually (eg via right click into sidebar) and try to reference into that I get the same error. Also, saving the db to /tmp doesn't persist the database file to browser local storage, so the db would need recreating each session as any modifications made to it would not be persisted.

psychemedia commented 1 year ago

I'm guessing this issue is just carried over from when the kernel was part of the core Jupyterlite repo? https://github.com/jupyterlite/jupyterlite/issues/840

oscarpereztuta commented 1 year ago

@psychemedia leaving here two workarounds:

  1. As @rth says, create the database in the /tmp/ folder:
    
    import sqlite3

con = sqlite3.connect('/tmp/tutorial.db')

cur = con.cursor() cur.execute("CREATE TABLE movie(title, year, score)") res = cur.execute("SELECT name FROM sqlite_master") res.fetchone()


2. Create a connection to an in-memory database:

import sqlite3

con = sqlite3.connect(':memory:')

cur = con.cursor() cur.execute("CREATE TABLE movie(title, year, score)") res = cur.execute("SELECT name FROM sqlite_master") res.fetchone()

psychemedia commented 1 year ago

@oscarpereztuta Thanks... if pandas can read from/write to the in-memory db, I guess that could persist the tables as CSV.

oscarpereztuta commented 1 year ago

@psychemedia, SQLite databases opened with the sqlite3 module in Python do persist changes to disk by default.

To ensure that changes to the database are persisted to disk, you can set the journal_mode of the database connection to "WAL" (Write-Ahead Logging) mode.

Here's an example of how to set the journal_mode to "WAL" in a sqlite3 connection using the SQLite3 Tutorial:

import sqlite3
import shutil

# create a connection to a SQLite database file in the /tmp directory
con = sqlite3.connect('/tmp/tutorial.db')

# set the journal_mode to WAL (Write-Ahead Logging) for better performance and reliability
con.execute('PRAGMA journal_mode=WAL;')

# create a cursor object to execute SQL commands
cur = con.cursor()

# create a new table named "movie" with three columns: title, year, and score
cur.execute("CREATE TABLE movie(title, year, score)")

# retrieve the names of all tables in the database
res = cur.execute("SELECT name FROM sqlite_master")
res.fetchone()

# insert data into the "movie" table
cur.execute("""
    INSERT INTO movie VALUES
        ('Monty Python and the Holy Grail', 1975, 8.2),
        ('And Now for Something Completely Different', 1971, 7.5)
""")

# commit the changes to the database
con.commit()

# close the cursor and connection
res.close()
con.close()

# move the file from /tmp to /drive JupyterLite directory
shutil.move('/tmp/tutorial.db', '/drive/tutorial.db')

Then you can download the DB file from JupyterLite and see the result on a web application like this one: https://sqliteviewer.app/

psychemedia commented 1 year ago

@oscarpereztuta Thanks for that. The interaction I am trying to simplify is for a non-expert learner in a distance education setting who just wants to run a notebook in JupyterLite and have changes persisted in local browser storage that they can make use of across multiple sessions.

rth commented 1 year ago

If you want to use the home dir, someone would need to figure out what is the missing functionality currently in DriveFS required for Sqlite3 and add it. I don't know enough about this. I see there are different backends for browser storage it could be interesting to try to change the backend and see if the issue persists.

Writing the DB in a manually mounted IDBFS (persistent) folder appears to work but that's not very user friendly from a notebook perspective. DriveFS is also able to use IDBFS from Emscripten as a backend, so this does indeed sound like a bug.

michaelweinold commented 1 month ago

@psychemedia, could you check if this issue has been resolved for you?

For me, the issue seems to have gone away:

Screenshot 2024-07-30 at 08 44 12

with the versions:

# Core modules (mandatory)
jupyterlite-core==0.4.0 # https://pypi.org/project/jupyterlite-core/#history
jupyterlab==4.2.4 # https://pypi.org/project/jupyterlab/#history

# Python kernel (optional)
jupyterlite-pyodide-kernel==0.4.0 # https://pypi.org/project/jupyterlite-pyodide-kernel/#history
psychemedia commented 1 month ago

I just tried with https://jupyterlite.readthedocs.io/en/stable/_static/lab/index.html and the default Pyodide kernel in a browser with local storage cleared, and now it fails even quicker for me?

image
michaelweinold commented 1 month ago

Try our site: https://live.brightway.dev/

...and alternatively try incognito mode when using the https://jupyterlite.readthedocs.io site. I think it works for me:

Screenshot 2024-07-30 at 09 56 35
psychemedia commented 1 month ago

With the jupyterlite demo, chrome incognito, 126.0.6478.127 arm64, Mac M2 14.5:

image

I get the same error on https://live.brightway.dev/lab/index.html

michaelweinold commented 1 month ago

...yeah, ok - I had remembered that even the

import sqlite3
con = sqlite3.connect("tutorial.db")`

had thrown an error. I can confirm that

res = cur.execute("SELECT name FROM sqlite_master")

still does not work for me either.

michaelweinold commented 1 month ago

@rth, at least I was able to follow your suggestion:

I see there are different backends for browser storage it could be interesting to try to change the backend and see if the issue persists.

Now that I was able to actually choose different storage drivers (cf. https://github.com/jupyterlite/jupyterlite/issues/1152).

I can confirm that the issue persists in both the case of: asyncStorage and localStorageWrapper. The site alltogether fails to load when chosing webSQLStorage ("index.es6.js:2017 Error: No available storage method found.").

@jtpio - any idea what the underlying issue might be? We could put some resources towards a fix through UpWork (we would really love to get sqlite3 working)... but we would first have to know where to begin.