michalc / sqlite-s3-query

Python functions to query SQLite files stored on S3
MIT License

disk I/O error #15

Closed 4sachi closed 2 years ago

4sachi commented 2 years ago

Hi

I tried to query a SQLite file in an S3 bucket using the sqlite-s3-query package, but I'm getting "Exception: disk I/O error". The object is private and is in a private bucket. I have provided the correct access key and secret key for the user, and have also verified that the user can access the object using boto3.

Please find the complete details below,

version of sqlite_s3_query - sqlite-s3-query 0.0.44

the file format used in s3 bucket - filename.db, filename.sqlite

region, access key id, and secret access key are stored in environment variables

sample code

from sqlite_s3_query import sqlite_s3_query

s3_url = "https://<bucket>.s3.<region>.amazonaws.com/<key>"

with sqlite_s3_query(url=s3_url) as query:
    with query('SELECT * FROM table1 WHERE name = ?', params=('name1',)) as (columns, rows):
        for row in rows:
            print(row)

Error message

Traceback (most recent call last):
  File "C:\Users\user1\Downloads\db.py", line 29, in <module>
    with sqlite_s3_query(url='') as query:
  File "C:\Users\user1\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "C:\Users\user1\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlite_s3_query.py", line 319, in sqlite_s3_query
    with \
  File "C:\Users\user1\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "C:\Users\user1\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlite_s3_query.py", line 275, in get_db
    run(libsqlite3.sqlite3_open_v2, f'file:/{file_name}'.encode() + b'\0', byref(db), SQLITE_OPEN_READONLY | SQLITE_OPEN_URI, vfs_name.encode() + b'\0')
  File "C:\Users\user1\AppData\Local\Programs\Python\Python39\lib\site-packages\sqlite_s3_query.py", line 73, in run
    raise Exception(libsqlite3.sqlite3_errstr(res).decode())
Exception: disk I/O error

Could you please help me with this?

Thanks, 4sachi

michalc commented 2 years ago

Hello 👋

I can try to have a look, but I think I’m going to need a complete example so I can run the exact same code.

Also, do you have very specific details of the system this is running on? It looks like Windows from the paths (and I have very little experience of Windows, and I think no experience of running Python on Windows)

Michal

michalc commented 2 years ago

Let me know if you have any more details? (Otherwise will close the issue)

michalc commented 2 years ago

Closing, but feel free to reopen

ionox0 commented 1 year ago

Just FYI, I am also seeing this error. I believe it occurs on the 2nd download attempt, after the 1st attempt is cancelled. Perhaps clearing the cache would fix it.

michalc commented 1 year ago

@ionox0 can you provide more details, such as a reproducible example? Specifically: which cache are you referring to?

cjusko commented 7 months ago

Hi! I'm seeing this error as well. The sqlite file that I'm attempting to access is fairly sizeable, about 3GB. However, I did have this working with the same sqlite file just a few weeks ago. My exact setup is as follows:

with sqlite_s3_query(url=s3_url, get_credentials=lambda now: (
    region, access_key, secret_key, token,
)) as query:
    table_query = """SELECT name FROM sqlite_master WHERE type='table';"""
    with query(table_query) as (columns, rows):
        for row in rows:
            print(row)

which results in the same "Exception: disk I/O error".

cjusko commented 7 months ago

Trying to run the same table name query on a smaller, test sqlite file still results in the disk I/O error. I'm running with Python 3.10, but got the same error on Python 3.8. Versioning is enabled on the bucket.

sqlite-s3-query v0.0.78

michalc commented 7 months ago

Hi @cjusko,

Are you able to share the specific SQLite file? Also - what version of SQLite?

Thanks,

Michal

cjusko commented 7 months ago

@michalc

Thanks for getting back! Sorry about the delay. I was able to figure this out after spending some time messing with the permissions on the bucket.

For future reference, I was receiving this error because of an update our IT dept conducted which inadvertently removed s3:GetObject and s3:GetObjectVersion permissions. Adding those back resolved all issues we were having. So if this issue comes back up for anyone in the future, that's the first place I'd check.

Thanks again!
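For anyone checking permissions, the two actions named above can be written out as an IAM policy statement. This is only a minimal sketch: the bucket name `my-bucket` and the helper `read_only_policy` are hypothetical, and a real policy would likely be narrower or sit alongside other statements.

```python
import json

# Hypothetical bucket name, for illustration only.
BUCKET = "my-bucket"


def read_only_policy(bucket):
    """Build an IAM policy document allowing the two read actions named above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # s3:GetObjectVersion matters here because the bucket is versioned.
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy = read_only_policy(BUCKET)
print(json.dumps(policy, indent=2))
```

Comparing the printed document against the policy actually attached to the user or role is a quick way to spot a missing action.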

michalc commented 7 months ago

@cjusko

Ah, good to know - thanks! So it probably would have caused a 403 response, which we would then convert to an exception, but that exception then gets converted to what I think would surface as a disk I/O error at https://github.com/michalc/sqlite-s3-query/blob/d6301d15e7b32553067c539ea79a199abccfb58d/sqlite_s3_query.py#L187

Maybe this should be better surfaced somehow...
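To illustrate how the conversion described above can hide the real cause, here is a simplified sketch. It is not the library's actual code: `Forbidden`, `fetch_range` and `x_read` are hypothetical stand-ins, and only the result code `SQLITE_IOERR` (10) and its message come from SQLite itself.

```python
SQLITE_OK = 0
SQLITE_IOERR = 10  # SQLite's generic I/O error result code

# Subset of the messages that sqlite3_errstr returns for these codes.
ERRSTR = {SQLITE_OK: "not an error", SQLITE_IOERR: "disk I/O error"}


class Forbidden(Exception):
    """Hypothetical stand-in for the exception raised on an HTTP 403."""


def fetch_range(offset, amount):
    # Pretend S3 rejected the ranged GET because of missing permissions.
    raise Forbidden("403 Forbidden")


def x_read(offset, amount):
    """VFS-style read callback: it can only hand an integer code back to SQLite."""
    try:
        fetch_range(offset, amount)
    except Exception:
        # The original exception, and with it the 403, is lost at this point.
        return SQLITE_IOERR
    return SQLITE_OK


def run(func, *args):
    """Call func and raise a generic exception for any non-OK result code."""
    res = func(*args)
    if res != SQLITE_OK:
        raise Exception(ERRSTR[res])


try:
    run(x_read, 0, 100)
except Exception as e:
    message = str(e)

print(message)
```

In this sketch the caller only ever sees the generic message, which matches the symptom reported throughout this thread.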

michalc commented 7 months ago

@cjusko (and others that get here)

As of v0.0.80, sqlite-s3-query raises a range of more specific exceptions, and more consistently surfaces httpx exceptions, which are raised (for example) when S3 returns a 403 because the right permissions are missing.

There are some details in the Exceptions section of the README.