Open deldesir opened 2 months ago
While I understand that the name xb.py
is trying to emulate upstream Calibre-Web's naming of db.py for metadata.db db operations, and ub.py for config/app.db db operations — conversely we might instead want a far more clear name like xklb-db-ops.py
(or some such) — so that everything's more crystal clear months/years down the road?
"Official" xklb-metadata.db schema, as derived from an IIAB Calibre-Web example, after running lb tubeadd
and lb dl
: (in other words, after having downloaded a video... e.g. in fact @deldesir downloaded a small playlist of 5 videos!)
root@box:~# sqlite3 /library/calibre-web/xklb-metadata.db
SQLite version 3.45.1 2024-01-30 16:01:20
Enter ".help" for usage hints.
sqlite> .fullschema
CREATE TABLE IF NOT EXISTS "playlists" (
[id] INTEGER PRIMARY KEY,
[time_modified] INTEGER,
[time_deleted] INTEGER,
[time_created] INTEGER,
[hours_update_delay] INTEGER,
[path] TEXT,
[extractor_key] TEXT,
[profile] TEXT,
[extractor_config] TEXT,
[extractor_playlist_id] TEXT,
[title] TEXT,
[uploader] TEXT
);
CREATE TABLE IF NOT EXISTS 'playlists_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'playlists_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS 'playlists_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
CREATE TABLE IF NOT EXISTS 'playlists_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS "media" (
[id] INTEGER PRIMARY KEY,
[time_deleted] INTEGER,
[playlists_id] INTEGER,
[size] INTEGER,
[duration] INTEGER,
[time_created] INTEGER,
[time_modified] INTEGER,
[time_downloaded] INTEGER,
[fps] INTEGER,
[view_count] INTEGER,
[path] TEXT,
[webpath] TEXT,
[extractor_id] TEXT,
[title] TEXT,
[uploader] TEXT
, [time_uploaded] INTEGER, [width] INTEGER, [height] INTEGER, [live_status] TEXT, [type] TEXT, [video_codecs] TEXT, [audio_codecs] TEXT, [subtitle_codecs] TEXT, [other_codecs] TEXT, [video_count] INTEGER, [audio_count] INTEGER, [chapter_count] INTEGER, [other_count] INTEGER, [language] TEXT, [subtitle_count] INTEGER, [download_attempts] INTEGER, [error] TEXT);
CREATE TABLE IF NOT EXISTS 'media_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'media_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS 'media_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
CREATE TABLE IF NOT EXISTS 'media_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
CREATE INDEX [idx_playlists_id]
ON [playlists] ([id]);
CREATE INDEX [idx_playlists_time_modified]
ON [playlists] ([time_modified]);
CREATE INDEX [idx_playlists_time_deleted]
ON [playlists] ([time_deleted]);
CREATE INDEX [idx_playlists_time_created]
ON [playlists] ([time_created]);
CREATE INDEX [idx_playlists_hours_update_delay]
ON [playlists] ([hours_update_delay]);
CREATE INDEX [idx_playlists_uploader]
ON [playlists] ([uploader]);
CREATE INDEX [idx_playlists_extractor_config]
ON [playlists] ([extractor_config]);
CREATE INDEX [idx_playlists_profile]
ON [playlists] ([profile]);
CREATE UNIQUE INDEX [idx_playlists_path]
ON [playlists] ([path]);
CREATE INDEX [idx_playlists_extractor_key]
ON [playlists] ([extractor_key]);
CREATE INDEX [idx_media_id]
ON [media] ([id]);
CREATE INDEX [idx_media_time_deleted]
ON [media] ([time_deleted]);
CREATE INDEX [idx_media_playlists_id]
ON [media] ([playlists_id]);
CREATE INDEX [idx_media_size]
ON [media] ([size]);
CREATE INDEX [idx_media_duration]
ON [media] ([duration]);
CREATE INDEX [idx_media_time_created]
ON [media] ([time_created]);
CREATE INDEX [idx_media_time_modified]
ON [media] ([time_modified]);
CREATE INDEX [idx_media_time_downloaded]
ON [media] ([time_downloaded]);
CREATE INDEX [idx_media_fps]
ON [media] ([fps]);
CREATE INDEX [idx_media_view_count]
ON [media] ([view_count]);
CREATE INDEX [idx_media_uploader]
ON [media] ([uploader]);
CREATE UNIQUE INDEX [idx_media_path]
ON [media] ([path]);
CREATE VIRTUAL TABLE [playlists_fts] USING FTS5 (
[path], [title],
tokenize='trigram',
content=[playlists]
);
CREATE TRIGGER [playlists_ai] AFTER INSERT ON [playlists] BEGIN
INSERT INTO [playlists_fts] (rowid, [path], [title]) VALUES (new.rowid, new.[path], new.[title]);
END;
CREATE TRIGGER [playlists_ad] AFTER DELETE ON [playlists] BEGIN
INSERT INTO [playlists_fts] ([playlists_fts], rowid, [path], [title]) VALUES('delete', old.rowid, old.[path], old.[title]);
END;
CREATE TRIGGER [playlists_au] AFTER UPDATE ON [playlists] BEGIN
INSERT INTO [playlists_fts] ([playlists_fts], rowid, [path], [title]) VALUES('delete', old.rowid, old.[path], old.[title]);
INSERT INTO [playlists_fts] (rowid, [path], [title]) VALUES (new.rowid, new.[path], new.[title]);
END;
CREATE VIRTUAL TABLE [media_fts] USING FTS5 (
[path], [webpath], [title],
tokenize='trigram',
content=[media]
);
CREATE TRIGGER [media_ai] AFTER INSERT ON [media] BEGIN
INSERT INTO [media_fts] (rowid, [path], [webpath], [title]) VALUES (new.rowid, new.[path], new.[webpath], new.[title]);
END;
CREATE TRIGGER [media_ad] AFTER DELETE ON [media] BEGIN
INSERT INTO [media_fts] ([media_fts], rowid, [path], [webpath], [title]) VALUES('delete', old.rowid, old.[path], old.[webpath], old.[title]);
END;
CREATE TRIGGER [media_au] AFTER UPDATE ON [media] BEGIN
INSERT INTO [media_fts] ([media_fts], rowid, [path], [webpath], [title]) VALUES('delete', old.rowid, old.[path], old.[webpath], old.[title]);
INSERT INTO [media_fts] (rowid, [path], [webpath], [title]) VALUES (new.rowid, new.[path], new.[webpath], new.[title]);
END;
CREATE TABLE [captions] (
[media_id] INTEGER,
[time] INTEGER,
[text] TEXT
);
ANALYZE sqlite_schema;
INSERT INTO sqlite_stat1 VALUES('media_fts_docsize',NULL,'5');
INSERT INTO sqlite_stat1 VALUES('media_fts_idx','media_fts_idx','1 1 1');
INSERT INTO sqlite_stat1 VALUES('media_fts_config','media_fts_config','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_extractor_key','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_path','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_profile','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_extractor_config','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_uploader','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_hours_update_delay','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_time_created','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_time_deleted','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_time_modified','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists','idx_playlists_id','1 1');
INSERT INTO sqlite_stat1 VALUES('playlists_fts_data',NULL,'3');
INSERT INTO sqlite_stat1 VALUES('playlists_fts_idx','playlists_fts_idx','1 1 1');
INSERT INTO sqlite_stat1 VALUES('playlists_fts_config','playlists_fts_config','1 1');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_path','5 1');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_uploader','5 3');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_view_count','5 2');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_fps','5 3');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_time_downloaded','5 3');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_time_modified','5 3');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_time_created','5 3');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_duration','5 2');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_size','5 3');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_playlists_id','5 3');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_time_deleted','5 5');
INSERT INTO sqlite_stat1 VALUES('media','idx_media_id','5 1');
INSERT INTO sqlite_stat1 VALUES('playlists_fts_docsize',NULL,'1');
INSERT INTO sqlite_stat1 VALUES('media_fts_data',NULL,'3');
ANALYZE sqlite_schema;
Schema currently proposed by this evolving PR, that would be generated with SQLAlchemy:
root@box:~# sqlite3 /library/calibre-web/xklb.db
SQLite version 3.45.1 2024-01-30 16:01:20
Enter ".help" for usage hints.
sqlite> .fullschema
CREATE TABLE media (
id INTEGER NOT NULL,
playlists_id INTEGER,
size INTEGER,
duration INTEGER,
time_created INTEGER,
time_modified INTEGER,
time_deleted INTEGER,
time_downloaded INTEGER,
fps INTEGER,
view_count INTEGER,
path VARCHAR,
webpath VARCHAR,
extractor_id VARCHAR,
title VARCHAR,
uploader VARCHAR,
live_status VARCHAR,
error VARCHAR,
time_uploaded INTEGER,
width INTEGER,
height INTEGER,
type VARCHAR,
video_codecs VARCHAR,
audio_codecs VARCHAR,
subtitle_codecs VARCHAR,
video_count INTEGER,
audio_count INTEGER,
language VARCHAR,
subtitle_count INTEGER,
download_attempts INTEGER,
PRIMARY KEY (id)
);
CREATE TABLE playlists (
id INTEGER NOT NULL,
time_modified INTEGER,
time_deleted INTEGER,
time_created INTEGER,
hours_update_delay INTEGER,
path VARCHAR,
extractor_key VARCHAR,
profile VARCHAR,
extractor_config VARCHAR,
extractor_playlist_id VARCHAR,
title VARCHAR,
uploader VARCHAR,
PRIMARY KEY (id)
);
CREATE TABLE captions (
id INTEGER NOT NULL,
media_id INTEGER,
time INTEGER,
text TEXT,
PRIMARY KEY (id),
FOREIGN KEY(media_id) REFERENCES media (id)
);
CREATE TABLE book_media_map (
book_id INTEGER NOT NULL,
media_id INTEGER,
PRIMARY KEY (book_id),
FOREIGN KEY(media_id) REFERENCES media (id)
);
CREATE TABLE history (
id INTEGER NOT NULL,
media_id INTEGER,
time_played INTEGER,
done BOOLEAN,
PRIMARY KEY (id),
FOREIGN KEY(media_id) REFERENCES media (id)
);
/* No STAT tables available */
"Official" xklb-metadata.db schema, as derived from an IIAB Calibre-Web example, after running
lb tubeadd
andlb dl
: (in other words, after having downloaded a video!)root@box:~# sqlite3 /library/calibre-web/xklb-metadata.db SQLite version 3.45.1 2024-01-30 16:01:20 Enter ".help" for usage hints. sqlite> .fullschema
Likewise a 333-line actual .dump
, also thanks to @deldesir — click below to expand this collapsible section:
@codewiz mentions that he appreciates the "declarative style" of this PR, so that de facto schema(s) are more understandable to all 👍
Seeing the schema like this does make it much easier to go through. Thanks @deldesir!
Couple of questions:
Couple of notes from today's 9/20 call:
book_media_map
table and augmenting the captions
table with a primary key and a foreign key to reference media_id but most of the schema is from the upstream xklb. Very helpful to know!book_media_map
can be prefixed with iiab_
to make clear this is a new addition (@holta mentioned that @codewiz suggested this). book_media_map
table and how do you intend to use it / for what use cases? book_media_map
table is trying to map the calibredb db and xklb but there's no FK constraint on book_id. Garbage collection may be okay in certain cases, but it may be helpful to leverage RDBMS properties (such as PK/FK constraints) where possible to keep data consistent. It is possible to have a PK and FK on the same db column if needed. @deldesir noted that part of the challenge is the calibredb and xklb are in two different physical databases meaning it’s non-trivial to refer to both of them from the same application.
- book_media_map: If book_id is a primary key in this table, is this really a table for books, not a mapping table?
Technically there is a table for books, but it's in another db (metadata.db). The book_id (books.id) is read from https://github.com/iiab/calibre-web/blob/2f7ae47e118519c3ed25a8320d1445f403e76057/cps/editbooks.py#L265 and inserted into book_media_map.
- If though, the book_media_map table is supposed to link to the calibredb table in some way: https://manual.calibre-ebook.com/generated/en/calibredb.html, should book_id be a foreign key in the map table?
We cannot link two databases by joining there tables columns here, book_media_map was done out of the desire to keep a record of book_id with relatioship with media_id from xklb.db.
- media: should playlists_id be a FK to the playlists table?
Since the goal here is to understand and replicate the official schema in order to use it in a declarative form, this needs to be done upstream by the maintainer in relationship with the overall structure of xklb.
I guess this is set as it is as a denormalization strategy necessary for xklb functionality around playlists...
- Can a media object belong to more than one playlists? If so, then it may make be helpful for the schema to have a media_playlist_map instead of storing the playlists_id in the media table.
This would introduce redundancy since all playlists are computed to populate bookshelves stored in app.db.
- Is there a notion of user in Calibre-Web? If so, you may want to house user_id in the history table if you need to be able to distinguish user histories.
The history table might have been generated by using xklb search
feature. It's xklb related. Calibre-Web user settings are stored in app.db.
This would introduce redundancy since all playlists are computed to populate bookshelves stored in app.db.
I don't understand the reasoning here. Right now, if a media object is in more than one playlist, there will be multiple rows for the same media object in the media table....
This would introduce redundancy since all playlists are computed to populate bookshelves stored in app.db.
I don't understand the reasoning here. Right now, if a media object is in more than one playlist, there will be multiple rows for the same media object in the media table....
Great question but can we arrange a bookshelf/playlist design call in coming weeks, to solve these larger questions? e.g. to begin working thru the many tensions between the config/app.db
view of bookshelves and the xklb-metadata.db
view of playlists!
Short term, I'd ask that we keep focus on individual videos, to solve the actual problem at hand without getting carried away. In other words, first and foremost we should help @deldesir to debug this PR:
Working thru the most immediate problems articulated here:
And if on the side that forces us to submit a draft proposal as to how portable bookshelves or portable playlists should in fact / eventually work, that's A-OK as an optional bonus, as an early blueprint for future work. 💯
+1 on focusing on videos. playlists can definitely come later!
This pull request introduces a new module,
xb.py
, which proposes a standardized schema for thexklb-metadata.db
database. The objective of this initial experiment is to:xklb-metadata.db
to standardize data handling and storage.xb.py
to facilitate interaction withxklb-metadata.db
.metadata.db
andapp.db
).This update lays the groundwork for future enhancements, aiming to improve data integrity and streamline a better integration of xklb-metadata.db within the Calibre-Web ecosystem. Context for this change can be found in issue #246.