mduplouy / silex-cops

A port of COPS (Calibre OPDS PHP Server) under Silex micro framework
24 stars 9 forks source link

Adding MyRuLib database to silex-cops? #34

Open alllexx88 opened 7 years ago

alllexx88 commented 7 years ago

Hi! I have a huge djvu/pdf/ps/chm/txt.bz2 library (62k+ books) with a MyRuLib database. Is such a large library managable for silex-cops? At least in theory? If so, which calibre metadata.db tables and fields are needed for silex-cops? Here's MyRuLib database schema:

CREATE TABLE authors(
        id integer primary key autoincrement not null,
        letter char(1),
        search_name varchar(255),
        full_name varchar(255),
        first_name varchar(128),
        middle_name varchar(128),
        last_name varchar(128),
        number integer,
        newid integer,
        description text);
CREATE TABLE sqlite_sequence(name,seq);
CREATE TABLE books(
      id integer not null,
      id_author integer,
      title varchar(255) not null,
      annotation text,
      genres text,
      id_sequence integer,
      deleted boolean,
      id_archive integer,
      file_name text,
      file_size integer,
      file_type varchar(20),
      md5sum char(32),
      created integer,
      lang char(2),
      year integer,
      description text,
      file_path text,
      PRIMARY KEY(id,id_author)
      );
CREATE TABLE archives(
      id integer primary key autoincrement not null,
      file_name varchar(255),
      file_path varchar(255),
      file_size integer,
      file_count integer);
CREATE TABLE dates(id integer primary key autoincrement not null, lib_min integer, lib_max integer, lib_num, usr_min integer, usr_max, usr_num integer);
CREATE TABLE sequences(id integer primary key autoincrement not null, number integer, value varchar(255) not null);
CREATE TABLE bookseq(id_book integer, id_seq integer, number integer, level integer, id_author integer, PRIMARY KEY(id_book,id_seq));
CREATE TABLE files(id_book integer, id_archive integer, file_name TEXT, file_path TEXT);
CREATE TABLE params(id integer primary key, value integer, text text);
CREATE TABLE dir0(code INTEGER PRIMARY KEY autoincrement not null, name, info, parent INTEGER NOT NULL);
CREATE TABLE ref0(code INTEGER, book INTEGER, PRIMARY KEY(code, book));
CREATE TABLE dir1(code INTEGER PRIMARY KEY autoincrement not null, name, info, parent INTEGER NOT NULL);
CREATE TABLE ref1(code INTEGER, book INTEGER, PRIMARY KEY(code, book));
CREATE TABLE tables(id INTEGER PRIMARY KEY autoincrement not null, title,
        dir_file,dir_data,dir_type,dir_code,dir_name,dir_info,dir_prnt,
        ref_file,ref_data,ref_type,ref_code,ref_book,fb2_code);
CREATE TABLE types(file_type varchar(99) PRIMARY KEY,command,convert);
CREATE INDEX author_letter ON authors(letter);
CREATE INDEX author_name ON authors(search_name);
CREATE INDEX book_author ON books(id_author);
CREATE INDEX book_archive ON books(id_archive);
CREATE INDEX book_md5sum ON books(md5sum);
CREATE INDEX book_created ON books(created);
CREATE INDEX files_book ON files(id_book);
CREATE INDEX sequences_name ON sequences(value);
CREATE INDEX bookseq_seq ON bookseq(id_seq);
CREATE INDEX dir0_parent ON dir0(parent);
CREATE INDEX ref0_book ON ref0(book);
CREATE VIRTUAL TABLE fts_book USING fts3(content,dscr,tokenize=porter);
CREATE TABLE 'fts_book_content'(docid INTEGER PRIMARY KEY, 'c0content', 'c1dscr');
CREATE TABLE 'fts_book_segments'(blockid INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE 'fts_book_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx));
CREATE VIRTUAL TABLE fts_auth USING fts3(content,tokenize=porter);
CREATE TABLE 'fts_auth_content'(docid INTEGER PRIMARY KEY, 'c0content');
CREATE TABLE 'fts_auth_segments'(blockid INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE 'fts_auth_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx));
CREATE VIRTUAL TABLE fts_seqn USING fts3(content,tokenize=porter);
CREATE TABLE 'fts_seqn_content'(docid INTEGER PRIMARY KEY, 'c0content');
CREATE TABLE 'fts_seqn_segments'(blockid INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE 'fts_seqn_segdir'(level INTEGER,idx INTEGER,start_block INTEGER,leaves_end_block INTEGER,end_block INTEGER,root BLOB,PRIMARY KEY(level, idx));
CREATE TABLE genres(id_book integer, id_genre CHAR(2));
CREATE INDEX genres_book ON genres(id_book);
CREATE INDEX genres_genre ON genres(id_genre);

I can share the database itself, but it weights ~16.2MB when compressed with WinRAR.

Also, if adding myrulib database format support to silex-cops directly isn't a hassle, it'd be great, since further library updates are likely to come with upgraded myrulib databases.

Thanks!

mduplouy commented 7 years ago

Hi,

Woo, impressive library !

I don't see any problem to handle a huge amount of books, even if with 60K+ a "real" database engine should be better (like mysql / postgresql)

As silex-cops is a frontend for Calibre, all object mapping are related to the database schema.

In order to make it compatible with myrulib one should modify a pretty big amount of data in the entity & repository classes but it is not impossible :)

alllexx88 commented 7 years ago

As silex-cops is a frontend for Calibre, all object mapping are related to the database schema.

Thanks for making this clear 😃 I don't have much experience with php (though its syntax is very close to C), and even though I'm an apt learner, and used to dealing with new things, I just don't have the time now to dive deeper here to understand silex-cops internals 😞 It's much easier for me to write a script (bash, or python is better here, I think) to populate an empty calibre database from myrulib database. It's doable, and not too hard.

... even if with 60K+ a "real" database engine should be better (like mysql / postgresql)

As a feature request (sorry for the offtopic), it'd be good if silex-cops could handle mysql and postgresql. I mean, making a copy of metadata.db to a 'real' database, and syncing it on metadata.db (timestamp) change.

I really like silex-cops design, and, in general, I think it's better to not restrict it to being a Calibre frontend only. But there's always room for improvement 😄

Thanks 👍

mduplouy commented 7 years ago

Actually, as I use Doctrine DBAL abstraction layer, it could be very easy to switch to mysql or postgresql as long as the table structures are strictly the same ;) If you manage to import your data, I can make a mysql branch for you

Btw there is no restriction to test current sqlite engine against a 60K+ book collection

alllexx88 commented 7 years ago

@mduplouy It turned out to not be trivial, since myrulib object mapping is rather different from calibre, but I've written a conversion script. Yet it takes a really long while to complete the conversion (62800 books to process). Good that I can make my Synology do this in a screen session for as long as it needs to 😄 I log SQL INSERT commands along the way too, to transfer to 'real' db easily if need be. I'll let you know how well silex-cops manages the library on Synology NAS when conversion is done 😃

mduplouy commented 7 years ago

Would ne nice if you coule provide a download link for the script so i can give it a try ans make some profiling :-)

alllexx88 commented 7 years ago

The problem's that it's pretty database-specific 😞 myrulib, unlike calibre, doesn't use many-to-many relations between authors and books -- it's one to many instead. So, in case a book has more than one author, it creates new comma-separated author, e.g. "Author A, Author B." -- and it's maddening, since the library I'm converting has quite a few author names with commas... I've put together a reg expression that seems to be right for my library, but it's not universal. It's the same with language codes, but at least they don't have commas (but I'll have to manually edit them later to convert to ISO 639-2 from ISO 639-1). Also, the script deduces hierarchical tags based on file paths (in a way that may only be applicable to this lib), and extracts ISBN from filenames. Here's the script, just remember that it has parts that're library-specific:

(I've commented it a bit, and edited to be more readable, but it's still pretty raw) (had to remove books and series create/update triggers from empty calibre database, cause I don't have title_sort() implemented)

https://gist.github.com/alllexx88/c4b2c1b96d2f6addbfffed4d763c05c6

P.S. My lib will take another ~72hrs to complete conversion... well, no surprise: a lot of string matching has to be done, even though sed is highly efficient in this

mduplouy commented 7 years ago

maybe a gist would do the trick for your script :)

actually I was talking about the SQL script, which I guess should not be so huge in bzip2 compressed format

Let me know when it's over, I'm quite curious about the result

alllexx88 commented 7 years ago

Sorry, you're right about gist 🆗

It's at ~46% now. I sure will let you know, I'm very interested in making it work well. Especially if you add mysql support, since then there's a big chance I can have it set up in my University building... P.S. To make colleagues and students happy 😆

P.P.S. Yeah, I'll upload the SQL script when it's done generating. It's actually INSERT statements only, since I use an empty Calibre database (with some removed triggers) as a starting point.

mduplouy commented 7 years ago

I'm pretty confident with the mysql handling, the only tricky part is the trigger within the database provider and the backoffice :+1: all the rest is pure sql

Nice for the script, so I can play it many times to regenerate a clean database

And new translations shouldn't be a problem too if needed :smile:

alllexx88 commented 7 years ago

Sounds great 👍 The great things about silex-cops are that it 1) needs so few dependencies (I can do what I want at home, but making admins at work install something "alien" to them is much more difficult), 2) code readability (given I have time to know php and silex good enough, I can mod it myself if need be), 3) I love your license :lol: 4) the design is both intuitive, and looks gorgeous... so as soon as the conversion is complete, and I have the time (going to have a hard long working day on Monday, when the conversion completes:( ), I'll definitely give it a go 😄

mduplouy commented 7 years ago

nice to hear, i'll make an issue for mysql / postgresql support and a dedicated branch

feel free to fork and do whatever the fuck you want to :+1: