shermp opened this issue 5 years ago
There's rather a lot of stuff happening behind the scenes during the import process, AFAICT (you can peek at it via the sql debug logs), so, why not, but that'd be a rather imposing task, with the risk of severely screwing things up if it goes wrong ;).
I also have no idea how to get Nickel to actually show the new content without a USBMS session (or an sdcard scan)...
I'd still do stuff in USBMS mode, such that hopefully Nickel will sync with the DB when it has left USBMS mode.
I did a quick sqldiff on a DB pre-sideload and post-sideload, to see what it was potentially creating. It seemed to add two records to the `content` table, one to the `AnalyticsEvents` table, and one to the `volume_shortcovers` table.
Would definitely have to do some more investigation into what Nickel does SQL-wise, to see how difficult it is.
Thankfully, I now have a spare H2O to test with, without messing up my main reading device :)
EDIT: And the `AnalyticsEvents` entry appears to be something to do with USBMS itself, so I may not need to worry about that.
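For reference, a plain-SQL way to spot the new `content` rows (sqldiff covers the whole DB more thoroughly) might look like the sketch below, assuming a copy of KoboReader.sqlite was saved before the USBMS session; the copy's path is made up:

```sql
-- Hypothetical comparison against a pre-import copy of the DB; the copy's path
-- is illustrative (the live DB lives at /mnt/onboard/.kobo/KoboReader.sqlite).
ATTACH DATABASE '/mnt/onboard/.kobo/KoboReader.pre.sqlite' AS pre;

-- content rows present after the import that were not there before the sideload:
SELECT ContentID, ContentType
FROM main.content
EXCEPT
SELECT ContentID, ContentType
FROM pre.content;

DETACH DATABASE pre;
```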
Oh, right ;).
You should be able to forget the Analytics stuff, obviously ;).
But what's mainly worrying me is the crapload of stuff done to `content` to handle chapters or whatever ;).
(Which I thought might have been KePub only, but recent Dropbox experiments showed that it happens with ePubs, too.)
When you first sideload a book, it only seems to add one "chapter" record, at least for the test book. I suspect the rest might be added upon first opening, although I would have to test more books to be sure.
EDIT: My test book added a `....#(0)` record.
Aww crap. Tested with a different book, and yeah, Nickel does add the chapters to the content table :(
EDIT: And it's not as simple as Nickel adding all the `spine` elements from the OPF as-is either :(
Yes, I've been looking at this every now and then, and have successfully imported a few kepubs manually (I haven't tried with plain epubs or any other format, and my kepubs are all manually fixed to be well-formed). I haven't written an automated tool for this, and if I do, I'll probably look into how libnickel does it. Firmware compatibility isn't too bad, though.
@davidfor might have some comments about this.
Hmm... Looks like it might be using the `ncx` TOC file to generate the records. With a (well-formed) test ebook, the `ncx` entries correspond to what Nickel added to the DB.
For each book, a row for the book itself is added to the `content` table. Then there are rows for the ToC entries or internal files, depending on the format.
For epubs and PDFs, each ToC entry gets a row in the `content` and `content_shortcover` tables.
For kepubs, there is a row for each spine entry in both `content` and `content_shortcover`, plus another row in `content` for each ToC entry.
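To make that concrete, here's a rough sketch of the shape of those rows for a plain epub. It is purely illustrative: the real inserts populate far more columns, and the `ContentType` values are from memory, not confirmed against Nickel.

```sql
-- Purely illustrative, not Nickel's actual statements. ContentType values
-- (6 for the book row, 9 for the ToC rows) are assumptions from memory, and
-- the #(0)-style fragment matches the record mentioned above.
INSERT INTO content (ContentID, ContentType, BookID, Title)
VALUES ('file:///mnt/onboard/Some Book.epub', 6, NULL, 'Some Book');

-- One of these per ToC entry (epub/PDF) or spine item (kepub), keyed back to
-- the book via BookID:
INSERT INTO content (ContentID, ContentType, BookID, Title)
VALUES ('file:///mnt/onboard/Some Book.epub#(0)', 9,
        'file:///mnt/onboard/Some Book.epub', 'Chapter 1');
```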
I've looked at this a few times, and just don't think it is worth the hassle. And even if you do it, it is going to take time and use battery. To do this, you need to extract the appropriate file from the book, process it, and insert rows into the database. Nickel itself can probably do this more efficiently, as Kobo uses the Adobe RMSDK to handle epubs and PDFs; kepubs must be their own code.
I've looked at doing this from calibre when sending a book, and I just don't think the advantage is worth the effort or risk. I can only see two advantages: one is avoiding the import, but that just changes when the work is done; the other is being able to set the series info during the send. I don't see that either is worth it.
Having said that, I do have code for updating the epub ToC when replacing a book. When I have time, I'll be cleaning that up and adding kepub support. When I do that, it will be added to my Kobo Utilities plugin, not the driver.
Thanks for the input @davidfor
Regarding the battery, it's the whole "poll the screen over and over to try and detect the completion of importing", then re-entering USBMS a second time, remounting partitions, updating DB, unmounting partition etc. that would be great to avoid.
Also, time. The above rigmarole takes quite a while to complete. I would love it if I could decrease that time.
To be honest though, it's probably the WiFi that's the biggest battery hog. Not a lot that can be done about that.
Oh, and just so you know, the fastest time I've been able to scan an EPUB file in Go on the Kobo has been around 1.3s each (using goroutines for concurrency, not validating paths, and reading the zip in memory). You might be able to get it down to 1s with a faster XML parser, but not much faster than that. This includes the time it takes to write a batched insert to the db.
> Oh, and just so you know, the fastest time I've been able to scan an EPUB file in Go on the Kobo has been around 1.3s each (using goroutines for concurrency, not validating paths, and reading the zip in memory). You might be able to get it down to 1s with a faster XML parser, but not much faster than that. This includes the time it takes to write a batched insert to the db.
To be fair, Nickel's import process doesn't appear to be much quicker, if at all.
Just had another, different idea. It's a variation on what I've thought about previously involving SQL triggers (@davidfor may wish to look away now), although the new idea is a bit more... refined. Still a dirty hack, though.
How about extending the DB schema to add a new table like `ku_metadata_update`? The table could have a `contentid` column and a set of metadata columns. Adding a book adds a record to this table. Then, define an `AFTER INSERT` trigger that updates the book record after Nickel adds it to the DB, and deletes the corresponding row in the `ku_metadata_update` table.
Assuming Nickel isn't bothered by an extra unknown table of course...
EDIT: This assumes sqlite is flexible enough to do this of course...
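A minimal sketch of what that could look like (the table, column, and trigger names are just the hypothetical ones from above; a real version would carry more metadata columns than the series fields):

```sql
-- Rough sketch of the idea above; table/column names are hypothetical.
CREATE TABLE IF NOT EXISTS ku_metadata_update (
    contentid    TEXT NOT NULL PRIMARY KEY,
    series       TEXT,
    seriesnumber TEXT
);

CREATE TRIGGER IF NOT EXISTS ku_metadata_update_content_insert
AFTER INSERT ON content
WHEN new.ContentID IN (SELECT contentid FROM ku_metadata_update)
BEGIN
    -- copy the staged metadata onto the row Nickel just created...
    UPDATE content
    SET Series       = (SELECT series       FROM ku_metadata_update WHERE contentid = new.ContentID),
        SeriesNumber = (SELECT seriesnumber FROM ku_metadata_update WHERE contentid = new.ContentID)
    WHERE ContentID = new.ContentID;
    -- ...and clean up the staged row once it has been applied
    DELETE FROM ku_metadata_update WHERE contentid = new.ContentID;
END;
```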
@shermp, that's actually a great idea! I'll play around with it sometime this week. I'll probably make an experimental version of seriesmeta which uses this trick.
Here's a quick SQL thing I put together just now. I've done some manual testing with it, but I haven't tried with an actual import yet:
```sql
CREATE TABLE IF NOT EXISTS _seriesmeta (
    ImageId      TEXT NOT NULL UNIQUE,
    Series       TEXT,
    SeriesNumber INTEGER,
    PRIMARY KEY(ImageId)
);

DROP TRIGGER IF EXISTS _seriesmeta_insert;
DROP TRIGGER IF EXISTS _seriesmeta_update;
DROP TRIGGER IF EXISTS _seriesmeta_delete;

CREATE TRIGGER _seriesmeta_insert
AFTER INSERT ON content WHEN
    /*(new.Series IS NULL) AND*/
    /* each _ below is a single-character LIKE wildcard, so this matches
       sideloaded books however the separators in the ImageId are encoded */
    (new.ImageId LIKE 'file____mnt_onboard_%') AND
    (SELECT count() FROM _seriesmeta WHERE ImageId = new.ImageId)
BEGIN
    UPDATE content
    SET
        Series       = (SELECT Series FROM _seriesmeta WHERE ImageId = new.ImageId),
        SeriesNumber = (SELECT SeriesNumber FROM _seriesmeta WHERE ImageId = new.ImageId)
    WHERE ImageId = new.ImageId;
    /*DELETE FROM _seriesmeta WHERE ImageId = new.ImageId;*/
END;

CREATE TRIGGER _seriesmeta_update
AFTER UPDATE ON content WHEN
    /*(new.Series IS NULL) AND*/
    (new.ImageId LIKE 'file____mnt_onboard_%') AND
    (SELECT count() FROM _seriesmeta WHERE ImageId = new.ImageId)
BEGIN
    UPDATE content
    SET
        Series       = (SELECT Series FROM _seriesmeta WHERE ImageId = new.ImageId),
        SeriesNumber = (SELECT SeriesNumber FROM _seriesmeta WHERE ImageId = new.ImageId)
    WHERE ImageId = new.ImageId;
    /*DELETE FROM _seriesmeta WHERE ImageId = new.ImageId;*/
END;

CREATE TRIGGER _seriesmeta_delete
AFTER DELETE ON content
BEGIN
    DELETE FROM _seriesmeta WHERE ImageId = old.ImageId;
END;
```
Here's a better version, which puts the metadata directly into the `content` table if the book has already been imported:
```sql
CREATE TABLE IF NOT EXISTS _seriesmeta (
    ImageId      TEXT NOT NULL UNIQUE,
    Series       TEXT,
    SeriesNumber TEXT,
    PRIMARY KEY(ImageId)
);

/* Adding series metadata on import */
DROP TRIGGER IF EXISTS _seriesmeta_content_insert;
CREATE TRIGGER _seriesmeta_content_insert
AFTER INSERT ON content WHEN
    /*(new.Series IS NULL) AND*/
    (new.ImageId LIKE 'file____mnt_onboard_%') AND
    (SELECT count() FROM _seriesmeta WHERE ImageId = new.ImageId)
BEGIN
    UPDATE content
    SET
        Series       = (SELECT Series FROM _seriesmeta WHERE ImageId = new.ImageId),
        SeriesNumber = (SELECT SeriesNumber FROM _seriesmeta WHERE ImageId = new.ImageId)
    WHERE ImageId = new.ImageId;
    /*DELETE FROM _seriesmeta WHERE ImageId = new.ImageId;*/
END;

DROP TRIGGER IF EXISTS _seriesmeta_content_update;
CREATE TRIGGER _seriesmeta_content_update
AFTER UPDATE ON content WHEN
    /*(new.Series IS NULL) AND*/
    (new.ImageId LIKE 'file____mnt_onboard_%') AND
    (SELECT count() FROM _seriesmeta WHERE ImageId = new.ImageId)
BEGIN
    UPDATE content
    SET
        Series       = (SELECT Series FROM _seriesmeta WHERE ImageId = new.ImageId),
        SeriesNumber = (SELECT SeriesNumber FROM _seriesmeta WHERE ImageId = new.ImageId)
    WHERE ImageId = new.ImageId;
    /*DELETE FROM _seriesmeta WHERE ImageId = new.ImageId;*/
END;

DROP TRIGGER IF EXISTS _seriesmeta_content_delete;
CREATE TRIGGER _seriesmeta_content_delete
AFTER DELETE ON content
BEGIN
    DELETE FROM _seriesmeta WHERE ImageId = old.ImageId;
END;

/* Adding series metadata directly when already imported */
DROP TRIGGER IF EXISTS _seriesmeta_seriesmeta_insert;
CREATE TRIGGER _seriesmeta_seriesmeta_insert
AFTER INSERT ON _seriesmeta WHEN
    (SELECT count() FROM content WHERE ImageId = new.ImageId)
    /*AND ((SELECT Series FROM content WHERE ImageId = new.ImageId) IS NULL)*/
BEGIN
    UPDATE content
    SET
        Series       = new.Series,
        SeriesNumber = new.SeriesNumber
    WHERE ImageId = new.ImageId;
    /*DELETE FROM _seriesmeta WHERE ImageId = new.ImageId;*/
END;

DROP TRIGGER IF EXISTS _seriesmeta_seriesmeta_update;
CREATE TRIGGER _seriesmeta_seriesmeta_update
AFTER UPDATE ON _seriesmeta WHEN
    (SELECT count() FROM content WHERE ImageId = new.ImageId)
    /*AND ((SELECT Series FROM content WHERE ImageId = new.ImageId) IS NULL)*/
BEGIN
    UPDATE content
    SET
        Series       = new.Series,
        SeriesNumber = new.SeriesNumber
    WHERE ImageId = new.ImageId;
    /*DELETE FROM _seriesmeta WHERE ImageId = new.ImageId;*/
END;
```
You can uncomment the first comment in each trigger to not replace existing metadata, and uncomment the last one to only update the metadata once.
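For example, a sideloading tool could stage its metadata during the USBMS session with a single insert per book; the `ImageId` value below is made up and would need to match whatever Nickel computes for the file:

```sql
-- Hypothetical staging insert done while in USBMS; the ImageId is illustrative.
INSERT OR REPLACE INTO _seriesmeta (ImageId, Series, SeriesNumber)
VALUES ('file____mnt_onboard_Some_Author_-_Some_Book_epub', 'Some Series', '2');
```

If the book is already in the library, the `_seriesmeta_seriesmeta_insert` trigger applies the metadata immediately; if not, the `_seriesmeta_content_insert` trigger picks it up when Nickel imports the book.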
I was going to look into this myself, but if someone else is willing to do all the hard (cough SQL cough) stuff...
If it's an idea you think we should pursue further, perhaps we could spin it off as a standalone specification that could be used by other applications.
> If it's an idea you think we should pursue further, perhaps we could spin it off as a standalone specification that could be used by other applications.
I do think it is. I'm already working on adding this to seriesmeta, and I'll test it on my Kobo itself tomorrow. The biggest thing is there will need to be a way to prevent conflicts between multiple applications using this (which can be partly solved by uncommenting the lines I commented).
It might be enough to have a well defined schema that should be adhered to. And yeah, I think deleting the row once the update has been made might be a good idea.
Oops, wrong issue, see https://github.com/geek1011/kepubify/pull/43#issuecomment-538651599.
KU currently follows the conservative Calibre approach of letting Nickel import books into the DB and then modifying metadata afterwards. This approach works, but it is time-consuming and a battery hog.
I'm currently mulling over the idea of creating the book record(s) myself, bypassing Nickel's "Importing Content" stage. This would have the big advantage of doing everything in one step, removing a partition mount/dismount cycle, removing the need to use FBInk to attempt to detect the end of the import process etc.
The main downside would be the potential to "stuff it up", and it would probably introduce more firmware compatibility constraints.
@NiLuJe and @geek1011, do you have any comments and/or potential concerns over taking this approach? I would probably add it as an alternative path, rather than a replacement.