koreader / crengine

This is the KOReader CREngine fork. It cross-pollinates with the official CoolReader repository at https://github.com/buggins/coolreader, in case you were looking for that one.
72 stars 45 forks source link

Some notes about thornyreader crengine fork #267

Closed poire-z closed 4 years ago

poire-z commented 5 years ago

New active fork of crengine found by @pazos at https://github.com/koreader/koreader/issues/4745#issuecomment-470086270:

https://github.com/irrationalunzen/thornyreader thornyreader

Spent the evening reading git log -p... so sharing some observations:

Some noticable stuff they added/changed (in bold, the things that could be of interest to us): (the commit links I linked to is usually the most interesting one in the set - but the features come each with many many commits and later fixes).

(so, in bold, the things that could be of interest to us - even if I'm not particularly interested by any of them :) but I'll surely be reading that RTL stuff - just to see how easy or risky it would be in regards to all the LTR text fixes we added).

Frenzie commented 5 years ago

That was fast. :-)

smart txt detection (?)

Not sure what you mean by that but also see our #39.

Probably not usable as is - and there are lots of arabic char codes, so they do the positionning/sizing/rendering manually - while we could probably have harfbuzz do some of that, as it already renders/draw individual words in rtl. But it's definitely something to read to see how we could start to go at that.

Yeah, plus HarfBuzz does it for Persian, etc., etc.

poire-z commented 5 years ago

smart txt detection (?)

Not sure, not my words :) CONFIG_CRENGINE_TXT_SMART_FORMAT and DOC_FLAG_TXT_NO_SMART_FORMAT were added in their first commit, and possibly disappeared later. Stuff about txt content parsing, checking more stuff like nb chars per line, some kind of reflowing, wrapping to 80 columns...

Didn't mention it above, but they did indeed many updates/fixes to the little document formats, like mobi (they included another library to do that), txt, rtf, docx... Which might explain why they didn't care much about the more complex CSS/HTML renderings like tables, etc...

Frenzie commented 5 years ago

@poire-z The repo seems to have vanished. O_o Do you still have a local copy?

xelxebar commented 5 years ago

FWIW, it looks like the author's GitHub account is still active. I just tried sending an email, requesting temporary access to the repo.

Frenzie commented 5 years ago

@xelxebar Besides giving some indications as to which part of the code to look at it may not have been super helpful for complex rendering changes like bidi or vertical judging by @poire-z's analysis above, but it seemed nice to have a relevant example in any case.

poire-z commented 5 years ago

Yes, I do have a local copy. I dunno what I'm allowed to do with it... If the original author has removed it, it would feel awkward to make it available again. I could share here as attachment the 2 or 3 files related to RTL (they would probably miss some stuff in other files - but that code should really not be used as is anyway).

link2xt commented 5 years ago

Here is another copy of thornyreader: https://github.com/readera/thornyreader It is used in the ReadEra EReader, see https://readera.ru/en/reader-book or https://readera.ru/reader-book for direct APK links.

@poire-z You are allowed to make it available again, since Cool Reader is released under the GPL license. By the way, thornyreader has license problems, LICENSE files contains GNU AGPL which is not compatible with the GPL license of Cool Reader. And the source code of ReadEra is not published.

link2xt commented 5 years ago

@poire-z You may want to write the author first anyway, of course, maybe he has reasons to try to remove this repository. I guess it was just moved to readera organization without announcement.

poire-z commented 5 years ago

Here is another copy of thornyreader: https://github.com/readera/thornyreader

Thanks for finding that one. But it's actually lagging by 669 commits from the repo I cloned some months ago. But that's good, as I guess I can share a patch file with the 516 commits involving the crengine/ subdirectory, that the really interested person can apply on their clone of that readera/thornyreader repo: thornyreader_crengine_after_20180207_af19707.patch.zip (Made with git format-patch --stdout af19707 -- crengine if that matters)

xelxebar commented 5 years ago

FWIW, I did contact the original author and got a reply. Apparently, they were developing thornyreader for their employer and got strong-armed into taking the repo down. It seems like they personally would like the repo to be available, but it's unclear to me whether the patchset is legally usable. The GPL only guarantees source availability once the binaries have been distributed, no?

link2xt commented 5 years ago

@xelxebar

The GPL only guarantees source availability once the binaries have been distributed, no?

Yes. But the binaries are distributed both on https://readera.ru/reader-book and https://play.google.com/store/apps/details?id=org.readera

ReadEra uses Cool Reader based engine, which is licensed under GPL. GPL does not require authors to publish the source, but any user can demand the source code from them. And since binaries are freely distributed, you can easily become the user.

EDIT: I would not contact the ReadEra with these demands, though. Better contact the authors directly to avoid any consequences for them (though not following the GPL is obviously their employer's fault).

link2xt commented 5 years ago

It seems like they personally would like the repo to be available, but it's unclear to me whether the patchset is legally usable.

It may be not legally usable in one corner case: if the authors took some code that was licensed under the GPL-incompatible license and used it in their GPL software fork. That would be illegal in the first place so the GPL cannot make this combined codebase legal even if the users were not aware of the origin of the code.

poire-z commented 5 years ago

As I mentionned in the first post, there is really not much we'd want to borrow, or even could borrow (because their use of crengine looked so different than ours). So, no real need to get answers to these legal questions. The patches are just something one can have a look at to see how these features have been done on this other planet, like we would probably look at some other software written in a different language - some kind of tourism. (The only thing that felt like something we could possibly pick is the DOCX support - but I don't think there's that much interest in having it.)

poire-z commented 4 years ago

Now that we got DocX support from upstream, and our own RTL/Bidi support, nothing much to pick from that fork. Just cut and pasting the "Epub3 series metadata support" logic - in case we some day need to look at these other fields and properties:

//epub3 series metadata
if(series.empty()) {
    struct titleitem { lString16 title; lString16 id; };
    LVArray<titleitem> titles;
    for (int i = 1; i < 20; i++) {
        lString16 xpath = lString16("package/metadata/title[") << fmt::decimal(i) << "]";
        ldomNode *item = dom->nodeFromXPath(xpath);
        if (!item)
            break;
        titleitem curr;
        curr.id = item->getAttributeValue("id");
        curr.title = item->getText().trim();
        titles.add(curr);
    }
    lString16 series_id;
    for (int i = 1; i < 20; i++) {
        lString16 xpath = lString16("package/metadata/meta[") << fmt::decimal(i) << "]";
        ldomNode *item = dom->nodeFromXPath(xpath);
        if (!item)
            break;
        lString16 property = item->getAttributeValue("property");
        lString16 content = item->getText().trim();
        if (property == "collection-type" && content == "series") {
            series_id = item->getAttributeValue("refines");
            if (series_id.startsWith("#"))
                series_id = series_id.substr(1, series_id.length() - 1);
            break;
        }
    }
    if (!series_id.empty()) {
        for (int i = 0; i < titles.length(); i++) {
            titleitem curr = titles.get(i);
            if (curr.id == series_id) {
                series = curr.title;
                break;
            }
        }
        for (int i = 1; i < 20; i++) {
            lString16 xpath = lString16("package/metadata/meta[") << fmt::decimal(i) << "]";
            ldomNode *item = dom->nodeFromXPath(xpath);
            if (!item)
                break;
            lString16 refines = item->getAttributeValue("refines");
            lString16 property = item->getAttributeValue("property");
            lString16 content = item->getText().trim();
            if (refines == series_id && property == "group-positon") {
                series_number = content.atoi();
                break;
            }
        }
    }
}
//if series still empty: last attempt, for property="belongs-to-collection"
if(series.empty()) {
    lString16 series_id;
    for (int i = 1; i < 20; i++) {
        lString16 xpath = lString16("package/metadata/meta[") << fmt::decimal(i) << "]";
        ldomNode *item = dom->nodeFromXPath(xpath);
        if (!item)
            break;
        lString16 property = item->getAttributeValue("property");
        if (property == "belongs-to-collection") {
            series = item->getText().trim();
            series_id = item->getAttributeValue("id");
            break;
        }
    }
    if(!series_id.empty() && series_number == 0) {
        for (int i = 1; i < 20; i++) {
            lString16 xpath = lString16("package/metadata/meta[") << fmt::decimal(i) << "]";
            ldomNode *item = dom->nodeFromXPath(xpath);
            lString16 property = item->getAttributeValue("property");
            lString16 id = item->getAttributeValue("refines");
            if (id.startsWith("#"))
                id = id.substr(1, id.length() - 1);
            if (id == series_id && property == "group-position") {
                series_number = item->getText().trim().atoi();
                break;
            }
        }
    }
}
poire-z commented 4 years ago

All the interesting stuff in bold has since been implemented via some original work. So nothing more to look at for consideration in this no more available fork.