joseph / Monocle

A silky, tactile browser-based ebook JavaScript library.
http://monocle.inventivelabs.com.au
MIT License

Loading Entire Books, Detecting "End of Chapter" #135

Closed: arcgaden closed this issue 11 years ago

arcgaden commented 11 years ago

Hello,

I am working on a significant project where we are loading entire books all at once via JSON. This works well with Monocle in general, and the loading process has been quite fast. Once an entire book loads, we build out our TOC, a Nav control, etc.

Our problem appears with certain styles of eBook construction when paginating (flipping through the book page by page).

When an eBook uses a distinct HTML page for every NavPoint (even when nested), there is no issue at all.

When an eBook instead uses #FOO hash fragments after a URL for its NavPoints (used for nesting), so that one HTML page is shared across the nested entries, we encounter odd behavior at the end of a chapter or section: we get stuck in a loop of that chapter.

For example, when I reach the last page of Chapter 2 and turn the page, it brings me back to the beginning of Chapter 2.

Additionally, if I navigate to the beginning of Chapter 3 and page backwards once, the code correctly builds the link to the last page of Chapter 2. I can then navigate freely to the beginning of the book, but if I go to the end of Chapter 3, I am brought back into the loop described above. If the logic that works when paginating backwards also applied to paginating forwards, we'd be in great shape!

We tried borrowing the code directly from the Book.ish site, as well as from this GitHub repository, but no recent changes apply to what we are doing here. We have determined that the link-building behavior definitely resides in Monocle's client-side code, but exactly how to handle it is still unclear. Our assumption at this point is that Monocle was originally designed to work on a chapter-by-chapter basis, but if this can be fixed, it is entirely feasible as a whole-book-load solution, too.

Any thoughts on how this can be handled?

Best, Jeremy Streeter

rubemz commented 11 years ago

This may be happening because you mapped all your NavPoint entries to getComponents() in your book data object, including the URIs with hash tags.

From the wiki: getComponents returns an array of all the component ids that are to be accessed in linear reading order (i.e., like the spine in an EPUB OPF file; you don't have to list every component, just the ones that are read in order).

If you have repeated components, you will get stuck in a loop, because a component URI with or without a hash tag loads the same (X)HTML EPUB component. So you should map the book's spine to getComponents.
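A simplified model may help show why duplicates loop. The sketch below assumes (this is not Monocle's actual code) that the reader identifies the current component by its file, finds that file's first position in the component list, and moves to the next entry. The file names and helper functions are hypothetical. Under that assumption, a list containing both chap2.xhtml and chap2.xhtml#s1 never advances past Chapter 2, while a spine-only list does.

```javascript
// Simplified model, NOT Monocle's real implementation: the current position
// is identified by file, and "next" is the entry after that file's FIRST
// occurrence in the component list.
function fileOf(componentId) {
  const hash = componentId.indexOf("#");
  return hash === -1 ? componentId : componentId.slice(0, hash);
}

function nextComponent(components, currentId) {
  const files = components.map(fileOf);
  const pos = files.indexOf(fileOf(currentId));
  if (pos === -1 || pos + 1 >= components.length) {
    return null; // unknown component, or end of book
  }
  return components[pos + 1];
}

// Hypothetical component lists for the same book.
const withHashes = ["chap1.xhtml", "chap2.xhtml", "chap2.xhtml#s1", "chap3.xhtml"];
const spineOnly = ["chap1.xhtml", "chap2.xhtml", "chap3.xhtml"];

// From chap2.xhtml#s1 the model resolves to the same file again,
// so it never reaches chap3.xhtml: the loop.
console.log(nextComponent(withHashes, "chap2.xhtml"));    // chap2.xhtml#s1
console.log(nextComponent(withHashes, "chap2.xhtml#s1")); // chap2.xhtml#s1
console.log(nextComponent(spineOnly, "chap2.xhtml"));     // chap3.xhtml
```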

Look at the spine in the *.opf file and map it to getComponents. It should look something like this:

<spine toc="ncx">
    <itemref idref="titlepage" />
    <itemref idref="historico-obra.xhtml" />
    <itemref idref="ficha-catalografica.xhtml" />
    <itemref idref="Section0001.xhtml" />
</spine>
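A minimal sketch of a book data object for a whole book loaded as JSON, assuming a made-up payload shape (spine, toc, html, meta). getComponents comes from the wiki text quoted above; getContents, getComponent and getMetaData are the other book data methods named on the Monocle wiki, so check the wiki for their exact contracts (getComponent is assumed here to return an HTML string). The idea is that the spine feeds getComponents, while hash-tagged NavPoints stay in the TOC returned by getContents. The fallback helpers are hypothetical.

```javascript
// Hypothetical JSON payload for one whole book.
const bookJson = {
  spine: ["titlepage.xhtml", "chap1.xhtml", "chap2.xhtml", "chap3.xhtml"],
  toc: [
    { title: "Chapter 1", src: "chap1.xhtml" },
    {
      title: "Chapter 2",
      src: "chap2.xhtml",
      children: [
        { title: "Section 2.1", src: "chap2.xhtml#s1" },
        { title: "Section 2.2", src: "chap2.xhtml#s2" }
      ]
    },
    { title: "Chapter 3", src: "chap3.xhtml" }
  ],
  html: {
    "titlepage.xhtml": "<h1>Example Book</h1>",
    "chap1.xhtml": "<h2>Chapter 1</h2><p>Chapter text.</p>",
    "chap2.xhtml": "<h2>Chapter 2</h2><h3 id=\"s1\">2.1</h3><h3 id=\"s2\">2.2</h3>",
    "chap3.xhtml": "<h2>Chapter 3</h2><p>Chapter text.</p>"
  },
  meta: { title: "Example Book", creator: "Unknown" }
};

const bookData = {
  // Linear reading order from the OPF spine: each file once, no #fragments.
  getComponents: function () {
    return bookJson.spine.slice();
  },
  // Nested TOC entries may point into a component via a #fragment.
  getContents: function () {
    return bookJson.toc;
  },
  // Assumed here to return the component's HTML as a string.
  getComponent: function (componentId) {
    return bookJson.html[componentId];
  },
  getMetaData: function (key) {
    return bookJson.meta[key];
  }
};

// Fallback when no spine is available: derive the order from TOC hrefs by
// dropping each #fragment and keeping the first occurrence of every file.
function flattenSrcs(entries) {
  const out = [];
  for (const entry of entries) {
    out.push(entry.src);
    if (entry.children) {
      out.push(...flattenSrcs(entry.children));
    }
  }
  return out;
}

function componentIdsFromHrefs(hrefs) {
  const seen = new Set();
  const ids = [];
  for (const href of hrefs) {
    const id = href.split("#")[0];
    if (id !== "" && !seen.has(id)) {
      seen.add(id);
      ids.push(id);
    }
  }
  return ids;
}
```

Here componentIdsFromHrefs(flattenSrcs(bookJson.toc)) gives chap1.xhtml, chap2.xhtml and chap3.xhtml, but misses titlepage.xhtml because it is not in the TOC, which is one reason to prefer the spine.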

arcgaden commented 11 years ago

Thanks. While that didn't exactly fix my issue, it pointed me in a very helpful direction. We had a server-side method (C#) that seeded both the book content and the component IDs. I altered it to differentiate between the two, and began stripping out any navPoints whose URI contains a hash tag. That, of course, has its own issues when a publisher or author who is not HTML-savvy adds a URI such as "Chap1.html#chap1" to a top-level navPoint rather than a nested one. This kind of issue cripples the TOC, but at least the content remains available.

Does anybody have a quick method for stripping the # hash tag and everything after it from an attribute value? I tried a few different approaches, but manipulating inline attribute text is a little tricky and doesn't behave consistently. At any rate, thanks again for the note on the components. This removed our looping issue, along with a few other minor behavioral issues related to the hash tags in nested navs.

Best, Jeremy Streeter
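For href strings that are already out of the markup (for example, NavPoint URIs in a JSON payload), stripping the fragment is a plain string operation and needs no regex. A minimal sketch; the helper names are hypothetical. Keeping the fragment separately, rather than dropping the whole navPoint, is one way to avoid crippling the TOC. A same-document link such as "#note1" has no file part, so callers need to decide what an empty result means.

```javascript
// Split an href such as "Chap1.html#chap1" into its file path and fragment.
// Everything after the FIRST "#" is treated as the fragment.
function splitHref(href) {
  const hash = href.indexOf("#");
  if (hash === -1) {
    return { path: href, fragment: "" };
  }
  return { path: href.slice(0, hash), fragment: href.slice(hash + 1) };
}

// Drop the "#" and everything after it.
function stripFragment(href) {
  return splitHref(href).path;
}

console.log(stripFragment("Chap1.html#chap1"));        // Chap1.html
console.log(stripFragment("Chap2.html"));              // Chap2.html
console.log(splitHref("Chap2.html#sec-3").fragment);   // sec-3
console.log(stripFragment("#note1") === "");           // true
```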

rubemz commented 11 years ago

You are welcome :D

(Email reply to Jeremy Streeter's comment of Thu, Sep 6, 2012 at 5:42 PM.)

Rubem Nakamura

joseph commented 11 years ago

@arcgaden If you're ever modifying HTML content, I'd strongly recommend that you use an XML parser rather than regexes (for reasons that are convincingly laid out here). But it's a rule that Monocle breaks in a few places.

Still, note that in JavaScript you have a built-in parser: the browser. Try hooking into the component:modify event and searching the DOM of the component document for the URL pattern whose behaviour you need to change. This is exactly how the Stencil works, so the Stencil tests might give you some clues.

Closing this issue as I think it's resolved now.
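A minimal sketch of the DOM-based approach, assuming your component:modify handler can obtain the component's DOM Document (how to get it from the event is not shown; check the Monocle source or the Stencil for the exact payload). The function name is hypothetical, and rewriting each link to its bare file is just one example of changing a link's behaviour. It uses only standard DOM calls, so the browser's parser does the work instead of a regex.

```javascript
// Rewrite links in a component document so that any href with a file part
// and a #fragment points at the file only, e.g. "Chap1.html#chap1" -> "Chap1.html".
// Same-document links ("#note1") are left alone.
function stripLinkFragments(doc) {
  const links = doc.querySelectorAll("a[href*='#']");
  let changed = 0;
  for (let i = 0; i < links.length; i++) {
    const link = links[i];
    const href = link.getAttribute("href");
    const hash = href === null ? -1 : href.indexOf("#");
    // hash > 0 means there is a non-empty file part before the "#".
    if (hash > 0) {
      link.setAttribute("href", href.slice(0, hash));
      changed++;
    }
  }
  return changed; // number of links rewritten
}
```

Inside the handler you would call stripLinkFragments with the component's document; the return value is just a count, useful for logging.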