readium / swift-toolkit

A toolkit for ebooks, audiobooks and comics written in Swift
https://readium.org/mobile/
BSD 3-Clause "New" or "Revised" License

Current Selection Locator has incorrect data describing the text #363

Open streamg opened 7 months ago

streamg commented 7 months ago

Bug Report

In some EPUBs, when selecting a word in the text, currentSelection.locator describes a text object whose before or after values are incorrect (for example, they are missing new lines).

What happened?

Taking the following example: an epub chapter that starts like this

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ops="http://www.idpf.org/2007/ops">
<head>
<title>Three Tales of My Father&#x2019;s Dragon</title>
<link rel="stylesheet" type="text/css" href="Gann_9780307976482_epub_css_r1.css"/>
<link rel="stylesheet" type="application/vnd.adobe-page-template+xml" href="page-template.xpgt"/>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
</head>
<body>
<h1 class="chapter" id="b3-c01"><a id="page155"></a><em>Chapter One</em><br/><br/>MY FATHER MEETS THE CAT</h1>
<p class="indent1"><span class="big1"><strong>O</strong></span>ne cold rainy day when my father was a little boy, he met an old alley cat on his street. The cat was very drippy and uncomfortable so my father said, &#x201C;Wouldn&#x2019;t you like to come home with me?&#x201D;</p>

After selecting a word ("rainy" in our case), currentSelection.locator looks like this:

{
  "href": "/OEBPS/Gann_9780307976482_epub_b3-c01_r1.htm",
  "locations": {
    "position": 13,
    "progression": 0,
    "totalProgression": 0.20689655172413793
  },
  "text": {
    "after": " day when my father was a little boy, he met an old alley cat on his street. The cat was very drippy and uncomfortable so my father said, “Wouldn’t you like to come home with me?”\nThis surprised the",
    "before": "Chapter OneMY FATHER MEETS THE CAT\nOne cold ",
    "highlight": "rainy"
  },
  "title": "1. My Father Meets the Cat",
  "type": "application/xhtml+xml"
}

Looking at the before value, it looks like some new line characters are missing: "Chapter OneMY FATHER MEETS THE CAT"
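To make the line breaks in the before value easier to see, here is a minimal Python sketch that decodes a trimmed copy of the locator and splits before on new lines. It is only an inspection aid, not part of the toolkit. The LOCATOR_JSON constant is an abbreviated copy of the locator reported in this issue, keeping only href and part of text.

```python
import json

# Abbreviated copy of the locator reported in this issue (assumption: only the
# fields needed for the inspection are kept).
LOCATOR_JSON = r'{"href": "/OEBPS/Gann_9780307976482_epub_b3-c01_r1.htm", "text": {"before": "Chapter OneMY FATHER MEETS THE CAT\nOne cold ", "highlight": "rainy"}}'

locator = json.loads(LOCATOR_JSON)
before = locator["text"]["before"]

# repr() shows control characters such as \n explicitly.
print(repr(before))

# Splitting on new lines shows where the breaks actually are.
lines = before.split("\n")
print(lines)  # ['Chapter OneMY FATHER MEETS THE CAT', 'One cold ']
```

The only line break is the one between the heading and the paragraph. There is none between "Chapter One" and "MY FATHER MEETS THE CAT".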

Expected behavior

The currentSelection.locator should return the correct text value, with the line breaks that are visible on screen preserved in before and after.

How to reproduce?

Open the following EPUB, go to chapter "1. My Father Meets the Cat" and select a word from the first paragraph. Check the currentSelection.locator.text value.

Attachment: 9780307976482_cewexm_preview.epub.zip

Environment

Development environment

macOS: 13.5.1
platform: arm64
carthage:
Xcode 14.3
Build version 14E222b

Testing device

Additional context

mickael-menu commented 7 months ago

It looks like the WebView returns this for document.body.textContent:

"
Chapter OneMY FATHER MEETS THE CAT
One cold rainy day when my father was
...

But if I load the raw HTML content of the same chapter directly to the web view, this is what we get:

"

        Chapter One

        MY FATHER MEETS THE CAT

            O

        ne cold rainy day when my father was a little boy, he met an old alley cat on his street. The cat was very drippy and uncomfortable so my father said, “Wouldn’t you like to come home with me?”

    This surprised the cat—sh...

I haven't found what's causing this yet. I tried removing all our HTML and JavaScript injections, and changing the media type returned by the HTTP server (HTML instead of XHTML), to no avail.

mickael-menu commented 7 months ago

Ha, actually I used the pretty-printed version of the HTML content returned by the Web Inspector.

If I try with the actual HTML that is in the EPUB, I get the same result with document.body.textContent in the browser.
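The difference between the two outputs can be reproduced outside a web view. The sketch below uses Python's xml.etree.ElementTree, whose itertext() concatenates text nodes in document order in roughly the same way as DOM textContent. The PRETTY_H1 string is an assumption: a hand-indented variant of the heading that imitates what a pretty-printer might produce, not the exact Web Inspector output.

```python
import xml.etree.ElementTree as ET

# Heading as it appears in the EPUB source.
RAW_H1 = (
    '<h1 class="chapter" id="b3-c01"><a id="page155"></a>'
    "<em>Chapter One</em><br/><br/>MY FATHER MEETS THE CAT</h1>"
)

# Hand-indented variant (assumption: imitates a pretty-printer's output).
PRETTY_H1 = (
    '<h1 class="chapter" id="b3-c01">\n'
    '    <a id="page155"></a>\n'
    "    <em>Chapter One</em>\n"
    "    <br/>\n"
    "    <br/>\n"
    "    MY FATHER MEETS THE CAT\n"
    "</h1>"
)

def text_content(markup: str) -> str:
    """Concatenate all text nodes in document order, similar to textContent."""
    return "".join(ET.fromstring(markup).itertext())

raw_text = text_content(RAW_H1)
pretty_text = text_content(PRETTY_H1)

print(repr(raw_text))     # 'Chapter OneMY FATHER MEETS THE CAT'
print(repr(pretty_text))  # contains the indentation whitespace as text
```

The indentation added by pretty-printing becomes real text nodes, which is why the pretty-printed markup appears to contain line breaks that the original source does not have.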

So to answer your issue, this is not a bug. The locator.text contains the characters as they are present in the DOM. How the text is drawn on screen, including any <br/> line breaks, has no impact on that. As you can see in the HTML content below, the source has no whitespace between "Chapter One" and "MY FATHER MEETS THE CAT".

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ops="http://www.idpf.org/2007/ops">
<head>
<title>Three Tales of My Father&#x2019;s Dragon</title>
<link rel="stylesheet" type="text/css" href="Gann_9780307976482_epub_css_r1.css"/>
<link rel="stylesheet" type="application/vnd.adobe-page-template+xml" href="page-template.xpgt"/>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
</head>
<body>
<h1 class="chapter" id="b3-c01"><a id="page155"></a><em>Chapter One</em><br/><br/>MY FATHER MEETS THE CAT</h1>
<p class="indent1"><span class="big1"><strong>O</strong></span>ne cold rainy day when my father was a little boy, he met an old alley cat on his street. The cat was very drippy and uncomfortable so my father said, &#x201C;Wouldn&#x2019;t you like to come home with me?&#x201D;</p>
<p class="indent">This surprised the cat&#x2014;she had never before met anyone who cared about old alley cats&#x2014;but she said, &#x201C;I&#x2019;d be very much obliged if I could sit by a warm furnace, and perhaps have a saucer of milk.&#x201D;</p>
<p class="indent">&#x201C;We have a very nice furnace to sit by,&#x201D; said my father, &#x201C;and I&#x2019;m sure my mother has an extra saucer of milk.&#x201D;</p>
<p class="indent">My father and the cat became good friends but my father&#x2019;s mother was very upset about the cat. She hated <a id="page156"></a>cats, particularly ugly old alley cats. &#x201C;Elmer Elevator,&#x201D; she said to my father, &#x201C;if you think I&#x2019;m going to give <a id="page157"></a>that cat a saucer of milk, you&#x2019;re very wrong. Once you start feeding stray alley cats you might as well expect to feed every stray in town, and I am <em>not</em> going to do it!&#x201D;</p>
<p class="center"><img src="images/Gann_9780307976482_epub_094_r1.jpg" alt=""/></p>
<p class="indent">This made my father very sad, and he apologized to the cat because his mother had been so rude. He told the cat to stay anyway, and that somehow he would bring her a saucer of milk each day. My father fed the cat for three weeks, but one day his mother found the cat&#x2019;s saucer in the cellar and she was extremely angry. She whipped my father and threw the cat out the door, but later on my father sneaked out and found the cat. Together they went for a walk in the park and tried to think of nice things to talk about. My father said, &#x201C;When I grow up I&#x2019;m going to have an airplane. Wouldn&#x2019;t it be wonderful to fly just anywhere you might think of!&#x201D;</p>
<p class="indent">&#x201C;Would you like to fly very, very much?&#x201D; asked the cat.</p>
<p class="indent">&#x201C;I certainly would. I&#x2019;d do anything if I could fly.&#x201D;</p>
<p class="center"><a id="page158"></a><img src="images/Gann_9780307976482_epub_095_r1.jpg" alt=""/></p>
<p class="indent">&#x201C;Well,&#x201D; said the cat, &#x201C;if you&#x2019;d really like to fly that much, I think I know of a sort of a way you might get <a id="page159"></a>to fly while you&#x2019;re still a little boy.&#x201D;</p>
<p class="indent">&#x201C;You mean you know where I could get an airplane?&#x201D;</p>
<p class="indent">&#x201C;Well, not exactly an airplane, but something even better. As you can see, I&#x2019;m an old cat now, but in my younger days I was quite a traveler. My traveling days are over but last spring I took just one more trip and sailed to the Island of Tangerina, stopping at the port of Cranberry. Well, it just so happened that I missed the boat, and while waiting for the next I thought I&#x2019;d look around a bit. I was particularly interested in a place called Wild Island, which we had passed on our way to Tangerina. Wild Island and Tangerina are joined together by a long string of rocks, but people never go to Wild Island because it&#x2019;s mostly jungle and inhabited by very wild animals. So I decided to go across the rocks and explore it for myself. It certainly is an interesting place, but I saw something there that made me want to weep.&#x201D;</p>
<p class="center"><a id="page160"></a><img src="images/Gann_9780307976482_epub_096_r1.jpg" alt=""/></p>
</body>
</html>
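As a rough illustration of why the reported before value looks the way it does, the Python sketch below extracts the text nodes from a shortened copy of the body above and takes everything preceding the selected word. This is not the toolkit's actual algorithm. The context_before helper is hypothetical, and stripping leading whitespace is an assumption made only so the result lines up with the value in the report.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the chapter body as stored in the EPUB.
BODY = (
    "<body>\n"
    '<h1 class="chapter" id="b3-c01"><a id="page155"></a><em>Chapter One</em>'
    "<br/><br/>MY FATHER MEETS THE CAT</h1>\n"
    '<p class="indent1"><span class="big1"><strong>O</strong></span>'
    "ne cold rainy day when my father was a little boy</p>\n"
    "</body>"
)

def text_content(markup: str) -> str:
    """Concatenate all text nodes in document order, similar to textContent."""
    return "".join(ET.fromstring(markup).itertext())

def context_before(text: str, highlight: str) -> str:
    """Hypothetical helper: text preceding the first occurrence of `highlight`."""
    index = text.index(highlight)
    # Assumption: leading whitespace is dropped to match the reported value.
    return text[:index].lstrip()

body_text = text_content(BODY)
before = context_before(body_text, "rainy")
print(repr(before))  # 'Chapter OneMY FATHER MEETS THE CAT\nOne cold '
```

The <br/> elements contribute no characters, so the two parts of the heading are adjacent in the text. The only new line comes from the line break between </h1> and <p> in the source.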

Why is it an issue for you and why do you need the whitespace in the Locator text?