Boldening text of single-line items fixes

scripting commented 10 months ago

In news timelines (aka rivers) we bolden the first sentence of a titleless item.

This feature went in on July 7. It would scan through the text of the headline, counting characters until it hit the end of a sentence or the 50th character, then bolden all the text up to that point.

It also had to deal with links, styling and whatever else can be produced by markdown.

The problems would often appear with text generated from markdown, such as the text in John Spurlock's Bluesky feeds. If the magic 50th character showed up inside the text of an anchor element, usually you'd see a glitch of some kind.
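To make the failure mode concrete, here is a minimal sketch of what happens when the cutoff lands inside the text of a link. It is not the FeedLand code: the function names, the sample text and the cutoff of 20 (instead of 50, to keep the sample short) are all made up for illustration.

```javascript
// Illustrative only: cutAtVisibleChar, cutsInsideAnchor and the sample text are hypothetical.

// Returns the index in html just past the nth visible (non-markup) character.
function cutAtVisibleChar(html, n) {
  let visible = 0;
  let inTag = false;
  for (let i = 0; i < html.length; i++) {
    const ch = html[i];
    if (ch === "<") {
      inTag = true;
    } else if (ch === ">" && inTag) {
      inTag = false;
    } else if (!inTag) {
      visible++;
      if (visible === n) {
        return i + 1;
      }
    }
  }
  return html.length;
}

// True when the cut point falls between an <a ...> and its </a>.
function cutsInsideAnchor(html, cut) {
  const before = html.slice(0, cut);
  const opens = (before.match(/<a\b/gi) || []).length;
  const closes = (before.match(/<\/a>/gi) || []).length;
  return opens > closes;
}

const sample = 'New post: <a href="https://example.com/">a long link title here</a>. More text follows.';
const cut = cutAtVisibleChar(sample, 20);

// Wrapping the prefix in <b> closes the bold before the anchor is closed,
// so the two elements overlap instead of nesting.
const bolded = "<b>" + sample.slice(0, cut) + "</b>" + sample.slice(cut);
console.log(bolded);
console.log(cutsInsideAnchor(sample, cut)); // true
```

The overlapping `<b>` and `<a>` elements in the output are the kind of markup a browser has to repair on its own, which is one plausible source of the glitches.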

It's been on my list for a long time as something to look at, and I finally had the time.

The quick solution is to not have a character limit. As long as we haven't reached the end of a sentence, keep going. As a reader, it didn't make sense for the bold text to end in the middle of the first sentence.

So I made the change, and it seems to help a lot.
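A minimal sketch of the no-limit rule, assuming a sentence ends at a period followed by a space (or at the end of the text) and that periods inside tags don't count. The function name is hypothetical and this is not the code that shipped.

```javascript
// Hypothetical helper, not the FeedLand implementation.
// Boldens everything up to and including the first sentence-ending period,
// with no character limit. Characters inside <...> are skipped when looking
// for the period, so "Mr. Smith" in an attribute does not end the sentence.
function boldenFirstSentence(html) {
  let inTag = false;
  for (let i = 0; i < html.length; i++) {
    const ch = html[i];
    if (ch === "<") {
      inTag = true;
    } else if (ch === ">" && inTag) {
      inTag = false;
    } else if (!inTag && ch === "." && (i + 1 === html.length || html[i + 1] === " ")) {
      const end = i + 1; // include the period
      return "<b>" + html.slice(0, end) + "</b>" + html.slice(end);
    }
  }
  return "<b>" + html + "</b>"; // no sentence end found: bolden all of it
}

console.log(boldenFirstSentence(
  'See <a title="Mr. Smith" href="https://example.com/">this post</a> today. Then more.'
));
```

This sketch still produces overlapping elements if the first sentence ends inside a link or other inline element, so it only covers the character-limit part of the problem.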

Anyway, if you see glitchiness, the best thing is to post the text of the markdown the item was rendered from and, if it's helpful, include a screenshot.

scripting commented 10 months ago

I immediately found another one.

ProPublica has multi-paragraph titleless items. With the new approach we now bolden all the text, and there's a lot of it.

The boldening code has to look for <p></p> pairs and stop at the end of the first one.
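One way to picture the paragraph rule, as a sketch under assumptions: find the first `<p>...</p>` pair with a regular expression and keep the bold inside it. The function name is hypothetical, and for brevity it boldens the whole first paragraph rather than stopping at the first sentence within it.

```javascript
// Hypothetical sketch: limit boldening to the first <p>...</p> pair.
function boldenFirstParagraph(html) {
  // \b keeps <pre> and similar tags from matching; [^>]* allows attributes on <p>.
  const match = /<p\b[^>]*>([\s\S]*?)<\/p>/i.exec(html);
  if (match === null) {
    return "<b>" + html + "</b>"; // no paragraph markup: treat the text as one paragraph
  }
  // The first ">" in the match is the end of the opening <p ...> tag.
  const innerStart = match.index + match[0].indexOf(">") + 1;
  const innerEnd = innerStart + match[1].length;
  return (
    html.slice(0, innerStart) +
    "<b>" + match[1] + "</b>" +
    html.slice(innerEnd)
  );
}

console.log(boldenFirstParagraph("<p>First paragraph.</p><p>Second paragraph.</p>"));
```

The bold goes inside the `<p>` rather than around it, so the block element is never wrapped in an inline one.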

scripting commented 10 months ago

function boldenTitlelessText (theText) { //12/15/23 by DW
    const maxCharsInBold = Infinity; //disable feature
    function bolden (s) {
        return ("<b>" + s + "</b>");
        }
    function notWordChar (ch) {
        if (isAlpha (ch)) {
            return (false);
            }
        if (isNumeric (ch)) {
            return (false);
            }
        return (true);
        }
    if (false) {//(theText.length <= maxCharsInBold) {
        return (bolden (theText));
        }
    else {
        function scanPastMarkup (theText, callback) {
            var ix = 0, flInElement = false, ct = 0, watchthis = "";
            function movePastRightAngleBracket () {
                //ix points to first char after <
                    //we return with it pointing to the next char after the >
                while (true) {
                    if (ix > (theText.length - 1)) {
                        break;
                        }
                    if (theText [ix++] == ">") {
                        break;
                        }
                    }
                }
            while (true) {
                if (ix > (theText.length - 1)) { //we've run out of chars
                    break;
                    }
                var ch = theText [ix++];
                if (ch == "<") {
                    movePastRightAngleBracket ();
                    }
                else {
                    if (!callback (ix, ++ct)) {
                        break;
                        }
                    }
                watchthis = stringMid (theText, 1,  ix + 1);
                console.log (watchthis);
                }
            return (stringMid (theText, 1, ix));
            }
        /* earlier version of scanPastMarkup, disabled: a second declaration with the same name conflicts with the one above, and it references ctChars, which is not defined in this listing
        function scanPastMarkup (theText, callback) {
            var ix = 0, flInElement = false, ct = 0, watchthis = "";
            while (true) {
                if (ix > (theText.length - 1)) { //we've run out of chars
                    break;
                    }
                var ch = theText [ix++];
                if (ch == "<") {
                    flInElement = true;
                    }
                else {
                    if ((ch == ">") && flInElement) {
                        flInElement = false;
                        }
                    else {
                        if (!callback (ix, ++ct)) {
                            break;
                            }
                        if (++ct >= ctChars) {
                            break;
                            }
                        }
                    }
                watchthis = stringMid (theText, 1,  ix + 1);
                console.log (watchthis);
                }
            return (stringMid (theText, 1, ix));
            }
        */
        var firstPart = scanPastMarkup (theText, function (ix, ct) { //get the first maxCharsInBold non-markup chars
            if (ct >= maxCharsInBold) { //stop the scan
                return (false);
                }
            else {
                return (true); //continue
                }
            });
        var flDone = false;
        scanPastMarkup (firstPart, function (ix, ct) { //scan forward to find end of sentence, stop there
            if ((firstPart [ix] == " ") && (firstPart [ix - 1] == ".")) {
                firstPart = stringMid (firstPart, 1, ix);
                flDone = true;
                return (false);
                }
            else {
                return (true);
                }
            });
        if (!flDone) { //scan back from the end for the first non-word char, a space, punctuation, >
            for (var i = firstPart.length - 1; i >= 0; i--) {
                var ch = firstPart [i]; //declared here so it isn't an implicit global
                if (notWordChar (ch)) {
                    firstPart = stringMid (firstPart, 1, i);
                    break;
                    }
                }
            }
        }
    const newText = bolden (firstPart) + theText.slice (firstPart.length);
    return (newText);
    }

scripting commented 10 months ago

This is the kind of post that used to get messed up.

scripting commented 10 months ago

Fixed a huge bug in today's fix, so now it works a lot better. ;-)

cagrimmett commented 10 months ago

I'm keeping my eye out for unexpected issues.

I found one before lunch with non-markdown feeds, but by the time I got back from lunch and started writing it up, you seemed to have already noticed and fixed it.

Item: https://feedland.org/?item=7602548

Before: [screenshot: CleanShot 2023-12-15 at 11 39 06@2x]

After: [screenshot: CleanShot 2023-12-15 at 13 36 45@2x]

cagrimmett commented 10 months ago

That said, this is bolding the first line of every paragraph, not just the first line of each post, which may be unintended. This feed does not use markdown.

https://boffosocko.com/feed/

XML for the single item above:

<item>
        <title></title>
        <link>https://boffosocko.com/2023/12/14/jesus-is-from-michigan/</link>
                    <comments>https://boffosocko.com/2023/12/14/jesus-is-from-michigan/#comments</comments>

        <dc:creator><![CDATA[Chris Aldrich]]></dc:creator>
        <pubDate>Thu, 14 Dec 2023 18:57:08 +0000</pubDate>
                <category><![CDATA[Social Stream]]></category>
        <category><![CDATA[breviaries]]></category>
        <category><![CDATA[Jesus]]></category>
        <category><![CDATA[manuscript studies]]></category>
        <category><![CDATA[Michigan]]></category>
        <category><![CDATA[Michigan Mitten]]></category>
        <category><![CDATA[XV]]></category>
        <guid isPermaLink="false">https://boffosocko.com/?p=55820210</guid>

                    <description><![CDATA[
    <div>
    <a href="https://boffosocko.com/2023/12/14/jesus-is-from-michigan/"><img title="Jesus lives in Michigan 1800s breviary MS Coll 713" src="https://i0.wp.com/boffosocko.com/wp-content/uploads/2023/12/Jesus-lives-in-Michigan-1800s-breviary-MS-Coll-713.jpg?fit=300%2C276&ssl=1" alt="Manuscript miniature on vellum likely from 15th century of Jesus preaching to a crowd. He&#039;s holding up his hand in a classic mitten shape and pointing with his other hand to the area of his left index finger." width="300" height="276" /></a>
    </div>
    Jesus indicating that he&#8217;s originally from the Long Rapids area in Michigan.  Image is an excerpt from a breviary collage (1800s?) from one of two collages of manuscript miniatures on vellum, probably from a breviary in Northern France, possibly Rouen, in the late 15th century. ↬ Dot Porter at Coffee with a Codex on 2023-12-14. &#8230; <a href="https://boffosocko.com/2023/12/14/jesus-is-from-michigan/" class="more-link">Continue reading <span class="screen-reader-text"></span></a>]]></description>
                                        <content:encoded><![CDATA[<section class="response">
<header>
<span class="kind-display-text"> </span> </header>
</section>

    <div>
    <a href="https://boffosocko.com/2023/12/14/jesus-is-from-michigan/"><img title="Jesus lives in Michigan 1800s breviary MS Coll 713" src="https://i0.wp.com/boffosocko.com/wp-content/uploads/2023/12/Jesus-lives-in-Michigan-1800s-breviary-MS-Coll-713.jpg?fit=300%2C276&ssl=1" alt="Manuscript miniature on vellum likely from 15th century of Jesus preaching to a crowd. He&#039;s holding up his hand in a classic mitten shape and pointing with his other hand to the area of his left index finger." width="300" height="276" /></a>
    </div>
    <p></p>
<p>Jesus indicating that he&#8217;s originally from the Long Rapids area in Michigan. </p>
<p>Image is an excerpt from a breviary collage (1800s?) from one of two collages of manuscript miniatures on vellum, probably from a breviary in Northern France, possibly Rouen, in the late 15th century.</p>
<p><small>↬ Dot Porter at <a href="https://libcal.library.upenn.edu/event/11474082" target="_blank" rel="noopener">Coffee with a Codex on 2023-12-14</a>. Image courtesy of University of Pennsylvania Libraries, Kislak Center for Special Collections, <a href="https://colenda.library.upenn.edu/catalog/81431-p3xh3p" target="_blank" rel="noopener">Ms. Coll. 713</a>.</small></p>]]></content:encoded>

                    <wfw:commentRss>https://boffosocko.com/2023/12/14/jesus-is-from-michigan/feed/</wfw:commentRss>
            <slash:comments>4</slash:comments>

        <post-id xmlns="com-wordpress:feed-additions:1">55820210</post-id>  </item>

scripting commented 10 months ago

Chuck, thanks for the report. There's a more direct way to see the actual text that is being processed by the algorithm: click the <> icon in the icon bar below the item text, and look at the value of description. When you click that icon, the JSON of that object is also displayed in the JavaScript console.

scripting commented 10 months ago

BTW, there was a serious error in the corrected version of the code, so I have to try again. Will report.

scripting commented 10 months ago

I decided to rewrite the code.

As I was working through it, I had an idea.

Run the text through an XML parser. If it parses, walking over the parsed structure is reliable and predictable, and none of the edge cases are hard.

If it doesn't make it through the parser, send it back with no bold text.

Any thoughts??

Worth a try?
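For discussion, a rough sketch of the parse-then-walk idea. Everything here is an assumption: `parseMarkup` is a toy tag-balancing parser standing in for a real XML parser (in a browser, something like `DOMParser` would fill that role), and the walk only boldens the first sentence of the first non-blank text node. The point is the shape: if the text parses, bold is applied inside a single text node so it can never overlap an element; if it doesn't parse, the text comes back unchanged.

```javascript
// Toy stand-in for an XML parser (hypothetical, handles only simple tags).
// Returns a tree of {tag, open, close, children} and {text} nodes,
// or null when the tags don't balance.
function parseMarkup(src) {
  const root = { tag: null, children: [] };
  const stack = [root];
  const tokenRe = /<\/?([A-Za-z][\w:-]*)[^>]*>|[^<]+/g;
  let pos = 0;
  let m;
  while ((m = tokenRe.exec(src)) !== null) {
    if (m.index !== pos) return null; // a stray "<" that is not a tag
    pos = tokenRe.lastIndex;
    const token = m[0];
    const top = stack[stack.length - 1];
    if (token[0] !== "<") {
      top.children.push({ text: token });
    } else if (token[1] === "/") {
      if (stack.length === 1 || top.tag !== m[1]) return null; // mismatched close tag
      top.close = token;
      stack.pop();
    } else {
      const node = { tag: m[1], open: token, close: "", children: [] };
      top.children.push(node);
      if (!token.endsWith("/>")) stack.push(node); // self-closing tags have no children
    }
  }
  if (pos !== src.length || stack.length !== 1) return null; // leftover text or unclosed tag
  return root;
}

// Rebuilds the markup, boldening the first sentence of the first non-blank text node.
function serialize(node, state) {
  if (node.text !== undefined) {
    if (state.done || node.text.trim() === "") return node.text;
    state.done = true;
    const ix = node.text.indexOf(". ");
    const end = ix === -1 ? node.text.length : ix + 1;
    return "<b>" + node.text.slice(0, end) + "</b>" + node.text.slice(end);
  }
  const inner = node.children.map((child) => serialize(child, state)).join("");
  return node.tag === null ? inner : node.open + inner + node.close;
}

function boldenIfWellFormed(src) {
  const tree = parseMarkup(src);
  if (tree === null) return src; // didn't parse: send it back with no bold text
  return serialize(tree, { done: false });
}

console.log(boldenIfWellFormed("<p>First one. Second one.</p><p>More.</p>"));
console.log(boldenIfWellFormed('<p>Unclosed <a href="x">link</p>'));
```

One trade-off to keep in mind: feed descriptions are often HTML rather than XML, so unclosed tags such as a bare `<br>` would fail a strict parse and get no bold at all.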

scripting commented 10 months ago

I didn't rewrite it, I found a huge problem and fixed it. An example of what it's supposed to do.

scripting commented 10 months ago

About the huge problem, I was able to make it so that boldenTitlelessText gets the text before it's processed by the Markdown renderer, so there won't be the problems with markup generated by markdown.

That was in addition to fixing the big problem.

So at this point I believe this feature "works."

If more issues show up, I believe they will be smaller, weird edge cases, and if they're impossible to deal with, another item can be opened at that time.
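A sketch of why the ordering matters, with boldening done on the Markdown source before rendering. All names here are hypothetical: `boldenMarkdownSource` is a simplified stand-in for boldenTitlelessText, and `toyRenderMarkdown` only converts `[text](url)` links so the example runs without a real Markdown library.

```javascript
// Hypothetical stand-in for boldenTitlelessText: wraps the first sentence in <b>.
// A sentence end is assumed to be a period followed by a space, or the end of the text.
function boldenMarkdownSource(md) {
  const ix = md.indexOf(". ");
  const end = ix === -1 ? md.length : ix + 1;
  return "<b>" + md.slice(0, end) + "</b>" + md.slice(end);
}

// Bolden first, then render. renderMarkdown is supplied by the caller.
function renderItem(md, renderMarkdown) {
  return renderMarkdown(boldenMarkdownSource(md));
}

// Toy renderer for the demo: only converts [text](url) links to anchors.
function toyRenderMarkdown(md) {
  return md.replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, '<a href="$2">$1</a>');
}

console.log(renderItem(
  "Read [the post](https://example.com/p) today. More below.",
  toyRenderMarkdown
));
```

Because the scan sees `[the post](...)` rather than a generated `<a>` element, there are no tags to step around at the point where the bold ends. A period followed by a space inside link text would still end the sentence early in this sketch.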

scripting / a8c-FeedLand-Support

Boldening text of single-line items fixes #86