fu1996 closed this pull request 5 days ago.
@fu1996 Hi there, thanks for your PR! It would help to see a more concrete example of the problem this patch fixes. I think this may also result in byline duplication in some cases. Could you share which issue this addresses, or provide an example website?
@cmkm On this website, https://www.accesswire.com/860018/network-to-code-and-internet2-team-up-to-pioneer-network-automation-across-research-and-education-community , the page's <head> already contains author information in a <meta name="og:article:author"> tag,
and the HTML of the main text includes <strong id="dateline">NEW York, NY/ACCESSWIRE/May 7, 2024/</strong>.
As a result, the final parsed content loses the NEW York, NY/ACCESSWIRE/May 7, 2024/ section.
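A minimal JavaScript sketch of why the dateline element can be dropped. It assumes the byline heuristic tests an element's class and id against a pattern that includes "dateline"; `BYLINE_PATTERN` and `matchStringFor` are hypothetical names, not the project's code, and the element is modeled as a plain object rather than a real DOM node.

```javascript
// Hypothetical byline heuristic: an element counts as a byline candidate when
// its class and id together match this pattern. The pattern is an assumption
// for illustration, not copied from the project.
const BYLINE_PATTERN = /byline|author|dateline|writtenby/i;

// Build the string the heuristic inspects from an element's class and id.
function matchStringFor(element) {
  return `${element.className || ""} ${element.id || ""}`;
}

// Plain-object model of the <strong id="dateline"> element from the article body.
const dateline = {
  tagName: "STRONG",
  id: "dateline",
  className: "",
  textContent: "NEW York, NY/ACCESSWIRE/May 7, 2024/",
};

const looksLikeByline = BYLINE_PATTERN.test(matchStringFor(dateline));
console.log(looksLikeByline); // true: "dateline" matches, so the element is treated as a byline
```

Under this assumption the element is classified purely by its id, so its text is removed from the content even though the meta tag already supplies the author.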
Thanks, that makes sense: in this use case we should keep the content.
This could definitely cause some articles to have duplicated text, but I think we'd rather go that way than remove a byline that was needed.
Resolves the issue where, when the <head> already contains author information in a meta tag such as
<meta name="og:article:author" content="xxxx">
and the body also contains <strong id="dateline">yyyy</strong>,
calling checkByline returns an incorrect result and the yyyy text is missing from the content.
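A sketch of the behavior described above, assuming the fix is a guard that makes checkByline return false once a byline is already known from metadata. This is not the actual patch: `BylineDetector`, `keepContent`, the byline pattern, and the text-length check are all hypothetical, nodes are plain objects, and the second paragraph's text is a placeholder.

```javascript
// Hypothetical byline pattern (assumption, not the project's exact regex).
const BYLINE_PATTERN = /byline|author|dateline|writtenby/i;

class BylineDetector {
  constructor(metadataByline) {
    // Author already extracted from <head> meta tags (e.g. og:article:author), if any.
    this.metadataByline = metadataByline || null;
    // Byline found in the body by the heuristic, if any.
    this.articleByline = null;
  }

  // Returns true when `node` should be removed from the content as a byline.
  checkByline(node, matchString) {
    // Guard: a byline is already known (from metadata or an earlier node),
    // so keep this node in the content instead of removing it.
    if (this.metadataByline || this.articleByline) {
      return false;
    }
    const text = (node.textContent || "").trim();
    // Hypothetical validity check: non-empty and reasonably short.
    if (BYLINE_PATTERN.test(matchString) && text.length > 0 && text.length < 100) {
      this.articleByline = text;
      return true;
    }
    return false;
  }
}

// Keep only the nodes that checkByline does not claim as a byline.
function keepContent(nodes, detector) {
  return nodes.filter((node) => !detector.checkByline(node, `${node.className} ${node.id}`));
}

const body = [
  { id: "dateline", className: "", textContent: "NEW York, NY/ACCESSWIRE/May 7, 2024/" },
  { id: "", className: "", textContent: "First paragraph of the article (placeholder text)" },
];

const withMeta = keepContent(body, new BylineDetector("xxxx"));
const withoutMeta = keepContent(body, new BylineDetector(null));
console.log(withMeta.length); // 2: the dateline is kept because metadata supplied the author
console.log(withoutMeta.length); // 1: the dateline is removed and used as the byline
```

The guard trades one risk for another: if a kept element really were an author line, its text would appear both in the content and as the metadata byline, which is the duplication the reviewer mentions.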