Closed OKNoah closed 7 years ago
Hm, I get "Q&A: Former UN rights chief tells Syrians there is hope", but that doesn't even seem to exist on that page. In fact, I get the same title on every page I'm trying to access on aljazeera.com.
From that page:
<nav class="nav-sidebar singlecol">
<h3 class="heading-section"><a href="#">Featured</a></h3>
<article class="item blurb">
<a href="/indepth/features/2015/05/magazine-read-andrew-story-150514071354386.html">
<img src="/mritems/imagecache/mbdsmall//mritems/Images/2015/5/14/7b405fcb27604c348fd586dbcea7854d_18.jpg">
</a>
<h1><a href="/indepth/features/2015/05/magazine-read-andrew-story-150514071354386.html">Magazine Read: Andrew's story</a></h1>
</article>
Currently article-title returns "Magazine Read: Andrew's story", which is quite logical: it's a perfect match for `article h1`.
The actual title seems to be marked with `.heading-story`. We could add that in our code. Although honestly, multiple `<h1>`s in `<article>`s inside a navigation sidebar sounds like non-semantic markup in the site's code, doesn't it?
It should filter out anything inside a `nav` element or an element with the `.nav-sidebar` class. I think that would fix it.
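To make the filtering idea concrete, here is a rough sketch in Python using only the standard library. It is not article-title's actual code. The names `TitleFinder` and `find_title` are made up for illustration, and it only covers the "skip headings inside `nav` / `.nav-sidebar`" part, not any preference for `.heading-story`.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TitleFinder(HTMLParser):
    """Hypothetical helper: collect <h1> text outside <nav> / .nav-sidebar."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, is_nav_container) for each open element
        self.current = None  # text chunks of the <h1> being read, if any
        self.titles = []     # text of every <h1> found outside navigation

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append((tag, tag == "nav" or "nav-sidebar" in classes))
        # Only start capturing if no ancestor (or the tag itself) is navigation.
        if tag == "h1" and not any(is_nav for _, is_nav in self.stack):
            self.current = []

    def handle_endtag(self, tag):
        if tag == "h1" and self.current is not None:
            # Collapse runs of whitespace before storing the title.
            self.titles.append(" ".join("".join(self.current).split()))
            self.current = None
        # Pop back to the matching open tag; tolerate unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def find_title(html):
    """Return the first <h1> text outside navigation, or None."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.titles[0] if finder.titles else None
```

One caveat with this approach: if the page never closes its `<nav>`, everything after it counts as navigation, so a real implementation would probably want a DOM-based selector instead of a tag stack.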
Seems to be working now.
❯ curl -L http://www.aljazeera.com/news/2015/03/colombia-temporarily-halt-bombing-farc-rebels-150311004806689.html | article-title
Colombia to temporarily halt bombing of FARC rebels
Can't get a title from here: http://www.aljazeera.com/news/2015/03/colombia-temporarily-halt-bombing-farc-rebels-150311004806689.html