sindresorhus / article-title

Extract the article title of a HTML document
MIT License

Titles from Al Jazeera #2

Closed OKNoah closed 7 years ago

OKNoah commented 9 years ago

Can't get a title from here: http://www.aljazeera.com/news/2015/03/colombia-temporarily-halt-bombing-farc-rebels-150311004806689.html

kevva commented 9 years ago

Hm, I get "Q&A: Former UN rights chief tells Syrians there is hope", but that doesn't even seem to exist on that page. In fact, I get the same title on every page I try on aljazeera.com.

arthurvr commented 9 years ago

From that page:

<nav class="nav-sidebar singlecol">
  <h3 class="heading-section"><a href="#">Featured</a></h3>
  <article class="item blurb">
    <a href="/indepth/features/2015/05/magazine-read-andrew-story-150514071354386.html">
      <img src="/mritems/imagecache/mbdsmall//mritems/Images/2015/5/14/7b405fcb27604c348fd586dbcea7854d_18.jpg">
    </a>
    <h1><a href="/indepth/features/2015/05/magazine-read-andrew-story-150514071354386.html">Magazine Read: Andrew's story</a></h1>
  </article>

Currently article-title returns "Magazine Read: Andrew's story", which is quite logical: it's a perfect match for article h1.

The actual title seems to be marked with .heading-story. We could add that to our code. Although honestly, multiple <h1>s in <article>s inside a navigation sidebar, nah, that sounds like non-semantic markup in the site's code, doesn't it?

sindresorhus commented 9 years ago

It should filter out anything inside a nav element or an element with the .nav-sidebar class. I think that would fix it.
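
A minimal sketch of that filtering idea, for illustration only. It is written in Python with the standard-library html.parser, not in the module's own JavaScript, and it is not article-title's actual implementation. The names HeadingCollector and candidate_titles are hypothetical, and the sample markup is an assumption: the sidebar part is trimmed from the snippet above, while the main heading with .heading-story is guessed from the earlier comment.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HeadingCollector(HTMLParser):
    """Collect <h1> text, skipping anything nested in navigation markup."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, is_navigation) for each open element
        self.nav_depth = 0   # number of open navigation ancestors
        self.in_h1 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        is_nav = tag == "nav" or "nav-sidebar" in classes
        self.stack.append((tag, is_nav))
        self.nav_depth += is_nav
        # Only headings with no navigation ancestor are candidates.
        if tag == "h1" and self.nav_depth == 0:
            self.in_h1 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return  # stray closing tag, ignore it
        # Pop up to and including the matching open tag.
        while self.stack:
            open_tag, is_nav = self.stack.pop()
            self.nav_depth -= is_nav
            if open_tag == "h1":
                self.in_h1 = False
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.in_h1:
            self.titles[-1] += data


def candidate_titles(html):
    """Return the text of every <h1> that is not inside navigation markup."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles if t.strip()]


# Assumed sample markup, see the note above.
sample = """
<nav class="nav-sidebar singlecol">
  <article class="item blurb">
    <h1><a href="#">Magazine Read: Andrew's story</a></h1>
  </article>
</nav>
<h1 class="heading-story">Colombia to temporarily halt bombing of FARC rebels</h1>
"""

print(candidate_titles(sample))
```

With this sample, the sidebar heading is dropped and only the story heading is left as a candidate.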

sindresorhus commented 7 years ago

Seems to be working now.

❯ curl -L http://www.aljazeera.com/news/2015/03/colombia-temporarily-halt-bombing-farc-rebels-150311004806689.html | article-title
Colombia to temporarily halt bombing of FARC rebels