How to extract the page title from the HTML?

weblyzard / inscriptis

A Python-based HTML-to-text conversion library, command-line client, and web service.

Apache License 2.0


Open StubbornDeer opened 7 months ago

StubbornDeer commented 7 months ago

Hi guys, your library looks pretty promising but I can't figure out how to extract the page title. With BeautifulSoup, it's pretty straightforward:

parser.title.text

How can I get it with inscriptis? Thanks!

AlbertWeichselbraun commented 7 months ago

inscriptis currently focuses on providing an accurate representation of the web page (without site metadata such as title).

If there is sufficient user interest, I might add options to extract the title and other site metadata as well.
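
In the meantime, one possible workaround is to pull the title out of the HTML separately, before or after handing the same markup to inscriptis. The following is only a minimal sketch using the standard library's `html.parser`; `TitleParser` and `extract_title` are hypothetical helper names for illustration and are not part of inscriptis.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # True while we are inside <title>...</title>
        self._done = False       # True once the first <title> has been closed
        self._parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Return None when the page has no (or an empty) title.
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the page title as a string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


html = "<html><head><title> Example page </title></head><body><p>Hi</p></body></html>"
page_title = extract_title(html)
```

The text rendering itself can still come from inscriptis, e.g. `get_text(html)` after `from inscriptis import get_text`, so the title and the body text are obtained from the same HTML string in two independent passes.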