weblyzard / inscriptis

A Python-based HTML-to-text conversion library, command-line client, and web service.
Apache License 2.0

How to extract the page title from the HTML? #85

Open StubbornDeer opened 7 months ago

StubbornDeer commented 7 months ago

Hi guys, your library looks pretty promising, but I can't figure out how to extract the page title. With BeautifulSoup, it's pretty straightforward:

parser.title.text

How can I get it with inscriptis? Thanks!

AlbertWeichselbraun commented 7 months ago

inscriptis currently focuses on providing an accurate text representation of the web page (without site metadata such as the title).

If there is sufficient user interest, I might add options to extract the title and other site metadata as well.
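
Until such an option exists, one possible workaround is to read the title separately from the same HTML string. The following is only a minimal sketch using Python's standard-library `html.parser`; the names `TitleExtractor` and `extract_title` are hypothetical helpers made up for this example and are not part of inscriptis.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # True while inside the first <title>
        self._done = False       # True once the first <title> has closed
        self._parts = []         # text chunks seen inside <title>

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Return None when the document has no (non-empty) title.
        return "".join(self._parts).strip() or None


def extract_title(html):
    """Return the page title from an HTML string, or None if absent."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title
```

The page body can still be converted with `inscriptis.get_text(html)` as usual; the sketch above only supplies the title that the text output leaves out.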