j0k3r / graby

Graby helps you extract article content from web pages
MIT License

Extract first line from text as author #351

Open aschilling opened 1 month ago

aschilling commented 1 month ago

Hi everybody,

Is there a way to extract the very first line of a tag as the author? Unfortunately, the author name has no tag or formatting of its own; the only thing that identifies it is that it is the very first line of the tag with class "text". The HTML looks like this:

<pre class="text">AuthorForename AuthorSurname

this is the text of the author without any formatting which goes a long way

Finally, is there a way to remove this first line from the text so that the author information is not saved in the body?
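The desired behaviour can be sketched outside of Graby as follows. This is only an illustration in plain Python, not Graby's API or site-config syntax; the function name `split_author` is hypothetical, and it assumes the author is always the first non-empty line of the `<pre class="text">` content.

```python
from typing import Tuple


def split_author(text: str) -> Tuple[str, str]:
    """Split raw <pre class="text"> content into (author, body).

    Hypothetical helper: assumes the first non-empty line is the author.
    """
    # Drop leading blank lines so the author is the first line with content.
    stripped = text.lstrip("\r\n")
    # partition() splits at the first newline only; the rest stays intact.
    first_line, _, rest = stripped.partition("\n")
    author = first_line.strip()
    # Remove the blank line(s) separating the author from the article text.
    body = rest.lstrip("\r\n")
    return author, body


raw = "AuthorForename AuthorSurname\n\nthis is the text of the author without any formatting"
author, body = split_author(raw)
print(author)  # AuthorForename AuthorSurname
print(body)    # this is the text of the author without any formatting
```

The author line is returned separately and no longer appears in the body, which is the result wanted from the extraction.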

Any help is appreciated.

HolgerAusB commented 4 weeks ago

Aah, you already opened that issue, @aschilling ;-)

Is that a public website? If so, can you give us a URL to an article?