weblyzard / inscriptis

A Python-based HTML-to-text conversion library, command-line client, and web service.
Apache License 2.0

Display links config #62

Closed: crtnx closed this issue 1 year ago

crtnx commented 2 years ago

Hi there,

First, thanks a lot for the wonderful work. Second, please treat the issue I describe below as an enhancement request rather than a bug.

My problem concerns how links are displayed when display_links is enabled: there is no way to configure their format. The output always includes both the label and the link, but for my purposes I only need the links. Looking at the source code, this behavior is hardcoded:

def _start_a(self, attrs):
    self.link_target = ''
    if self.config.display_links:
        self.link_target = attrs.get('href', '')
    if self.config.display_anchors:
        self.link_target = self.link_target or attrs.get('name', '')

    if self.link_target:
        self.tags[-1].write('[')

def _end_a(self):
    if self.link_target:
        self.tags[-1].write(']({0})'.format(self.link_target))
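To make the hardcoded behavior concrete, here is a minimal, stdlib-only sketch that replays the quoted _start_a/_end_a logic on Python's `html.parser.HTMLParser`. It is not inscriptis code: the `MiniEngine` class, the `convert` helper, and the plain `display_links`/`display_anchors` constructor flags are hypothetical stand-ins for the library's engine and its config object. It shows that the label text always ends up between the brackets, so there is no switch that yields the link alone.

```python
from html.parser import HTMLParser


class MiniEngine(HTMLParser):
    """Toy converter replaying the quoted _start_a/_end_a logic (not inscriptis)."""

    def __init__(self, display_links=True, display_anchors=False):
        super().__init__()
        # Hypothetical flags standing in for config.display_links / display_anchors
        self.display_links = display_links
        self.display_anchors = display_anchors
        self.link_target = ''
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        self.link_target = ''
        if self.display_links:
            self.link_target = attrs.get('href', '')
        if self.display_anchors:
            # Fall back to the anchor name only if there is no href
            self.link_target = self.link_target or attrs.get('name', '')
        if self.link_target:
            self.out.append('[')  # the label text will follow

    def handle_endtag(self, tag):
        if tag == 'a' and self.link_target:
            # Hardcoded: the label is always kept, the target goes in parentheses
            self.out.append(']({0})'.format(self.link_target))

    def handle_data(self, data):
        self.out.append(data)


def convert(html, **flags):
    engine = MiniEngine(**flags)
    engine.feed(html)
    engine.close()
    return ''.join(engine.out)


print(convert('See <a href="https://example.org">the docs</a>.'))
# See [the docs](https://example.org).
print(convert('<a name="top">Top</a>', display_anchors=True))
# [Top](top)
```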

Please provide a way to display only the links, without labels, or alternatively a way to override the default behavior for the A element with a custom function. And keep up the good work!

AlbertWeichselbraun commented 2 years ago

Thank you for your input.

I agree that providing more options for customizing the output is a good idea. As a workaround, you can already change how tags are handled by supplying custom functions that process the start and end tags:

inscriptis = Inscriptis(html, config)

inscriptis.start_tag_handler_dict['a'] = my_handle_start_a
inscriptis.end_tag_handler_dict['a'] = my_handle_end_a
text = inscriptis.get_text()
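The dispatch-table idea behind this workaround can be sketched without inscriptis, assuming only Python's standard library. `HandlerEngine` below is a hypothetical stand-in that routes tags through `start_tag_handler_dict` and `end_tag_handler_dict` (names borrowed from the workaround above); the hypothetical `links_only_start_a`/`links_only_end_a` replacements drop the label and emit only the `href`. Because this toy engine calls each handler with `attrs` only, the replacements are bound to the engine with `functools.partial`; how inscriptis itself invokes a function placed in its dictionaries may differ, so check its documentation before copying the binding step.

```python
import functools
from html.parser import HTMLParser


class HandlerEngine(HTMLParser):
    """Hypothetical toy engine: tags are dispatched through two handler dicts."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.link_target = ''
        self.start_tag_handler_dict = {'a': self._start_a}
        self.end_tag_handler_dict = {'a': self._end_a}

    # Default 'a' handling: "[label](href)"
    def _start_a(self, attrs):
        self.link_target = attrs.get('href', '')
        if self.link_target:
            self.out.append('[')

    def _end_a(self):
        if self.link_target:
            self.out.append(']({0})'.format(self.link_target))

    def handle_starttag(self, tag, attrs):
        handler = self.start_tag_handler_dict.get(tag)
        if handler:
            handler(dict(attrs))

    def handle_endtag(self, tag):
        handler = self.end_tag_handler_dict.get(tag)
        if handler:
            handler()

    def handle_data(self, data):
        self.out.append(data)


def links_only_start_a(engine, attrs):
    """Custom start handler: remember the href and where the label begins."""
    engine.link_target = attrs.get('href', '')
    engine.label_start = len(engine.out)


def links_only_end_a(engine):
    """Custom end handler: replace the collected label with the bare href."""
    if engine.link_target:
        del engine.out[engine.label_start:]
        engine.out.append(engine.link_target)
    engine.link_target = ''


def convert(html, links_only=False):
    engine = HandlerEngine()
    if links_only:
        # Swap only the 'a' entries; the rest of the dispatch tables stay untouched.
        engine.start_tag_handler_dict['a'] = functools.partial(links_only_start_a, engine)
        engine.end_tag_handler_dict['a'] = functools.partial(links_only_end_a, engine)
    engine.feed(html)
    engine.close()
    return ''.join(engine.out)


html = ('See <a href="https://example.org">the docs</a> and '
        '<a href="https://example.com">FAQ</a>.')
print(convert(html))
# See [the docs](https://example.org) and [FAQ](https://example.com).
print(convert(html, links_only=True))
# See https://example.org and https://example.com.
```

Keeping the handlers in dictionaries is what makes this override cheap: replacing a single entry changes how the A element is rendered without subclassing the engine or touching its parsing loop.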

I will think a bit more about how to implement this in an elegant way. Any suggestions are more than welcome.

crtnx commented 2 years ago

Hi,

I didn't know about the handler dictionaries; that should be good enough for my purposes. Thanks!

My 2 cents:

AlbertWeichselbraun commented 1 year ago

Many thanks for your input. I have extended the documentation to cover these two points.