Open ASL07 opened 4 years ago
Did you try removing single_line_break?
Yes, that doesn't work either
Nothing wrong with html2text: your XPath is passing a series of <a> elements that don't have any separation between them:
<a class="single-opportunity" href="https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner">
IT Technical Product Owner <span class="">United Kingdom</span>
</a>
<a class="single-opportunity" href="https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner">
IT Technical 2 <span class="">United Kingdom</span>
</a>
<a class="single-opportunity" href="https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner">
IT Technical 3 <span class="">UNITED KINGDOM</span>
</a>
If you want line breaks for this specific HTML, your XPath needs to capture the outer container as well, in this case a <li>:
filter:
  - xpath: //*[*[@class= 'single-opportunity' and span[contains(text(), 'United Kingdom') or contains(text(), 'UNITED KINGDOM')]]]
This has the desired effect (which, unlike your example above, is sorted correctly):
* [ IT Technical 2 United Kingdom ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
* [ IT Technical 3 UNITED KINGDOM ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
* [ IT Technical Product Owner United Kingdom ](https://career.camlingroup.com/careers/opportunities/tpo-111901-it-technical-product-owner)
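To illustrate why capturing the container helps, here is a minimal Python sketch using only the standard library. It is not the XPath engine the job itself uses. The <ul>/<li> wrapper, the example.com links, the "Poland" entry, and the helper name matching_containers are all assumptions made up for the illustration. It returns the enclosing <li> for each matching <a>, so every result carries its own block-level boundary.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <ul>/<li> wrapper and the URLs are assumed,
# not copied from the real page.
HTML = """
<ul>
  <li><a class="single-opportunity" href="https://example.com/1">IT Technical Product Owner <span class="">United Kingdom</span></a></li>
  <li><a class="single-opportunity" href="https://example.com/2">IT Technical 2 <span class="">Poland</span></a></li>
  <li><a class="single-opportunity" href="https://example.com/3">IT Technical 3 <span class="">UNITED KINGDOM</span></a></li>
</ul>
"""

WANTED = ("United Kingdom", "UNITED KINGDOM")


def matching_containers(markup):
    """Return each <li> whose <a class="single-opportunity"> has a matching <span>."""
    root = ET.fromstring(markup.strip())
    containers = []
    for li in root.iter("li"):
        for a in li.findall("a[@class='single-opportunity']"):
            span = a.find("span")
            if span is not None and any(w in (span.text or "") for w in WANTED):
                containers.append(li)  # keep the container, not the bare <a>
                break
    return containers


containers = matching_containers(HTML)
serialized = [ET.tostring(li, encoding="unicode").strip() for li in containers]
print("\n".join(serialized))
```

Because each selected node is a whole <li>, a converter such as html2text has a list item to render on its own line, rather than a run of adjacent inline links.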
Alternatively, you can insert a re.sub filter to modify the HTML to add a <br> after each <a> element (<a /> for XHTML):
filter:
  - xpath: //*[@class= 'single-opportunity' and span[contains(text(), 'United Kingdom') or contains(text(), 'UNITED KINGDOM')]]
  - re.sub:
      pattern: </a>
      repl: </a><br>
  - re.sub:
      pattern: <a />
      repl: <a /><br />
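As a rough check of what the two substitutions above do, the following sketch applies the same patterns with Python's re.sub. It assumes the filter behaves like Python's re.sub, and the sample markup and example.com links are invented for the illustration.

```python
import re

# Hypothetical input: two adjacent links with nothing separating them.
html = (
    '<a class="single-opportunity" href="https://example.com/1">IT Technical Product Owner</a>'
    '<a class="single-opportunity" href="https://example.com/2">IT Technical 2</a>'
)

# Same pattern/repl pairs as in the filter list above.
with_breaks = re.sub(r"</a>", "</a><br>", html)
with_breaks = re.sub(r"<a />", "<a /><br />", with_breaks)

print(with_breaks)
```

Each closing </a> is now followed by a <br>, which gives the HTML-to-text step an explicit line break between links. The second substitution only matters when the markup contains self-closing <a /> elements.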
Hi,
Hope you can help me with this. Please forgive me, I am not an expert on html2text. I think this should be easy to do, but I cannot find out how.
I have this url: http://adriansantos.me/test.html and this job:
Which produces the following output:
How can I make html2text add a new line after each "element"? I mean, how can I achieve this?:
Thanks for your help