Closed AmericanY closed 1 year ago
I don't view this as a bug; it's an implementation detail. You have newlines between your tags, and newlines are treated as text nodes. Since you use strip=True, they are replaced with empty strings.
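To make the point about text nodes concrete, here is a rough sketch using only Python's standard-library `html.parser`. It is not selectolax's implementation; it only models the behaviour described above. The `TextCollector` class, the sample markup, and the `|` separator are assumptions made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records every text node the parser reports."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        # Whitespace between tags is reported as data, like any other text.
        self.nodes.append(data)


# Illustrative markup with newlines between the tags.
html = "<div>\n<span>Nutren Diabetes</span>\n<span>Fresubin Diabetes</span>\n</div>"

collector = TextCollector()
collector.feed(html)
collector.close()

# Stripping each node turns the newline-only nodes into empty strings...
stripped = [node.strip() for node in collector.nodes]

# ...and joining with a separator then leaves the separators side by side.
joined = "|".join(stripped)
print(collector.nodes)
print(joined)
```

The newline-only nodes survive as empty strings, so the separators end up next to each other in the joined result.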
Just imagine that there is also text:
text
<span style="display:block;margin-bottom:1ex;">Dietary Supplement: Nutren Diabetes</span>
TEXT <span style="display:block;margin-bottom:1ex;">Dietary Supplement: Fresubin Diabetes</span>
text
We don't want to lose that text, and I don't handle newlines as a special case. If you need to extract text from the spans, it's better to iterate over each span and extract its text.
@rushter Got it. Thank you.
Output:
Bs4:
Lexbor: