Closed (d668 closed this 3 months ago)
Hello @d668 ,
Thank you for reporting. However, I do not believe anything will be done now for this.
There is currently too much code to change and understand to make this work correctly in the time we can allocate, since even Chrome and Firefox behave differently depending on whether there is an empty line between the elements.
The current InnerText in Chrome is: span1\n\np1\n\nspan1 span2\n\np2\n\nspan2
Notice that span1 and span2 are separated by a space while the others are separated by a new line. This case looks easy to handle, but verifying all the InnerText rules would require far more time than we currently have.
But indeed, HAP doesn't provide the same InnerText as a real browser.
Best Regards,
Jon
Notice that span1 and span2 are separated by a space while others have a new line.
You are right, so HAP is actually making two mistakes: adding a new line between span1 and span2, and not adding new lines in span1p1span1. Both Chrome and Firefox show it as:
span1
p1
span1 span2
p2
span2
But indeed, HAP doesn't provide the same InnerText as a real browser.
Oh man, and what then? Not the same but some? It really does look like you just don't have the resources to fix an obvious bug.
Hello @d668 ,
Feel free to propose a pull request with the fix ;)
We are currently reviewing/merging this week some other pull requests that have been submitted recently, so that would be a perfect time.
Best Regards,
Jon
If this is your excuse for not maintaining a project you started, that's lame. I am fine with beautifulsoup
Man, closing an issue with an obvious bug?
1. Description
htmlDoc.DocumentNode.InnerText gives inconsistent results depending on whether there is a new line between HTML elements.
See the fiddle. Both outputs should be the same and should not depend on whether there is a new line in the HTML markup.
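To make the expected behavior concrete, here is a rough sketch in Python using only the standard library. It is not HAP code and not the browser algorithm: it only illustrates the two rules discussed in this thread, namely that whitespace between inline elements collapses to a single space and that block elements such as p start a new paragraph. The markup is an assumption (the fiddle's markup is not reproduced here), and the names `InnerTextSketch`, `inner_text`, `WITH_NEWLINE`, and `WITH_SPACE` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: a tiny set of block-level tags is enough for this illustration.
BLOCK_TAGS = {"p", "div"}


class InnerTextSketch(HTMLParser):
    """Collects text, starting a new chunk at every block-level boundary."""

    def __init__(self):
        super().__init__()
        self.chunks = [[]]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append([])

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append([])

    def handle_data(self, data):
        self.chunks[-1].append(data)


def inner_text(html):
    parser = InnerTextSketch()
    parser.feed(html)
    parser.close()
    # Collapse every whitespace run (spaces, newlines) into a single space,
    # then drop chunks that end up empty.
    lines = (" ".join("".join(chunk).split()) for chunk in parser.chunks)
    return "\n\n".join(line for line in lines if line)


# Hypothetical markup; the markup in the fiddle may differ.
WITH_NEWLINE = "<span>span1<p>p1</p>span1</span>\n<span>span2<p>p2</p>span2</span>"
WITH_SPACE = "<span>span1<p>p1</p>span1</span> <span>span2<p>p2</p>span2</span>"

print(inner_text(WITH_NEWLINE))
print(inner_text(WITH_NEWLINE) == inner_text(WITH_SPACE))
```

Under these assumptions, a new line and a space between the two spans give the same text. With no whitespace at all between the spans, the sketch would join them as span1span2, so the result can still legitimately depend on whether any whitespace is present.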
2. Exception
3. Fiddle or Project
https://dotnetfiddle.net/JOmlX0
4. Any further technical details