Please describe the bug
Per Nokogiri's documentation, the `Node#inner_text` method (aliased as `text` and `content`) is meant to return "the plaintext content for this Node." The name `inner_text` suggests that it behaves like the JavaScript property of the same name, `innerText`.
However, the JavaScript property excludes the content of any `<style>` elements inside the given node, while `Node#inner_text` includes it.
Help us reproduce what you're seeing
Example URL (to reproduce this, fetch the raw HTML, e.g. with curl or the script below; inspecting the page in browser dev tools is not enough, because JavaScript running on the page changes the underlying DOM structure): https://www.binance.com/en/terms
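```ruby
require 'httparty'
require 'nokogiri'
# Fetch the raw, server-rendered HTML (no JavaScript runs here)
x = HTTParty.get('https://www.binance.com/en/terms').body
# Text of the <main> element, which includes the content of any <style> inside it
y = Nokogiri::HTML.parse(x).at_css("body div main").inner_text
```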
Expected behavior
Expectation: the result should start with Binance Terms of Use...
Actual: it starts with .css-13trade{box-sizing:border-box...
Per the JS docs for innerText,