Closed impredicative closed 1 year ago
Rule.extract does not accept a string, only hext.Html.
import hext
rule = hext.Rule("<a href:link/>")
# (1) Ok, the argument for extract is of type hext.Html
results = rule.extract(hext.Html("""<a href="b"></a>"""))
# (2) Error, the argument for extract is of type string:
results = rule.extract("""<a href="b"></a>""")
If this was possible in a previous version of Hext (≥1.0.0), please let me know, as this would be a breaking change in the API.
The error message is unfortunately very unhelpful, and I will fix that in a future release with html-extract/hext#28.
Thank you for creating this issue.
> If this was possible in a previous version of Hext (≥1.0.0), please let me know, as this would be a breaking change in the API.
This was not possible in 0.8 (just re-tested to be sure). AFAIK you always needed to pass a Html object.
Yes, it had been a while since I used hext, and I misremembered. Indeed, hext.Rule('').extract(hext.Html('')) is what works.
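For anyone who would rather pass plain strings, a small wrapper can do the conversion before calling extract. This is only a sketch and not part of Hext: the name extract_from and its html_factory parameter are hypothetical, and the wrapper assumes nothing beyond what this thread confirms, namely that Rule.extract expects a parsed document such as hext.Html.

```python
def extract_from(rule, document, html_factory):
    """Run rule.extract on document, parsing it first if it is a plain string.

    Hypothetical helper, not part of Hext. html_factory is the callable that
    turns a string into a parsed document; with Hext this would be hext.Html.
    """
    if isinstance(document, str):
        # A bare string is what triggers the error reported in this issue,
        # so wrap it before handing it to the rule.
        document = html_factory(document)
    return rule.extract(document)
```

With Hext installed, the call would presumably look like extract_from(hext.Rule("<a href:link/>"), '<a href="b"></a>', hext.Html). Passing the factory in explicitly keeps the helper free of a hard import and makes it easy to exercise without the library.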
As an aside, I think there really needs to exist at least one comprehensive page (or tabs) per supported programming language in the documentation. It would contain various necessary examples to train the user to use hext effectively.
As an example, please see the organization and tabs here (one tab per supported language).
> As an aside, I think there really needs to exist at least one comprehensive page (or tabs) per supported programming language in the documentation. It would contain various necessary examples to train the user to use hext effectively.
I agree and have added another issue for this: html-extract/html-extract.github.io#4.
With Python 3.12,
hext.Rule('').extract('')
gives the error:
I am of course also getting this error with a more real-life example. At this time I cannot use hext for anything new.