happysalada opened this issue 1 year ago.
This is quite an interesting idea; I like it! It goes a bit further, though, and might be worth an RFC because there are some extra things to consider. Once we have an HTML extractor, we will need a structural representation of the data after it's extracted. That leads to an HTML codec that both decodes HTML into this structure and encodes this structure back into an HTML page (which could be super cool, to be honest).
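To make the codec idea a bit more concrete, here is a minimal sketch using only Python's standard-library `html.parser`. The node shape (`tag`/`attrs`/`children` dicts with plain strings for text) and the names `decode`, `encode` and `TreeBuilder` are assumptions chosen for illustration, not an existing API or a proposed design.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Builds element nodes as {"tag", "attrs", "children"} dicts; text nodes are str.

    Comments, doctypes and processing instructions are ignored in this sketch.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = {"tag": None, "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name (implicitly closing
        # anything nested inside it); stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["children"].append(data)


def decode(html_text):
    """Decode an HTML string into a list of top-level nodes."""
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root["children"]


def _encode_attrs(attrs):
    # Valueless attributes (e.g. <input disabled>) are parsed as None.
    return "".join(
        f' {name}="{escape(value, quote=True)}"' if value is not None else f" {name}"
        for name, value in attrs.items()
    )


def encode(nodes):
    """Encode a list of nodes produced by decode() back into an HTML string."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            parts.append(escape(node, quote=False))
        else:
            parts.append(f"<{node['tag']}{_encode_attrs(node['attrs'])}>")
            if node["tag"] not in VOID_ELEMENTS:
                parts.append(encode(node["children"]))
                parts.append(f"</{node['tag']}>")
    return "".join(parts)
```

Because the decoded tree is plain data, it could be carried around as a structured event and rendered back with `encode`. The sketch does not special-case raw-text elements such as `<script>` and `<style>`, whose contents would be wrongly escaped on the way out; a real codec would need to handle those.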
@happysalada how do you feel about throwing an RFC up on the topic?
Let me try to carve out some time for this.
Awesome, thanks!
Describe the problem you are trying to solve
Extract data from an HTML page. Lots of older sites with valuable data don't have an API. Extracting data from HTML with a regex is possible but very inconvenient.
Describe the solution you'd like
An HTML extractor with an API similar to CSS selectors.
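As an illustration of what a selector-style API might feel like, here is a minimal sketch built only on Python's standard-library `html.parser`. `select_text`, `SelectorExtractor` and `parse_selector` are hypothetical names. The sketch supports only compound simple selectors (`td`, `.price`, `#main`, `td.price`); combinators such as `div p`, attribute selectors and pseudo-classes are not implemented.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag (and therefore contain no text).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


def parse_selector(selector):
    """Split a compound selector such as 'td.price' or '#main' into (tag, classes, id)."""
    tag_match = re.match(r"[a-zA-Z][\w-]*", selector)
    tag = tag_match.group(0).lower() if tag_match else None
    classes = set(re.findall(r"\.([\w-]+)", selector))
    ids = re.findall(r"#([\w-]+)", selector)
    return tag, classes, (ids[0] if ids else None)


class SelectorExtractor(HTMLParser):
    """Collects the text content of every element matching a simple selector."""

    def __init__(self, selector):
        super().__init__(convert_charrefs=True)
        self.tag, self.classes, self.id = parse_selector(selector)
        self.stack = []    # (tag, text buffer or None) for each open element
        self.results = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        if self.tag is not None and tag != self.tag:
            return False
        if self.id is not None and attrs.get("id") != self.id:
            return False
        # Every class in the selector must appear in the element's class list.
        return self.classes <= set((attrs.get("class") or "").split())

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.stack.append((tag, [] if self._matches(tag, attrs) else None))

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, plus anything left
        # open inside it; emit text for each closed element that matched.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for _, buffer in reversed(self.stack[i:]):
                    if buffer is not None:
                        self.results.append("".join(buffer).strip())
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text belongs to every matching element that is currently open.
        for _, buffer in self.stack:
            if buffer is not None:
                buffer.append(data)


def select_text(html_text, selector):
    """Return the stripped text of each element matching `selector`, in closing order."""
    parser = SelectorExtractor(selector)
    parser.feed(html_text)
    parser.close()
    return parser.results
```

For a table of products, `select_text(page, "td.price")` would return the text of every price cell, which is far less brittle than a regex over the raw markup. Elements still left open when the input ends are not reported by this sketch.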
Notes
If this is an implementation of an RFC, provide a URL to the RFC this enhancement implements.
If this is a major enhancement or contribution, an RFC may be required. It is OK to submit an enhancement first, and our core team will assist with major contributions. In general, major contributions should be discussed with the community before submission.