Description
It is a common scenario to want to read a table directly from a webpage, e.g. from a Wikipedia article. pandas supports this via pandas.read_html, see here: https://pandas.pydata.org/docs/reference/api/pandas.read_html.html
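To make the request more concrete, here is a minimal sketch of the core of such a reader, written in Python with only the standard library's html.parser. The names TableParser and read_html_tables are hypothetical. The sketch assumes well-formed markup with explicit closing tags. It also ignores nested tables, rowspan/colspan, header detection and type inference, all of which a real reader would need.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical parser: collect the text of every <td>/<th> cell,
    grouped by <tr> row, for every <table> in the document."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the <tr> currently being parsed
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text split across inline tags (e.g. <b>, <a>) and trim it.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None and self.tables:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside an open cell is kept.
        if self._cell is not None:
            self._cell.append(data)


def read_html_tables(html):
    """Hypothetical helper: return every table as a list of rows of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

The resulting rows could then be handed to a DataFrame constructor. The hard parts of the feature, such as spans, headers and dtypes, would sit on top of this.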
Since this is a Polars feature request, a lazy version would be desirable too, analogous to the existing read_csv/scan_csv pair.
And since these tables are scraped from the web, it would also make sense to add Unicode normalization support, e.g. via the unicode_normalization crate. This article explains what it is and why it is needed: https://pbpython.com/pandas-html-table.html
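As an illustration of what normalization does to scraped cells, here is a small Python sketch using the standard library's unicodedata module. The helper name normalize_cell and the NFKC default are assumptions for illustration. NFKC folds compatibility characters such as the non-breaking space (U+00A0) and the "fi" ligature into their plain equivalents and composes combining accents. In Rust, the unicode_normalization crate exposes the same forms through its UnicodeNormalization trait (e.g. `.nfkc()`).

```python
import unicodedata


def normalize_cell(text, form="NFKC"):
    """Hypothetical helper: normalize a scraped cell value, then trim it.

    NFKC applies compatibility decomposition followed by canonical
    composition, so e.g. U+00A0 (no-break space) becomes a plain space.
    """
    return unicodedata.normalize(form, text).strip()
```

Without this step, "1\u00a0000" and "1 000" would compare as different strings even though they render identically.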
As a side note, it might also be worth supporting tabular data in XML format: some large legacy data providers still deliver tabular data as XML. Here is a Rust crate that might be useful: https://docs.rs/xmlparser/latest/xmlparser/
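For the XML case, one simple mapping is to treat each repeated record element as a row and its child elements as columns. The sketch below shows this in Python with the standard library's xml.etree.ElementTree for brevity. The function name read_xml_records, the record_tag parameter and the row/column mapping are all assumptions for illustration. The xmlparser crate is a low-level pull tokenizer, so a Rust implementation would have to assemble rows like these from its token stream itself.

```python
import xml.etree.ElementTree as ET


def read_xml_records(xml_text, record_tag):
    """Hypothetical helper: every <record_tag> element becomes one row, and
    its direct children become columns (child tag -> stripped text)."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows
```

Attributes, namespaces and deeper nesting are ignored here. A real reader would need a way to configure how those map to columns.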