pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

I suggest a html reading feature #6883

Closed: elone240 closed this issue 1 year ago

elone240 commented 1 year ago

Problem description

I believe it would be useful if Polars had a .read_html function similar to the one in pandas. This could make Polars more widely used in fields such as web scraping.

zundertj commented 1 year ago

Thank you for your suggestion. I think, though, that HTML parsing is out of scope for Polars and best left to dedicated libraries like BeautifulSoup and lxml, which is what pandas uses internally anyway. What benefits would an integration have over calling an HTML parser separately? Also, please note that although Polars is similar to pandas (both are DataFrame libraries, after all), it is in no way meant to be a drop-in replacement for pandas, so suggestions for new features will be judged on their own merit.
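For readers wondering what "calling an HTML parser separately" might look like in practice, here is a minimal sketch. It is not anything Polars ships: FirstTableParser, parse_first_table, rows_to_polars and the SAMPLE markup are hypothetical names made up for illustration. It uses only the standard-library html.parser module for the parsing step. It assumes a single, non-nested table with explicit closing tags and a header in the first row. A real page would likely call for BeautifulSoup or lxml instead.

```python
from html.parser import HTMLParser


class FirstTableParser(HTMLParser):
    """Collect the cell text of the first <table>, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []           # completed rows, each a list of cell strings
        self._row = None         # row being built, or None outside a <tr>
        self._cell = None        # text fragments of the open cell, or None
        self._state = "before"   # "before" -> "inside" -> "after" the first table

    def handle_starttag(self, tag, attrs):
        if tag == "table" and self._state == "before":
            self._state = "inside"
        elif self._state != "inside":
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is only kept while a cell is open.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self._state != "inside":
            return
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table":
            self._state = "after"


def parse_first_table(html):
    """Return the first table as a list of rows of strings."""
    parser = FirstTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_polars(rows):
    """Build a DataFrame, treating the first row as the header (an assumption)."""
    import polars as pl  # imported lazily so the parsing step needs no third-party package

    header, *body = rows
    return pl.DataFrame(body, schema=header, orient="row")


# Toy markup, made up for this example.
SAMPLE = """
<table>
  <tr><th>fruit</th><th>count</th></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
"""
```

Calling rows_to_polars(parse_first_table(SAMPLE)) should give a two-row frame with string columns "fruit" and "count". Casting "count" to an integer type would be a separate step on the Polars side. If pandas is already installed, another route is pandas.read_html followed by polars.from_pandas, at the cost of the extra dependency.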

elone240 commented 1 year ago

Thanks! I’ll keep that in mind in the future.