j-andrews7 / kenpompy

A simple yet comprehensive web scraper for kenpom.com.
https://kenpompy.readthedocs.io/en/latest/?badge=latest
GNU General Public License v3.0

FutureWarning: Passing literal html to 'read_html' is deprecated and will be removed in a future version #55

Closed esqew closed 10 months ago

esqew commented 10 months ago

The latest run of our test cases shows that the way we currently pass raw HTML from mechanicalsoup into pandas via read_html is deprecated as of pandas 2.1.0. As a result, a FutureWarning is now emitted whenever a pandas version >= 2.1.0 is used:

Deprecated since version 2.1.0: Passing html literal strings is deprecated. Wrap literal string/bytes input in io.StringIO/io.BytesIO instead.

Source

This will necessitate a small change to several lines in the current codebase, namely:

I'm not fully up to speed on the rationale for this change, but the fix itself should be easy. Backwards compatibility with the Python versions we currently test against (>= 3.8) is not a concern, since io.StringIO has been part of the standard library since before Python 3. Using conference.py:63 as an example, the fixed version would be:

from io import StringIO
# ...
conf_df = pd.read_html(StringIO(str(table)))[0]
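
If the same change is needed in several places, one option is to centralize the wrapping in a small helper. The sketch below is only an illustration of that idea, not code from kenpompy: the names as_html_buffer, first_table and SAMPLE_HTML are hypothetical, and the sample table is made-up placeholder markup rather than real kenpom.com output.

```python
from io import StringIO

# Placeholder markup for illustration only (not real scraped data).
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Team</th><th>Wins</th></tr>"
    "<tr><td>A</td><td>1</td></tr>"
    "</table>"
)


def as_html_buffer(markup):
    """Wrap markup in a StringIO buffer.

    `markup` may be a str or any object whose str() is HTML,
    such as a BeautifulSoup tag returned by mechanicalsoup.
    """
    return StringIO(str(markup))


def first_table(markup):
    """Parse the first <table> found in `markup` into a DataFrame."""
    # Imported lazily so that as_html_buffer has no pandas dependency.
    import pandas as pd

    # Passing a file-like object instead of a literal string avoids
    # the FutureWarning introduced in pandas 2.1.0.
    return pd.read_html(as_html_buffer(markup))[0]
```

Calling first_table(table) would then replace each pd.read_html(str(table))[0] call site. Note that read_html still needs an HTML parser such as lxml (or bs4 with html5lib) to be installed, and that bytes input should be wrapped in io.BytesIO instead, as the deprecation notice says.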