Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# filename is the path to one of the attached HTML files
df = pd.read_html(filename)
Issue Description
At some point, while importing HTML table data that always comes from the same source, pandas suddenly started rejecting the file and failed with this error:
/usr/lib64/python3.9/site-packages/bs4/__init__.py:435: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.
  warnings.warn(
Traceback (most recent call last):
  File "/home/janis/Data/Elektreiba/NOMX-04.py", line 146, in <module>
    df=pd.read_html (jauni_dati)
  File "/usr/lib64/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib64/python3.9/site-packages/pandas/io/html.py", line 1205, in read_html
    return _parse(
  File "/usr/lib64/python3.9/site-packages/pandas/io/html.py", line 1006, in _parse
    raise retained
  File "/usr/lib64/python3.9/site-packages/pandas/io/html.py", line 986, in _parse
    tables = p.parse_tables()
  File "/usr/lib64/python3.9/site-packages/pandas/io/html.py", line 262, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/usr/lib64/python3.9/site-packages/pandas/io/html.py", line 618, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found
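The Beautiful Soup warning above suggests the input string may have been treated as markup rather than opened as a file. A rough, stdlib-only sketch for telling "the path does not point to an existing file" apart from "the file contains no <table> markup" could look like the following. The names `TableCounter`, `count_tables` and `inspect_file` are made up for this illustration and are not part of pandas or Beautiful Soup, and the tag count is only an approximation of what a real HTML parser would see.

```python
from html.parser import HTMLParser
from pathlib import Path


class TableCounter(HTMLParser):
    """Counts opening <table> tags (tag names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(markup: str) -> int:
    """Return the number of <table> start tags found in a markup string."""
    counter = TableCounter()
    counter.feed(markup)
    counter.close()
    return counter.tables


def inspect_file(path: str) -> str:
    """Report whether the path exists and, if so, how many <table> tags it holds."""
    p = Path(path)
    if not p.is_file():
        # Assumption: in this case the string itself may end up parsed as markup.
        return f"{path}: not an existing file"
    text = p.read_text(encoding="utf-8", errors="replace")
    return f"{path}: {count_tables(text)} <table> tag(s) found"
```

Calling `inspect_file(jauni_dati)` right before the `pd.read_html` call would show which of the two cases applies for the failing file.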
Two consecutive files are attached (originally misnamed with an .xls extension), one from before and one from after the problem appeared. The first file, and all data before it, imported fine with no additional library needed; for the second, html5lib was requested with this message:
Traceback (most recent call last):
  File "/home/janis/Data/Elektreiba/NOMX-04.py", line 143, in <module>
    df=pd.read_html (jauni_dati)
  File "/usr/lib64/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib64/python3.9/site-packages/pandas/io/html.py", line 1205, in read_html
    return _parse(
  File "/usr/lib64/python3.9/site-packages/pandas/io/html.py", line 982, in _parse
    parser = _parser_dispatch(flav)
  File "/usr/lib64/python3.9/site-packages/pandas/io/html.py", line 931, in _parser_dispatch
    raise ImportError("html5lib not found, please install it")
ImportError: html5lib not found, please install it
Both files look pretty similar and both open the same way in Firefox and Excel.
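Since the ImportError says html5lib is missing while the Installed Versions section lists html5lib 1.1, one thing that may be worth ruling out is that the script runs under a different interpreter or environment than the one the versions were collected from. The sketch below uses only the standard library to report which of the parser packages are importable from the running interpreter. The helper name `parser_availability` is hypothetical, and the list of modules reflects my understanding that the default `read_html` flavor tries lxml first and then falls back to bs4 together with html5lib.

```python
import importlib.util
import sys


def parser_availability(modules=("lxml", "bs4", "html5lib")):
    """Map each module name to True if the running interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Print the interpreter path so it can be compared with the one used for NOMX-04.py.
    print("interpreter:", sys.executable)
    for name, found in parser_availability().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```

Running this with the same command that launches the failing script should show whether html5lib is really unavailable there.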
Expected Behavior
Both files should import as HTML tables without error.
Attachment: example.zip
Installed Versions
INSTALLED VERSIONS
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.9.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.80
Version : #1 SMP PREEMPT Sun Nov 27 13:28:05 CST 2022
machine : x86_64
processor : Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
byteorder : little
LC_ALL : None
LANG : lv_LV.UTF-8
LOCALE : lv_LV.UTF-8

pandas : 1.5.3
numpy : 1.23.4
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.0.1
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None