holzschu / Carnets

Carnets is a stand-alone Jupyter notebook server and client. Edit your notebooks on the go, even where there is no network.
https://holzschu.github.io/Carnets_Jupyter/
BSD 3-Clause "New" or "Revised" License
568 stars 34 forks

Cannot do pandas.read_html #57

Closed: mb1047 closed this issue 4 years ago

mb1047 commented 4 years ago

Calling pandas.read_html raises "ImportError: lxml not found, please install it". Trying to pip install lxml also fails.

I am totally new to Carnets (about 10 minutes in), so apologies if I am missing something.

I like the idea a lot. Very nice, very good, please continue.

import pandas as pd
url = r"https://en.m.wikipedia.org/wiki/List_of_countries_by_wealth_per_adult"
df = pd.read_html(url)
---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-2-fc3b9b0a75aa> in <module>
      1 url = r"https://en.m.wikipedia.org/wiki/List_of_countries_by_wealth_per_adult"
----> 2 df = pd.read_html(url)

/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/Library/lib/python3.7/site-packages/pandas-0.24.2-py3.7-macosx-10.9-x86_64.egg/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
   1092                   decimal=decimal, converters=converters, na_values=na_values,
   1093                   keep_default_na=keep_default_na,
-> 1094                   displayed_only=displayed_only)

/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/Library/lib/python3.7/site-packages/pandas-0.24.2-py3.7-macosx-10.9-x86_64.egg/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
    892     retained = None
    893     for flav in flavor:
--> 894         parser = _parser_dispatch(flav)
    895         p = parser(io, compiled_match, attrs, encoding, displayed_only)
    896 

/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/Library/lib/python3.7/site-packages/pandas-0.24.2-py3.7-macosx-10.9-x86_64.egg/pandas/io/html.py in _parser_dispatch(flavor)
    849     else:
    850         if not _HAS_LXML:
--> 851             raise ImportError("lxml not found, please install it")
    852     return _valid_parsers[flavor]
    853 

ImportError: lxml not found, please install it
%pip install lxml
Collecting lxml
  Using cached https://files.pythonhosted.org/packages/c4/43/3f1e7d742e2a7925be180b6af5e0f67d38de2f37560365ac1a0b9a04c015/lxml-4.4.1.tar.gz
Installing collected packages: lxml
  Running setup.py install for lxml: started
    Running setup.py install for lxml: finished with status 'error'

    ERROR: Complete output from command /var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/Library/bin/python3 -u -c 'import setuptools, tokenize;__file__='"'"'/private/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/tmp/pip-install-eoasic6q/lxml/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/tmp/pip-record-_r4j6p7a/install-record.txt --single-version-externally-managed --compile:
    ERROR: Building lxml version 4.4.1.
    Building without Cython.
    ERROR: b'xslt-config: command not found\n'
    ** make sure the development packages of libxml2 and libxslt are installed **

Note: you may need to restart the kernel to use updated packages.

    Using build configuration of libxslt
    running install
    running build
    running build_py
    creating build
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/_elementpath.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/sax.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/pyclasslookup.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/__init__.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/builder.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/doctestcompare.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/usedoctest.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/cssselect.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/ElementInclude.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/__init__.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/soupparser.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/defs.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/_setmixin.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/clean.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/ElementSoup.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/_diffcommand.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/html5parser.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/__init__.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/formfill.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/builder.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/_html5builder.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/usedoctest.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    copying src/lxml/html/diff.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/html
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron
    copying src/lxml/isoschematron/__init__.py -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron
    copying src/lxml/etree.h -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/etree_api.h -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/lxml.etree.h -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/lxml.etree_api.h -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml
    copying src/lxml/includes/xmlerror.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/c14n.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/xmlschema.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/__init__.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/schematron.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/tree.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/uri.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/etreepublic.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/xpath.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/htmlparser.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/xslt.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/config.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/xmlparser.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/xinclude.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/dtdvalid.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/relaxng.pxd -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/lxml-version.h -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    copying src/lxml/includes/etree_defs.h -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/includes
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/rng
    copying src/lxml/isoschematron/resources/rng/iso-schematron.rng -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/rng
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl
    copying src/lxml/isoschematron/resources/xsl/XSD2Schtrn.xsl -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl
    copying src/lxml/isoschematron/resources/xsl/RNG2Schtrn.xsl -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl
    creating build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_abstract_expand.xsl -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_dsdl_include.xsl -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_skeleton_for_xslt1.xsl -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_svrl_for_xslt1.xsl -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/iso_schematron_message.xsl -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    copying src/lxml/isoschematron/resources/xsl/iso-schematron-xslt1/readme.txt -> build/lib.iphoneos-10.14-iPhone10,6-3.7/lxml/isoschematron/resources/xsl/iso-schematron-xslt1
    running build_ext
    building 'lxml.etree' extension
    creating build/temp.iphoneos-10.14-iPhone10,6-3.7
    creating build/temp.iphoneos-10.14-iPhone10,6-3.7/src
    creating build/temp.iphoneos-10.14-iPhone10,6-3.7/src/lxml
    error: ('clang not available on iOS for ', ['clang', '-Wno-unused-result', '-Wsign-compare', '-Wunreachable-code', '-DNDEBUG', '-g', '-fwrapv', '-O3', '-Wall', '-DCYTHON_CLINE_IN_TRACEBACK=0', '-Isrc', '-Isrc/lxml/includes', '-I/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/Library/include/python3.7m', '-c', 'src/lxml/etree.c', '-o', 'build/temp.iphoneos-10.14-iPhone10,6-3.7/src/lxml/etree.o', '-w', '-flat_namespace'])
    ----------------------------------------
ERROR: Command "/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/Library/bin/python3 -u -c 'import setuptools, tokenize;__file__='"'"'/private/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/tmp/pip-install-eoasic6q/lxml/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/tmp/pip-record-_r4j6p7a/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/mobile/Containers/Data/Application/ED010E53-0B5E-48AD-8C11-21C699BEB145/tmp/pip-install-eoasic6q/lxml/
mb1047 commented 4 years ago

Solution for pandas.read_html:

pd.read_html(url, flavor='html5lib')

The html5lib flavor uses html5lib together with bs4 (BeautifulSoup4). Both also need to be pip installed, but unlike lxml they are pure Python, so nothing has to be compiled and they install fine.

So, reading HTML into pandas DataFrames works! :) Only the lxml issue remains, but that was not the point of this post, so I am closing it.
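
For anyone who wants the same notebook to run both on Carnets and on a desktop, here is a minimal sketch of choosing the flavor before calling pd.read_html. It assumes, as the pandas documentation describes, that the 'html5lib' flavor needs both the bs4 and html5lib packages. The helper name pick_read_html_flavor is hypothetical (it is not part of pandas or Carnets); it only checks which parser packages can be imported.

```python
import importlib.util


def pick_read_html_flavor(find_spec=importlib.util.find_spec):
    """Return a flavor string for pandas.read_html, or None if no parser is usable.

    Hypothetical helper for illustration only. `find_spec` is injectable so the
    logic can be exercised without installing or uninstalling anything.
    """
    def has(name):
        # find_spec returns None when a top-level package cannot be found.
        return find_spec(name) is not None

    if has("lxml"):
        # Typical desktop case: lxml is installed.
        return "lxml"
    if has("bs4") and has("html5lib"):
        # Pure-Python route, the one that worked in this thread.
        return "html5lib"
    return None


flavor = pick_read_html_flavor()
print("usable read_html flavor:", flavor)

# In a notebook cell, assuming pandas is installed and `url` is defined:
#   import pandas as pd
#   tables = pd.read_html(url, flavor=flavor)  # returns a list of DataFrames
```

If the helper returns None, neither parser is importable and read_html would be expected to fail with an ImportError like the one above.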

holzschu commented 4 years ago

Hi, thank you, and well done for finding the solution. lxml is one library I am not able to cross-compile, so I try to work around it.