scrapinghub / extruct

Extract embedded metadata from HTML markup
BSD 3-Clause "New" or "Revised" License

DeprecationWarning: the imp module is deprecated in favour of importlib #158

Open honzajavorek opened 3 years ago

honzajavorek commented 3 years ago

extruct uses rdflib==4.2.2, which causes this warning:

/.../python3.8/site-packages/rdflib/plugins/parsers/pyRdfa/utils.py:19
  /.../python3.8/site-packages/rdflib/plugins/parsers/pyRdfa/utils.py:19: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import os, os.path, sys, imp, datetime

I'm not sure whether this has been fixed in rdflib==5; I didn't find anything mentioning importlib in the issues or PRs there. Related: https://github.com/RDFLib/rdflib/issues/1196
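
Until the dependency is updated, one possible stopgap is to silence just this warning with the standard `warnings` module. This is a minimal sketch, assuming the message text matches the output quoted above; the helper name `silence_imp_warning` is hypothetical and not part of extruct or rdflib:

```python
import warnings


def silence_imp_warning() -> None:
    """Ignore only the DeprecationWarning about the imp module."""
    # `message` is a regular expression matched against the start of the
    # warning text, so unrelated DeprecationWarnings are left alone.
    warnings.filterwarnings(
        "ignore",
        message="the imp module is deprecated",
        category=DeprecationWarning,
    )


# Install the filter before the module that imports `imp` is first loaded,
# because the warning is emitted at import time.
silence_imp_warning()
```

This only hides the message; the underlying `imp` import is still there.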

lopuhin commented 3 years ago

Thanks for the heads-up. Right now we're stuck on 4.2.2, and https://github.com/scrapinghub/extruct/issues/135 is about fixing this. I hope you don't mind me closing this as a duplicate.

honzajavorek commented 3 years ago

OK, thanks! No problem, I'll subscribe to the other issue 👍

jayaddison commented 1 year ago

This issue may still be valid: I've upgraded to rdflib v7.0.0 for a project that uses extruct locally, and I continue to see runtime warning messages about the imp import from pyRdfa3.

The imp module has been removed from the Python 3.12-dev branch, so this may become more important soon.

In terms of fixes: it looks like the relevant import was removed from pyRdfa3 back in 2020, although that change has not yet been released. There's an issue requesting an updated release at RDFLib/pyrdfa3#37 (note: commenting is disabled because the repository is archived).

lopuhin commented 1 year ago

Thanks for reporting. I haven't managed to check this yet, but let me re-open it for visibility.

jayaddison commented 11 months ago

This should be confirmed by someone else too, but in my experience this did block an upgrade of a project to Python 3.12: the code began failing when extruct imported pyrdfa3, which in turn tried to import imp, which no longer exists in Python 3.12.
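
For anyone who wants to check their own interpreter before upgrading, here is a small sketch that reports whether the standard-library `imp` module can still be found. The function name `imp_is_available` is made up for illustration:

```python
import importlib.util
import sys


def imp_is_available() -> bool:
    """Return True if a top-level module named `imp` can be located."""
    # find_spec locates the module without executing it, so it does not
    # trigger the DeprecationWarning on interpreters that still ship imp.
    return importlib.util.find_spec("imp") is not None


if __name__ == "__main__":
    version = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {version}: imp available = {imp_is_available()}")
```

If this reports that `imp` is unavailable, any installed package that still does `import imp` at module level will fail on import in that environment.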

sqwxl commented 9 months ago

I can confirm the same thing is happening to me on 3.12:

Python 3.12.0 (main, Oct  2 2023, 00:00:00) [GCC 13.2.1 20230918 (Red Hat 13.2.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import extruct
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nilueps/.local/lib/python3.12/site-packages/extruct/__init__.py", line 1, in <module>
    from ._extruct import SYNTAXES, extract
  File "/home/nilueps/.local/lib/python3.12/site-packages/extruct/_extruct.py", line 13, in <module>
    from extruct.rdfa import RDFaExtractor
  File "/home/nilueps/.local/lib/python3.12/site-packages/extruct/rdfa.py", line 16, in <module>
    from pyRdfa import Options
  File "/home/nilueps/.local/lib/python3.12/site-packages/pyRdfa/__init__.py", line 295, in <module>
    from .state            import ExecutionContext
  File "/home/nilueps/.local/lib/python3.12/site-packages/pyRdfa/state.py", line 39, in <module>
    from .utils     import quote_URI
  File "/home/nilueps/.local/lib/python3.12/site-packages/pyRdfa/utils.py", line 19, in <module>
    import os, os.path, sys, imp, datetime, socket
ModuleNotFoundError: No module named 'imp'

MmAaXx500 commented 8 months ago

pyRdfa3 3.6.2 has been released on PyPI and includes the commit that removed the imp import. Also, it looks like development has moved to https://github.com/prrvchr/pyrdfa3

jayaddison commented 7 months ago

Thanks @MmAaXx500 for the heads-up - I can confirm that with pyrdfa3 v3.6.2 in use, upgrading an extruct-dependent container to Python 3.12 has been unblocked here.
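
To check whether a given environment already has a pyRdfa3 release with the fix, something like the following may help. It is a rough sketch: it assumes the installed distribution is named `pyRdfa3`, that version strings are plain dotted numbers, and that releases after 3.6.2 keep the fix. `FIXED_VERSION`, `parse_version`, and `has_imp_fix` are illustrative names, not part of any library:

```python
from importlib import metadata

# First release reported in this thread to drop the `imp` import.
FIXED_VERSION = (3, 6, 2)


def parse_version(text: str) -> tuple:
    """Turn a plain dotted version such as '3.6.2' into a tuple of ints.

    Assumes purely numeric components; pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in text.split(".")[:3])


def has_imp_fix(version_text: str) -> bool:
    """Return True if the given pyRdfa3 version is at least FIXED_VERSION."""
    return parse_version(version_text) >= FIXED_VERSION


if __name__ == "__main__":
    try:
        installed = metadata.version("pyRdfa3")
    except metadata.PackageNotFoundError:
        print("pyRdfa3 is not installed")
    else:
        print(f"pyRdfa3 {installed}: imp import removed = {has_imp_fix(installed)}")
```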