scrapinghub / extruct

Extract embedded metadata from HTML markup
BSD 3-Clause "New" or "Revised" License

ModuleNotFoundError: No module named 'rdflib_jsonld.serializer' #186

Open div927 opened 2 years ago

div927 commented 2 years ago

data = extruct.extract(r.text, base_url=base_url)
/Users/divyanshu/flask/lib/python3.6/site-packages/rdflib_jsonld/__init__.py:12: DeprecationWarning: The rdflib-jsonld package has been integrated into rdflib as of rdflib==6.0.1. Please remove rdflib-jsonld from your project's dependencies.
  DeprecationWarning,
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/divyanshu/flask/lib/python3.6/site-packages/extruct/_extruct.py", line 108, in extract
    output[syntax] = list(extract(document, base_url=base_url))
  File "/Users/divyanshu/flask/lib/python3.6/site-packages/extruct/rdfa.py", line 154, in extract_items
    jsonld_string = g.serialize(format='json-ld', auto_compact=not expanded)
  File "/Users/divyanshu/flask/lib/python3.6/site-packages/rdflib/graph.py", line 961, in serialize
    serializer = plugin.get(format, Serializer)(self)
  File "/Users/divyanshu/flask/lib/python3.6/site-packages/rdflib/plugin.py", line 107, in get
    return p.getClass()
  File "/Users/divyanshu/flask/lib/python3.6/site-packages/rdflib/plugin.py", line 84, in getClass
    self._class = self.ep.load()
  File "/Users/divyanshu/flask/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2322, in load
    return self.resolve()
  File "/Users/divyanshu/flask/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2328, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'rdflib_jsonld.serializer'
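A quick way to confirm the failure outside of extruct is to ask Python whether the module named in the traceback can be located at all. This is only a diagnostic sketch: `module_available` is a hypothetical helper, not part of extruct or rdflib.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located, without importing the module itself."""
    try:
        # find_spec imports parent packages first, so a missing parent
        # package surfaces as an ImportError rather than a None result.
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    # Module name taken from the traceback above.
    print(module_available("rdflib_jsonld.serializer"))
```

If this prints `False` while `rdflib_jsonld` itself is installed, the installed rdflib-jsonld release does not ship the `serializer` submodule, which would match the error above.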

lopuhin commented 2 years ago

Hi @div927, could you please share your Python version and the versions of the relevant packages: extruct, rdflib, rdflib-jsonld, pyRdfa3?
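One way to gather this information in a single step is sketched below. `installed_versions` is a hypothetical helper; it uses `importlib.metadata` where available (Python 3.8+) and falls back to `pkg_resources` on older interpreters such as 3.6.

```python
def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    try:
        # Python 3.8+: available in the standard library.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Older interpreters: fall back to setuptools' pkg_resources.
        from pkg_resources import DistributionNotFound as PackageNotFoundError
        from pkg_resources import get_distribution

        def version(name):
            return get_distribution(name).version

    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    import sys

    print("python", sys.version.split()[0])
    packages = ["extruct", "rdflib", "rdflib-jsonld", "pyRdfa3"]
    for name, ver in installed_versions(packages).items():
        print(name, ver)
```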

div927 commented 2 years ago

@lopuhin Python version: 3.6.9
extruct (0.13.0)
rdflib (5.0.0)
rdflib-jsonld (0.6.2)
pyRdfa3 (3.5.3)

lopuhin commented 2 years ago

@div927 I see, there were some incompatible changes in the latest rdflib versions. If I go with the latest versions of everything, it works for me with Python 3.9, so in your case I hope updating rdflib to 6.0.2 and rdflib-jsonld to 0.6.2 will fix the issue. We should probably check which versions don't work and add constraints.

div927 commented 2 years ago

@lopuhin I don't think Python 3.6.9 can install rdflib 6.0.2 and rdflib-jsonld 0.6.2; when I tried, it didn't work for me.

Collecting rdflib==6.0.2
  Could not find a version that satisfies the requirement rdflib==6.0.2 (from versions: 2.4.1, 2.4.2, 3.0.0, 3.1.0, 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.4.0, 4.0, 4.0.1, 4.1.0, 4.1.1, 4.1.2, 4.2.0, 4.2.1, 4.2.2, 5.0.0rc1, 5.0.0)
No matching distribution found for rdflib==6.0.2

lopuhin commented 2 years ago

@div927 oh sorry, my bad: I misread and was checking with Python 3.9. We actually have the same problem with the build here https://github.com/scrapinghub/extruct/runs/3745270289?check_suite_focus=true - let me check whether there is a working configuration. Unfortunately, old build logs are no longer available. Worst case, downgrading extruct should work, and extraction quality and API should be pretty similar.

div927 commented 2 years ago

@lopuhin which version of extruct is compatible with Python 3.6.9, in case I have to downgrade?

lopuhin commented 2 years ago

@div927 aha here is the issue: https://pypi.org/project/rdflib-jsonld/ says that

If you are forced to keep using Python <= 3.6, you will need to keep using release <= 0.5.0 of this plugin with RDFlib 5.0.0.

So if you downgrade rdflib-jsonld to 0.5.0, it works. I checked with Python 3.6.
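The version rule can be written down as a small helper. This is a sketch only: `suggested_pins` is a hypothetical function, the Python <= 3.6 branch follows the PyPI note quoted above, and the branch for newer interpreters is inferred from the deprecation warning in the original report rather than verified here.

```python
import sys


def suggested_pins(python_version=None):
    """Return pip requirement strings for rdflib and rdflib-jsonld.

    Assumptions drawn from this thread:
    - Python <= 3.6 needs rdflib 5.0.0 together with rdflib-jsonld <= 0.5.0.
    - Newer interpreters can use rdflib >= 6.0.1, which integrates JSON-LD
      support, so rdflib-jsonld is left out.
    """
    major_minor = tuple((python_version or sys.version_info)[:2])
    if major_minor <= (3, 6):
        return ["rdflib==5.0.0", "rdflib-jsonld==0.5.0"]
    return ["rdflib>=6.0.1"]


if __name__ == "__main__":
    # Print the pins for the interpreter running this script.
    print(" ".join(suggested_pins()))
```

The printed strings can be passed to `pip install`; on Python 3.6.9 that amounts to pinning rdflib-jsonld to 0.5.0 alongside the already installed rdflib 5.0.0.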

lopuhin commented 2 years ago

Actually, https://github.com/scrapinghub/extruct/pull/182 already puts those constraints in place, so let us try to finish it (there was another build issue there).

div927 commented 2 years ago

@lopuhin yes!