neuml / paperetl

📄 ⚙️ ETL processes for medical and scientific papers
Apache License 2.0

Issue processing into Elasticsearch #41

Closed jak1502 closed 1 year ago

jak1502 commented 1 year ago

Hi,

I have both paperetl and Elasticsearch set up in Docker containers running on my machine. When I try to process a .pdf file and add it to Elasticsearch, I get this error:

python -m paperetl.file paperetl/data http://localhost:9200 paperetl/models
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/paperetl/file/__main__.py", line 15, in <module>
    sys.argv[4] == "True" if len(sys.argv) > 4 else False,
  File "/usr/local/lib/python3.7/dist-packages/paperetl/file/execute.py", line 176, in run
    db = Factory.create(url, replace)
  File "/usr/local/lib/python3.7/dist-packages/paperetl/factory.py", line 29, in create
    return Elastic(url, replace)
  File "/usr/local/lib/python3.7/dist-packages/paperetl/elastic.py", line 44, in __init__
    exists = self.connection.indices.exists("articles")
  File "/usr/local/lib/python3.7/dist-packages/elasticsearch/_sync/client/utils.py", line 308, in wrapped
    "Positional arguments can't be used with Elasticsearch API methods. "
TypeError: Positional arguments can't be used with Elasticsearch API methods. Instead only use keyword arguments.

I assume it's something to do with Elasticsearch changing in v8, but I'm not sure.

davidmezzetti commented 1 year ago

Hello, thank you for reporting the issue.

I assume you're working with the latest version of the Python ES client and Elasticsearch? It's possible something changed from 7.x to 8.x. I'm planning to release a couple updates to paperetl/paperai in the coming weeks. I'll take a look at this.

jak1502 commented 1 year ago

Yes, I think the pip install will have pulled the latest version.

I'll look out for the update, cheers.

jak1502 commented 1 year ago

Hi David, have you done any updates for this yet? Was going to give it another go.

davidmezzetti commented 1 year ago

Unfortunately no, I've been tied up with other efforts.

davidmezzetti commented 1 year ago

I found time to fix this. The code should now work with ES 7.x and ES 8.x.

This will go out with paperetl 2.1.0. In the meantime, you can install paperetl from GitHub:

pip install git+https://github.com/neuml/paperetl
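
For anyone hitting the same TypeError elsewhere: the traceback shows the client rejecting a positional argument to indices.exists. A minimal sketch of the usual remedy is to pass the index name as a keyword argument, which the 7.x client also accepts. This is an illustration only and is not confirmed to be the change made in paperetl. The names index_exists and make_stub_client are hypothetical, and the stub stands in for a real client so the snippet runs without an Elasticsearch server.

```python
from types import SimpleNamespace


def index_exists(connection, name):
    """Check for an index by passing its name as a keyword argument."""
    # Keyword form instead of connection.indices.exists(name)
    return bool(connection.indices.exists(index=name))


def make_stub_client(existing):
    """Hypothetical stand-in for a client whose API methods are keyword-only."""

    def exists(*, index):
        # Keyword-only parameter: a positional call raises TypeError,
        # similar in spirit to the error in the traceback above.
        return index in existing

    return SimpleNamespace(indices=SimpleNamespace(exists=exists))


stub = make_stub_client({"articles"})

# Keyword call succeeds
print(index_exists(stub, "articles"))

# Positional call is rejected by the stub
try:
    stub.indices.exists("articles")
except TypeError as error:
    print("TypeError:", error)

# With a real client (assumes the elasticsearch package and a running server):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://localhost:9200")
#   index_exists(es, "articles")
```

Wrapping the call in a small helper keeps the keyword form in one place if other code paths also query the index.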