ip-tools / patzilla

PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
https://docs.ip-tools.org/patzilla/
GNU Affero General Public License v3.0

How to configure data source DEPATISnet #8

Closed ElvezPelvez closed 5 years ago

ElvezPelvez commented 5 years ago

Trying to figure out how to configure DEPATISnet as a data source. So far, IP-Navigator always tries to fetch data from OPS, whether or not OPS is listed in patzilla.ini.

So there seems to be no difference whether patzilla.ini reads:

    datasources = ops, depatisnet, depatech

or

    datasources = depatisnet, depatech

How do I force IP-Navigator to search DEPATISnet and how would I configure this in the .ini?

Thanks!

amotl commented 5 years ago

Dear @ElvezPelvez,

thanks for writing in.

Introduction

PatZilla will always require access to OPS as it is the single data source available for displaying bibliographic data.

However, PatZilla should be able to offer searching at DEPATISnet in its default configuration

    datasources = ops, depatisnet

so let's check what might be wrong with your setup.
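To double-check which data sources a given patzilla.ini actually enables, the setting can be parsed outside of PatZilla. The following is only a minimal sketch and not PatZilla's own code: the section name `ip_navigator` is an assumption (only the `datasources` key appears in this thread), and the two helper functions are hypothetical.

```python
import configparser

# Sample content; the [ip_navigator] section name is an assumption.
SAMPLE_INI = """
[ip_navigator]
datasources = ops, depatisnet, depatech
"""


def enabled_datasources(ini_text, section="ip_navigator"):
    """Return the comma-separated `datasources` value as a list of names."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback="" also covers a missing section or a missing key.
    raw = parser.get(section, "datasources", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]


def check_datasources(names):
    """Return human-readable warnings for a list of data source names."""
    warnings = []
    if "ops" not in names:
        warnings.append("ops is not listed, but displaying results relies on OPS")
    if "depatisnet" not in names:
        warnings.append("depatisnet is not listed, so DPMA search may not be offered")
    return warnings


if __name__ == "__main__":
    names = enabled_datasources(SAMPLE_INI)
    print(names)
    print(check_datasources(names))
```

An empty warning list would suggest the `datasources` line itself is fine and the problem lies elsewhere.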

Investigation

May I ask whether you see this in the data source chooser widget in the user interface? [image]

After switching to "DPMA" there, you should be able to query DEPATISnet through the corresponding data source adapter depatisnet.py.

Outlook

To be able to display bibliographic data from DEPATISnet in order to skip OPS on that, this data source adapter would have to be extended to actually acquire the bibliographic details. Currently, it just submits the query expression, takes the list of document numbers from the response and passes on the torch to the display subsystem which is currently solely based on OPS.
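The two-stage flow described above can be sketched roughly as follows. This is an illustration only, not the actual adapter code: `search_then_display` and the two stand-in functions are hypothetical names, and the stand-ins return canned data instead of talking to DEPATISnet or OPS.

```python
from typing import Callable, Dict, List


def search_then_display(
    expression: str,
    search: Callable[[str], List[str]],
    fetch_biblio: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Run a search on one source, then fetch display data from another.

    Stage 1: the search adapter turns the query expression into a list
             of document numbers.
    Stage 2: bibliographic details for each number are acquired from a
             second source (OPS, in the setup described in this thread).
    """
    numbers = search(expression)
    return [fetch_biblio(number) for number in numbers]


# Hypothetical stand-ins with canned data, for illustration only.
def fake_depatisnet_search(expression: str) -> List[str]:
    return ["EP0906861B1"]


def fake_ops_biblio(number: str) -> Dict[str, str]:
    return {"number": number, "source": "ops"}


if __name__ == "__main__":
    print(search_then_display("pn=EP0906861B1", fake_depatisnet_search, fake_ops_biblio))
```

Seen this way, skipping OPS would mean giving the search adapter its own implementation of the second stage.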

While extending depatisnet.py is definitely doable up to a certain degree, it's currently not on our list of priorities. That said, we might come back to this in the future, depending on pressure vs. opportunities ;].

Advice

Regarding access to OPS, we recommend getting an account there and adding the corresponding authentication credentials to the PatZilla configuration. Regarding searching at DEPATISnet, we are confident that we can resolve the problem you might be experiencing.

We hope we have been able to shed some light on the mechanics working under the hood; please let us know if this makes sense to you. Regarding your issue with DEPATISnet, we will be happy to help further after gathering more information about your specific problem.

With kind regards, Andreas.

ElvezPelvez commented 5 years ago

Andreas,

Thanks a lot for the quick reply. I'll apply for OPS access and will get back to you again.

Regarding your question: Yes, the two widgets are present and (if I remember correctly) I can get rid of the EPO one if I'm using "datasources = depatisnet, depatech" instead of "datasources = ops, depatisnet, depatech" in patzilla.ini (I'm running PatZilla using Docker btw.).

Best regards and thanks again,

Erik

ElvezPelvez commented 5 years ago

Hi Andreas,

I now have OPS access, pasted the two keys into patzilla.ini (I tried both with and without the { }) and restarted the container. I then tried the "Numberlist" and also the "Comfort Search".

However, I am getting the following error:

    An exception occurred while processing your query.
    Reason: pyramid.httpexceptions.HTTPBadGateway: Could not connect to OPS servers.

Any ideas what I might be doing wrong?

Thanks and best regards,

Erik

ElvezPelvez commented 5 years ago

The following is a log of the error - hope this helps... I did check the access token on the EPO webpage and it worked alright.

E

date stream content
2019-02-11 17:07:39 stdout 2019-02-11 17:07:38,972 WARNING  [patzilla.access.epo.ops.client          ][waitress] Invalidating token and closing connection for client_id=xxxx
2019-02-11 17:07:38 stdout 2019-02-11 17:07:38,972 ERROR    [patzilla.access.epo.ops.client          ][waitress] OpsOAuth2Session HTTPError: 401 Client Error: Unauthorized for url: https://ops.epo.org/3.2/auth/accesstoken. client_id=xxxx
2019-02-11 17:07:38 stdout 2019-02-11 17:07:38,705 WARNING  [patzilla.access.epo.ops.client          ][waitress] Invalidating token and closing connection for client_id=xxxx
2019-02-11 17:07:38 stdout 2019-02-11 17:07:38,705 ERROR    [patzilla.access.epo.ops.client          ][waitress] OpsOAuth2Session HTTPError: 401 Client Error: Unauthorized for url: https://ops.epo.org/3.2/auth/accesstoken. client_id=xxxx
2019-02-11 17:07:38 stdout 2019-02-11 17:07:38,399 INFO     [mongodb_gridfs_beaker                   ][waitress] [MongoDBGridFS] Host URI: mongodb://mongodb:27017
2019-02-11 17:07:38 stdout 2019-02-11 17:07:38,349 INFO     [patzilla.navigator.services.ops         ][waitress] query finished
2019-02-11 17:07:38 stdout  
2019-02-11 17:07:38 stdout HTTPBadGateway: Could not connect to OPS servers.
2019-02-11 17:07:38 stdout raise error
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/patzilla/access/epo/ops/client.py", line 190, in request
2019-02-11 17:07:38 stdout return self.request('GET', url, **kwargs)
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 521, in get
2019-02-11 17:07:38 stdout response = client.get(url, headers={'Accept': 'application/json'}, params={'q': query, 'Range': range})
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/patzilla/access/epo/ops/api.py", line 223, in ops_published_data_search_real
2019-02-11 17:07:38 stdout return ops_published_data_search_real(constituents, query, range)
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/patzilla/access/epo/ops/api.py", line 208, in ops_published_data_search
2019-02-11 17:07:38 stdout return func(*args, **kwargs)
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/beaker/cache.py", line 597, in go
2019-02-11 17:07:38 stdout v = self.createfunc()
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/beaker/container.py", line 378, in get_value
2019-02-11 17:07:38 stdout return self._get_value(key, **kw).get_value()
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/beaker/cache.py", line 322, in get
2019-02-11 17:07:38 stdout return cache[0].get_value(cache_key, createfunc=go)
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/beaker/cache.py", line 599, in cached
2019-02-11 17:07:38 stdout result = ops_published_data_search(constituents, search.expression, range)
2019-02-11 17:07:38 stdout File "/usr/lib/python2.7/site-packages/patzilla/navigator/services/ops.py", line 105, in ops_published_data_search_handler
2019-02-11 17:07:38 stdout Traceback (most recent call last):
2019-02-11 17:07:38 stdout exception:
2019-02-11 17:07:38 stdout None
2019-02-11 17:07:38 stdout response:
2019-02-11 17:07:38 stdout 2019-02-11 17:07:38,349 CRITICAL [patzilla.navigator.services             ][waitress] ops-search error: query="pn=EP0906861B1", reason=pyramid.httpexceptions.HTTPBadGateway: Could not connect to OPS servers.
2019-02-11 17:07:38 stdout 2019-02-11 17:07:38,340 WARNING  [patzilla.access.epo.ops.client          ][waitress] Invalidating token and closing connection for client_id=xxxx
2019-02-11 17:07:38 stdout 2019-02-11 17:07:38,340 ERROR    [patzilla.access.epo.ops.client          ][waitress] OpsOAuth2Session HTTPError: 401 Client Error: Unauthorized for url: https://ops.epo.org/3.2/auth/accesstoken. client_id=xxxx
2019-02-11 17:07:37 stdout 2019-02-11 17:07:37,106 INFO     [patzilla.navigator.services.ops         ][waitress] query cql: pn=EP0906861B1
2019-02-11 17:07:36 stdout 2019-02-11 17:07:36,785 INFO     [patzilla.util.expression                ][waitress] Parsing search expression "pn=EP0906861B1" with syntax "cql" and grammar "default"
2019-02-11 17:07:36 stdout 2019-02-11 17:07:36,784 INFO     [patzilla.navigator.services.ops         ][waitress] query raw: pn=EP0906861B1
2019-02-11 17:06:59 stdout 2019-02-11 17:06:59,704 INFO     [patzilla.access.epo.ops.client          ][waitress] OpsOAuthClientFactory.create_session: identifier=system, client_id=xxxx
2019-02-11 17:06:48 stdout Serving on http://0.0.0.0:9999
2019-02-11 17:06:48 stdout Starting server in PID 1.
2019-02-11 17:06:44 stdout 2019-02-11 17:06:44,844 INFO     [patzilla.access.epo.ops.client          ][MainThread] Creating OpsClientPool
2019-02-11 17:06:42 stdout 2019-02-11 17:06:42,154 INFO     [patzilla.util.config                    ][MainThread] Effective configuration files: /patzilla.ini
2019-02-11 17:06:42 stdout 2019-02-11 17:06:42,148 INFO     [patzilla.util.config                    ][MainThread] Expanded configuration files:  /patzilla.ini, /vendors.ini
2019-02-11 17:06:42 stdout 2019-02-11 17:06:42,142 INFO     [patzilla.util.config                    ][MainThread] Requested configuration files: /patzilla.ini
2019-02-11 17:06:42 stdout 2019-02-11 17:06:42,142 INFO     [patzilla.navigator.settings             ][MainThread] Root configuration file is /patzilla.ini
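
The 401 responses in the log come from the OPS token endpoint. To separate a credentials problem from a problem in the client code, the token request can be tried on its own. The following is a minimal sketch under stated assumptions, not PatZilla's client: it follows the OAuth client-credentials flow that OPS documents (HTTP Basic auth plus `grant_type=client_credentials`), the function names are hypothetical, and stripping surrounding curly braces assumes they are placeholder markers rather than part of the key.

```python
import base64
import urllib.error
import urllib.request

# Endpoint URL as it appears in the log above.
TOKEN_URL = "https://ops.epo.org/3.2/auth/accesstoken"


def clean_credential(value):
    """Strip whitespace and surrounding curly braces (assumed placeholders)."""
    return value.strip().strip("{}").strip()


def build_token_request(consumer_key, consumer_secret):
    """Build the POST request for an access token without sending it."""
    pair = "{}:{}".format(
        clean_credential(consumer_key), clean_credential(consumer_secret)
    )
    basic = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=b"grant_type=client_credentials",
        headers={
            "Authorization": "Basic " + basic,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def check_credentials(consumer_key, consumer_secret, timeout=10):
    """Send the token request and return the HTTP status code.

    Needs network access. A 401 means the key/secret pair was rejected.
    """
    request = build_token_request(consumer_key, consumer_secret)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

If `check_credentials` returns 200 for the same key and secret that PatZilla rejects, the credentials themselves are probably fine and the cause is more likely on the client side.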

amotl commented 5 years ago

Thanks Erik,

regarding the error message

    HTTPBadGateway: Could not connect to OPS servers.

The problem you are seeing is in our code, sorry for that! I am sad that you are running into this just when starting out with PatZilla, as the OPS interface has been pretty stable across the board over the last years ;].

After people already reported this to us (see #9 ff.), we have been able to partly fix this in the epo-ops-client branch but are currently working on a new release which also fixes some other issues introduced recently.

So, would you mind switching over to that branch in the meantime, if you feel adventurous? Regarding the next release, we kindly ask for your patience. However, we will try to move this forward in the next few hours.

With kind regards, Andreas.

ElvezPelvez commented 5 years ago

Hi Andreas and thanks a lot!

I'd love to try the branch and I will try to switch to it even though I have not done this before (quite new to Git). Worst case I just delete the master branch ;-)

Best regards,

Erik

ElvezPelvez commented 5 years ago

Just a quick question: how would I switch to the epo-ops-client branch and rebuild the docker containers using the changed code?

Just do a git checkout epo-ops-client followed by a docker-compose up?

Thanks again,

Erik

amotl commented 5 years ago

a) Assuming you got things going on your workbench: Great to hear and thanks for sharing. @aghster will also be happy to hear that the Docker environment works without any effort even when switching branches.

b) If everything actually works well when accessing OPS, searching at DEPATISnet probably also works for you now?

ElvezPelvez commented 5 years ago

Sorry, but docker is not working. I'm still trying to figure out why: my two cents (I'm not too experienced) would be that the following line in Dockerfile causes the problem:

   pip install patzilla

This actually downloads patzilla-0.161.1.tar.gz from pythonhosted.org and does not care about the local repository, i.e. the epo-ops-client branch.
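
One way to confirm which version actually ended up in an environment is to ask Python's packaging metadata. This is a minimal sketch, assuming the distribution is named `patzilla` as in the pip command above; `installed_version` is a hypothetical helper, and how to run it inside the container depends on your setup.

```python
def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        # Python 3.8+ ships importlib.metadata in the standard library.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Older interpreters (e.g. Python 2.7) fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("patzilla"))
```

A plain release number here, rather than a development version, would be consistent with the package having been installed from the package index instead of the local checkout.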

Since I really am not too experienced with either pip or docker I just hope that @aghster figures out a way to change the Dockerfile (or I just have to wait for your next release).

Thanks anyways and have a nice rest of the day,

Erik

amotl commented 5 years ago

Thanks for your feedback @ElvezPelvez.

> Sorry, but docker is not working, [it] downloads patzilla-0.161.1.tar.gz from pythonhosted.org and does not care about the local repository, i.e. the epo-ops-client branch.

That's right, thanks for your investigation. As I am not using Docker with my development environment, I haven't been exactly aware of that.

> I just hope that @aghster figures out a way to change the Dockerfile

In principle, I consider the current way a good thing, as Docker users get a stable release version for just running PatZilla without any effort on any environment supported by Docker. However, it would definitely be a cool thing if we also supported a Docker-based development sandbox, so I've split the discussion about this topic into #10.

> or I just have to wait for your next release

I fear it's really just on us this time to cut a new release, sorry again. If you feel adventurous enough to set up a development environment directly on your workstation though, you might want to have a look at the corresponding sandbox setup documentation.

ElvezPelvez commented 5 years ago

Thanks a lot and I totally agree the Docker solution "as is" should be the best solution for users who want a stable release. However, I also like the idea of a sandboxed Docker version...

As long as no new release is out I will try the sandboxed setup as proposed and hope I will manage to get through the installation steps.

Thanks again for your help and effort!

aghster commented 5 years ago

The issue explained in #9 seems to be related to a Python update (@amotl, did I get this right?). Therefore, a preliminary workaround for the current Docker production setup could be to use an older Python version. @ElvezPelvez, you could try replacing lines 4 and 5 in Dockerfile with:

  python2=2.7.14 \
  python2-dev=2.7.14 \

Unfortunately, right now I don't have the time to check myself whether this actually works.

aghster commented 5 years ago

Now I've tried to at least reproduce the error, but everything runs just fine. Then I found that patzilla version 0.162.0 was released on PyPI about 4 hours ago. So I suppose there is no longer any need for the above workaround, as @amotl apparently fixed #9.

amotl commented 5 years ago

Good to hear that everything works for you with the new release, enjoy your research!

Thanks also @aghster for suggesting the Python downgrade workaround. While this might have helped, let's just move forward.

amotl commented 5 years ago

Closing this now. Thanks again.