mittagessen / kraken

OCR engine for all the languages
http://kraken.re
Apache License 2.0

`KeyError` with `kraken list/get/show` #550

Closed: csidirop closed this issue 11 months ago

csidirop commented 1 year ago

Getting KeyError: 'links', KeyError: 'type' and KeyError: 'key' on both new setups and long-running systems:

For list:

root@464e69f92060:/var/www# kraken list
Retrieving model list ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   4% 1/24 -:--:-- 0:00:00
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/kraken_venv/bin/kraken:8 in <module>                                                        │
│                                                                                                  │
│   5 from kraken.kraken import cli                                                                │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(cli())                                                                          │
│   9                                                                                              │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1130 in __call__                      │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1055 in main                          │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1688 in invoke                        │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1404 in invoke                        │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:760 in invoke                         │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/decorators.py:26 in new_func                  │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/kraken/kraken.py:673 in list_models                 │
│                                                                                                  │
│   670 │                                                                                          │
│   671 │   with KrakenProgressBar() as progress:                                                  │
│   672 │   │   download_task = progress.add_task('Retrieving model list', total=0, visible=True   │
│ ❱ 673 │   │   model_list = repo.get_listing(lambda total, advance: progress.update(download_ta   │
│   674 │   for id, metadata in model_list.items():                                                │
│   675 │   │   message('{} ({}) - {}'.format(id, ', '.join(metadata['type']), metadata['summary   │
│   676 │   ctx.exit(0)                                                                            │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/kraken/repo.py:228 in get_listing                   │
│                                                                                                  │
│   225 │   total = resp['hits']['total']                                                          │
│   226 │   callback(total, 0)                                                                     │
│   227 │   records.extend(resp['hits']['hits'])                                                   │
│ ❱ 228 │   while 'next' in resp['links']:                                                         │
│   229 │   │   logger.debug('Fetching next page')                                                 │
│   230 │   │   r = requests.get(resp['links']['next'])                                            │
│   231 │   │   r.raise_for_status()                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'links'

For get:

/var/www# kraken get 10.5281/zenodo.2577813
Processing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 bytes -:--:-- 0:00:00
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/kraken_venv/bin/kraken:8 in <module>                                                        │
│                                                                                                  │
│   5 from kraken.kraken import cli                                                                │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(cli())                                                                          │
│   9                                                                                              │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1130 in __call__                      │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1055 in main                          │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1688 in invoke                        │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1404 in invoke                        │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:760 in invoke                         │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/decorators.py:26 in new_func                  │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/kraken/kraken.py:695 in get                         │
│                                                                                                  │
│   692 │                                                                                          │
│   693 │   with KrakenDownloadProgressBar() as progress:                                          │
│   694 │   │   download_task = progress.add_task('Processing', total=0, visible=True if not ctx   │
│ ❱ 695 │   │   filename = repo.get_model(model_id, click.get_app_dir(APP_NAME),                   │
│   696 │   │   │   │   │   │   │   │     lambda total, advance: progress.update(download_task,    │
│   697 │   message(f'Model name: {filename}')                                                     │
│   698 │   ctx.exit(0)                                                                            │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/kraken/repo.py:135 in get_model                     │
│                                                                                                  │
│   132 │   │   raise KrakenRepoException(f'Found {resp["hits"]["total"]} models when querying f   │
│   133 │                                                                                          │
│   134 │   metadata = resp['hits']['hits'][0]                                                     │
│ ❱ 135 │   model_url = [x['links']['self'] for x in metadata['files'] if x['type'] == 'mlmodel'   │
│   136 │   # callable model identifier                                                            │
│   137 │   nat_id = os.path.basename(urllib.parse.urlparse(model_url).path)                       │
│   138 │   spath = os.path.join(path, nat_id)                                                     │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/kraken/repo.py:135 in <listcomp>                    │
│                                                                                                  │
│   132 │   │   raise KrakenRepoException(f'Found {resp["hits"]["total"]} models when querying f   │
│   133 │                                                                                          │
│   134 │   metadata = resp['hits']['hits'][0]                                                     │
│ ❱ 135 │   model_url = [x['links']['self'] for x in metadata['files'] if x['type'] == 'mlmodel'   │
│   136 │   # callable model identifier                                                            │
│   137 │   nat_id = os.path.basename(urllib.parse.urlparse(model_url).path)                       │
│   138 │   spath = os.path.join(path, nat_id)                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'type'

For show:

root@464e69f92060:/var/www# kraken show 10.5281/zenodo.2577813
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/kraken_venv/bin/kraken:8 in <module>                                                        │
│                                                                                                  │
│   5 from kraken.kraken import cli                                                                │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(cli())                                                                          │
│   9                                                                                              │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1130 in __call__                      │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1055 in main                          │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1688 in invoke                        │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:1404 in invoke                        │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/core.py:760 in invoke                         │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/click/decorators.py:26 in new_func                  │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/kraken/kraken.py:646 in show                        │
│                                                                                                  │
│   643 │   from kraken import repo                                                                │
│   644 │   from kraken.lib.util import make_printable, is_printable                               │
│   645 │                                                                                          │
│ ❱ 646 │   desc = repo.get_description(model_id)                                                  │
│   647 │                                                                                          │
│   648 │   chars = []                                                                             │
│   649 │   combining = []                                                                         │
│                                                                                                  │
│ /opt/kraken_venv/lib/python3.9/site-packages/kraken/repo.py:180 in get_description               │
│                                                                                                  │
│   177 │   │   raise KrakenRepoException(msg)                                                     │
│   178 │   meta_json = None                                                                       │
│   179 │   for file in record['files']:                                                           │
│ ❱ 180 │   │   if file['key'] == 'metadata.json':                                                 │
│   181 │   │   │   callback()                                                                     │
│   182 │   │   │   r = requests.get(file['links']['self'])                                        │
│   183 │   │   │   r.raise_for_status()                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'key'

I first saw this problem when setting up a new Docker instance. While investigating, I downgraded from Kraken v4.3.13 to v4.3.08 with the same result. Then I tested an instance that has been running for two months with Kraken v4.3.12 and got the same errors. They were not there a few months ago.

Am I doing something wrong all of a sudden? Is there an issue with Zenodo?

mittagessen commented 1 year ago

I believe it is an issue with zenodo. They upgraded the platform and it didn't go smoothly. There's more information here. Dog willing, it should work again once they've got everything sorted out on their end.

csidirop commented 1 year ago

Thanks. Let's hope they can sort everything out.

betaboon commented 1 year ago

@mittagessen I just looked into it a little. This problem does not seem (entirely) related to the Zenodo upgrade.

It seems the response data returned by the API has changed (even though I couldn't find that mentioned anywhere).

With the following patch everything seems to work:

diff --git a/kraken/repo.py b/kraken/repo.py
index 69282f3..c50dc9c 100644
--- a/kraken/repo.py
+++ b/kraken/repo.py
@@ -131,10 +131,14 @@ def get_model(model_id: str, path: str, callback: Callable[[int, int], Any] = la
         logger.error(f'Found {resp["hits"]["total"]} models when querying for id \'{model_id}\'')
         raise KrakenRepoException(f'Found {resp["hits"]["total"]} models when querying for id \'{model_id}\'')

-    metadata = resp['hits']['hits'][0]
-    model_url = [x['links']['self'] for x in metadata['files'] if x['type'] == 'mlmodel'][0]
+    record = resp['hits']['hits'][0]
+    metadata_url = [x['links']['download'] for x in record['files'] if x['filename'] == 'metadata.json'][0]
+    r = requests.get(metadata_url)
+    r.raise_for_status()
+    resp = r.json()
     # callable model identifier
-    nat_id = os.path.basename(urllib.parse.urlparse(model_url).path)
+    nat_id = resp['name']
+    model_url = [x['links']['download'] for x in record['files'] if x['filename'] == nat_id][0]
     spath = os.path.join(path, nat_id)
     logger.debug(f'downloading model file {model_url} to {spath}')
     with closing(requests.get(model_url, stream=True)) as r:
@@ -177,9 +181,9 @@ def get_description(model_id: str, callback: Callable[..., Any] = lambda: None)
         raise KrakenRepoException(msg)
     meta_json = None
     for file in record['files']:
-        if file['key'] == 'metadata.json':
+        if file['filename'] == 'metadata.json':
             callback()
-            r = requests.get(file['links']['self'])
+            r = requests.get(file['links']['download'])
             r.raise_for_status()
             callback()
             try:
@@ -225,7 +229,7 @@ def get_listing(callback: Callable[[int, int], Any] = lambda total, advance: Non
     total = resp['hits']['total']
     callback(total, 0)
     records.extend(resp['hits']['hits'])
-    while 'next' in resp['links']:
+    while 'next' in resp.get('links', []):
         logger.debug('Fetching next page')
         r = requests.get(resp['links']['next'])
         r.raise_for_status()
@@ -242,9 +246,9 @@ def get_listing(callback: Callable[[int, int], Any] = lambda total, advance: Non
         if not model_type:
             continue
         for file in record['files']:
-            if file['key'] == 'metadata.json':
+            if file['filename'] == 'metadata.json':
                 callback(total, 1)
-                r = requests.get(file['links']['self'])
+                r = requests.get(file['links']['download'])
                 r.raise_for_status()
                 try:
                     metadata = r.json()

NafQan commented 1 year ago

I just applied betaboon's patch, but it seems that Zenodo has already reverted their JSON response: I now get KeyError: 'filename' because the field is back to 'key', and 'links' no longer has a 'download' key but the old 'self' key again. So the code should be:

if file['key'] == 'metadata.json':
  callback(total, 1)
  r = requests.get(file['links']['self'])
  r.raise_for_status()
  try:
    metadata = r.json()
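
Since the field names have flipped back and forth in this thread ('key' vs. 'filename', 'self' vs. 'download'), one option is to accept either shape instead of hard-coding one. The following is only a sketch under that assumption: the helper names `file_name` and `file_url` and the example records are hypothetical, are not part of kraken, and are not a confirmed description of the Zenodo schema.

```python
from typing import Any, Dict, Optional


def file_name(entry: Dict[str, Any]) -> Optional[str]:
    """Return the file name from a record's file entry.

    Tries the old 'key' field first, then the 'filename' field
    that briefly replaced it. Returns None if neither is present.
    """
    return entry.get('key', entry.get('filename'))


def file_url(entry: Dict[str, Any]) -> Optional[str]:
    """Return the URL of a file entry.

    Tries links['self'] first, then links['download'].
    Returns None if 'links' or both keys are missing.
    """
    links = entry.get('links', {})
    return links.get('self', links.get('download'))


# Hypothetical entries mimicking the two shapes reported above.
old_style = {'key': 'metadata.json',
             'links': {'self': 'https://example.invalid/old'}}
new_style = {'filename': 'metadata.json',
             'links': {'download': 'https://example.invalid/new'}}

for entry in (old_style, new_style):
    if file_name(entry) == 'metadata.json':
        print(file_url(entry))
```

With helpers like these, a missing field yields None rather than a KeyError, so the caller can decide whether to skip the record or raise a clearer error.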

mary-lev commented 1 year ago

This can be solved like:

diff --git a/kraken/repo.py b/kraken/repo.py
index 69282f3..212447d 100644
--- a/kraken/repo.py
+++ b/kraken/repo.py
@@ -225,13 +225,14 @@ def get_listing(callback: Callable[[int, int], Any] = lambda total, advance: Non
     total = resp['hits']['total']
     callback(total, 0)
     records.extend(resp['hits']['hits'])
-    while 'next' in resp['links']:
-        logger.debug('Fetching next page')
-        r = requests.get(resp['links']['next'])
-        r.raise_for_status()
-        resp = r.json()
-        logger.debug('Found {} new records'.format(len(resp['hits']['hits'])))
-        records.extend(resp['hits']['hits'])
+    if 'links' in resp and 'next' in resp['links']:
+        while 'next' in resp['links']:
+            logger.debug('Fetching next page')
+            r = requests.get(resp['links']['next'])
+            r.raise_for_status()
+            resp = r.json()
+            logger.debug('Found {} new records'.format(len(resp['hits']['hits'])))
+            records.extend(resp['hits']['hits'])
     logger.debug('Retrieving model metadata')
     models = {}
     # fetch metadata.jsn for each model
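
Note that in the patch above the 'links' check only guards the first page: the inner `while` still evaluates `resp['links']` on every later page, so a later page without 'links' could presumably still raise. A minimal sketch of a loop that guards every page is shown here. The function name `collect_records` and the injected `fetch` callable are hypothetical, introduced so the example runs without network access; in kraken the fetch step is the `requests.get(...)` / `r.json()` pair.

```python
from typing import Any, Callable, Dict, List


def collect_records(first_page: Dict[str, Any],
                    fetch: Callable[[str], Dict[str, Any]]) -> List[Any]:
    """Collect hits from a paginated response.

    `fetch` takes the 'next' URL and returns the parsed JSON of that page.
    """
    records = list(first_page['hits']['hits'])
    resp = first_page
    # .get() with a dict default means a page without 'links'
    # simply ends the loop instead of raising KeyError.
    while 'next' in resp.get('links', {}):
        resp = fetch(resp['links']['next'])
        records.extend(resp['hits']['hits'])
    return records


# Fake pages standing in for HTTP responses; the last one has no 'links'.
pages = {
    'page2': {'hits': {'hits': ['c']}, 'links': {'next': 'page3'}},
    'page3': {'hits': {'hits': ['d']}},
}
first = {'hits': {'hits': ['a', 'b']}, 'links': {'next': 'page2'}}
print(collect_records(first, pages.__getitem__))
```

This is the same idea as the `resp.get('links', [])` change in the earlier patch, applied uniformly to every page.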

mittagessen commented 1 year ago

They're obviously fiddling around with the API right now. I'll wait for them to announce stabilization before merging any patches. It sucks (especially as older kraken versions will be broken because they don't version the API) but can't be changed for now.

ihholmes-p commented 1 year ago

In the meantime, where can I put Kraken models that I download manually? I tried putting them under kraken/kraken, but that seems to be the wrong spot. I searched the documentation but couldn't find where they are supposed to go. I would greatly appreciate it if someone could tell me the right spot. Thanks in advance.

mittagessen commented 1 year ago

You can just point kraken to wherever you put them. The order of resolution for the -i argument is first the path as given and then $XDG_DIR/kraken (which is where kraken get puts the models).
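
The resolution order described above can be illustrated roughly as follows. This is a sketch of the idea only, not kraken's actual code: `resolve_model` is a hypothetical helper, and the application directory is passed in as a parameter because its real location depends on the platform (the tracebacks earlier in this thread show kraken obtaining it via `click.get_app_dir(APP_NAME)`).

```python
import os
from typing import Optional


def resolve_model(name: str, app_dir: str) -> Optional[str]:
    """Resolve a model argument the way the -i lookup is described.

    1. Treat `name` as a path (absolute or relative to the cwd).
    2. Otherwise look for a file of that name inside `app_dir`.
    Returns the resolved path, or None if neither location has the file.
    """
    if os.path.isfile(name):
        return os.path.abspath(name)
    candidate = os.path.join(app_dir, name)
    if os.path.isfile(candidate):
        return candidate
    return None
```

Under this scheme a manually downloaded model can live anywhere, as long as the full path is given; only bare names fall back to the application directory.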