usc-isi-i2 / kgtk-similarity

MIT License

Error fetching certain Qnode pairs #3

Closed: sidsvash26 closed this issue 3 years ago

sidsvash26 commented 3 years ago

The example Python code works fine for me with the node pairs given in the example, and it generally works with new pairs as well. However, some Qnode pairs occasionally throw an error.

For instance, I have the following Qnode pairs file:

q1  q2
Q5121444    Q1388151
Q7188   Q39631
Q17156448   Q17502905
Q1662644    Q39631

When I use the example Python code to send this file to the API, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-70-6a619166bed0> in <module>
----> 1 call_semantic_similarity(temporary_file, url)

<ipython-input-29-26d17f2de0e7> in call_semantic_similarity(input_file, url)
     10     }
     11     resp = requests.post(url, files=files, params={'similarity_types': 'all'})
---> 12     s = json.loads(resp.json())
     13     return pd.DataFrame(s)
     14 

~/miniconda3/envs/aida/lib/python3.8/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    339     else:
    340         if not isinstance(s, (bytes, bytearray)):
--> 341             raise TypeError(f'the JSON object must be str, bytes or bytearray, '
    342                             f'not {s.__class__.__name__}')
    343         s = s.decode(detect_encoding(s), 'surrogatepass')

TypeError: the JSON object must be str, bytes or bytearray, not dict

On further investigation, I found that the third pair in the file above, (Q17156448, Q17502905), is causing the issue.

When I remove the third pair, the API works fine. I checked on Wikidata that both Qnodes are valid, existing items.

Could you please help me understand why this is happening?

chalypso commented 3 years ago

The problem was that, for some reason, there were no ComplEx or TransE embeddings for Q17156448, but there was a text embedding. Because of that, topsim computed an empty set of candidates, which broke one of the methods. This is now fixed in commit a0f84bc868b85631a5eae814d30ea7d7e7428fb1. In general, that node seems very poorly connected, since all the measures except text return 0. Once the service is restarted, the problem should go away.
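A minimal, hypothetical illustration of that failure mode (this is not the kgtk-similarity code; `mean_candidate_similarity` and its arguments are made up for this sketch): a measure that averages over a candidate set fails on an empty set unless it guards for it. Here the guard returns 0.0, consistent with the zero scores the other measures report for this node.

```python
from statistics import mean
from typing import Dict, Iterable


def mean_candidate_similarity(candidates: Iterable[str], scores: Dict[str, float]) -> float:
    """Average the scores of the candidate nodes, or 0.0 if there are none.

    Hypothetical illustration only: without the guard, statistics.mean
    raises StatisticsError on an empty sequence.
    """
    values = [scores.get(c, 0.0) for c in candidates]
    if not values:
        # No candidates (e.g. the node lacks graph embeddings): report no similarity.
        return 0.0
    return mean(values)
```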

There is a further issue: the POST response has a heterogeneous return type. Errors return JSON dicts, but successful responses return JSON strings. I didn't want to fix that at this point, since people's client code might already account for it.
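On the client side, that mixed return type is what produced the `TypeError` in the traceback above: for an error, `resp.json()` already returns a `dict`, which `json.loads` then rejects. A minimal sketch of a defensive parser, assuming only the two response shapes described here (`parse_similarity_payload` and `SimilarityServiceError` are hypothetical names, not part of kgtk-similarity):

```python
import json
from typing import Any


class SimilarityServiceError(RuntimeError):
    """Raised when the service answers with an error object instead of results."""


def parse_similarity_payload(payload: Any) -> Any:
    """Decode the value returned by resp.json() for either response shape.

    Assumed shapes: a successful response is a JSON-encoded *string*, so
    resp.json() yields a str that must be decoded a second time; an error
    response is a plain JSON *object*, so resp.json() yields a dict.
    """
    if isinstance(payload, (str, bytes, bytearray)):
        # Success: unwrap the inner JSON document holding the similarity rows.
        return json.loads(payload)
    if isinstance(payload, dict):
        # Error: surface the server's message instead of the confusing
        # "the JSON object must be str, bytes or bytearray, not dict" TypeError.
        raise SimilarityServiceError(f"similarity service returned an error: {payload}")
    raise TypeError(f"unexpected payload type: {type(payload).__name__}")
```

In the notebook's `call_semantic_similarity`, the failing line 12 would become `s = parse_similarity_payload(resp.json())`, with `pd.DataFrame(s)` unchanged.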

saggu commented 3 years ago

@sidsvash26 The deployed similarity API will be updated in 1-2 days.

sidsvash26 commented 3 years ago

Thank you, both!

sidsvash26 commented 3 years ago

@chalypso As of today, I'm still receiving the same error. My current workaround is to fetch one pair at a time and skip any pair that throws an error. However, this means I can't send Qnode pairs in batches, because a single faulty pair makes the whole request fail, so fetching similarities is very slow.
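One way to keep most of the benefit of batching, sketched here as an alternative that the thread itself does not propose, is to bisect a failing batch: retry each half separately until the failing pairs are isolated. `fetch_skipping_bad_pairs` and the `fetch_batch` callable are hypothetical; `fetch_batch` would wrap a single POST to the service (for instance, by writing the pairs to a temporary file and calling the notebook's `call_semantic_similarity`) and is assumed to raise on an error response.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def fetch_skipping_bad_pairs(
    pairs: Sequence[Pair],
    fetch_batch: Callable[[Sequence[Pair]], List[dict]],
) -> Tuple[List[dict], List[Pair]]:
    """Fetch similarities for `pairs`, isolating pairs that make a request fail.

    Returns (rows, skipped_pairs). `fetch_batch` is a hypothetical wrapper
    around one POST to the service that raises on failure. A failing batch
    is split in half and each half is retried recursively.
    """
    if not pairs:
        return [], []
    try:
        return list(fetch_batch(pairs)), []
    except Exception:  # sketch: narrow this to the errors fetch_batch raises
        if len(pairs) == 1:
            # This pair fails on its own: skip it and report it to the caller.
            return [], [pairs[0]]
    mid = len(pairs) // 2
    left_rows, left_bad = fetch_skipping_bad_pairs(pairs[:mid], fetch_batch)
    right_rows, right_bad = fetch_skipping_bad_pairs(pairs[mid:], fetch_batch)
    return left_rows + right_rows, left_bad + right_bad
```

For a batch of n pairs containing a single bad pair, this costs about 2 * log2(n) extra requests, compared with n requests when every pair is sent individually, so the saving grows with batch size.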

Another pair that threw an error: ('Q170212', 'Q5658542')

dgarijo commented 3 years ago

@sidsvash26 If you still have the issue after updating to the latest release of KGTK, please reopen it.

saggu commented 3 years ago

Updating the service now; it will be unavailable in the meantime.

saggu commented 3 years ago

It should be back up now with the fix.