Closed cristianmtr closed 3 years ago
Hi @cristianmtr,
Can you help me with an example code to reproduce this bug? I'll test it and reopen the PR.
Hi @cristianmtr, I would like to work on this issue.
Hey @sauravgarg540
Thanks for the initiative!
We haven't been able to reproduce the issue actually. It was reported to us by one of our EAPs. If you can reproduce it in a test, then we'd appreciate that and the PR to fix it.
Hey @cristianmtr , happy to contribute.
I did some debugging and it doesn't seem like a bug to me. The issue description is correct about the list of numpy.float
and python float
but the reason why the issue is not reproducible is that it doesn't matter if the array being stored in the DB is an array of numpy float
or python float
, embedding is stored as an array of Double
in the MongoDb and retrieved as float
from the DB.
Please let me know if I am missing something.
Then I guess we need to be consistent with the data types. If you can make a PR cleaning that up, it would be appreaciated.
PR created #160
Closed by #160
Quoting from Slack:
Hey everyone, my colleague just found a bug in your MongoDBIndexer. Please use .tolist() instead of list() for converting a numpy array to a python list:
https://github.com/jina-ai/executors/blob/d943ed10f282081bb4a49ad00617de71fae81dfd/jinahub/indexers/storage/MongoDBStorage/mongo_handler.py#L53
https://github.com/jina-ai/executors/blob/d943ed10f282081bb4a49ad00617de71fae81dfd/jinahub/indexers/storage/MongoDBStorage/mongo_handler.py#L69
Bad:
item['embedding'] = list(doc.embedding.flatten()) # causes errors when inserting in db, because it creates a list of numpy scalars
Fix:
item['embedding'] = doc.embedding.flatten().tolist() # the right way; creates a list of python floats
Error:
InvalidDocument("cannot encode object: -0.43712485, of type: <class 'numpy.float32'>")