hadisinaee opened this issue 9 months ago
Interesting. I haven't tried to use the bulk uploader API call, and I haven't seen this issue with the arangoimport tool, even using it to upload a file of ~5 million entries to a WAN-based ArangoDB instance.

Is the issue that by using a lambda, you're injecting the time to construct the dictionary into the "connect" sequence? Would it be more resilient to just build the dictionary first? Plus, one reason I avoided going down this path is concerns I had with batching entries (which the external tool already seems to handle).
> Interesting. I haven't tried to use the bulk uploader API call, and I haven't seen this issue with the arangoimport tool, even using it to upload a file of ~5 million entries to a WAN-based ArangoDB instance.
Yeah, the arangoimport tool can handle large files, but the API seems to be tricky to use.
> Is the issue that by using a lambda, you're injecting the time to construct the dictionary into the "connect" sequence? Would it be more resilient to just build the dictionary first? Plus, one reason I avoided going down this path is concerns I had with batching entries (which the external tool already seems to handle).
Yes, it might be that. I can try to build the array first and pass it to the function. If that doesn't work properly, I'll simply run arangoimport from my Python script instead. I'll give it a try.
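For reference, a minimal sketch of shelling out to arangoimport from Python; the file name, collection, database, and connection settings below are placeholders, not the actual values from this setup:

```python
# Sketch: run the arangoimport CLI from Python instead of using the HTTP bulk API.
# The file path, collection name, and server settings here are placeholders.
import subprocess

def run_arangoimport(jsonl_path: str, collection: str) -> None:
    """Load a JSONL file into an ArangoDB collection via the arangoimport tool."""
    cmd = [
        "arangoimport",
        "--file", jsonl_path,
        "--type", "jsonl",
        "--collection", collection,
        "--create-collection", "true",
        "--server.endpoint", "tcp://127.0.0.1:8529",
        "--server.database", "Indaleko",
        "--server.username", "root",
        "--server.password", "",
    ]
    # check=True raises CalledProcessError if arangoimport exits non-zero.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_arangoimport("objects.jsonl", "Objects")
```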
I worked on this issue and tried the following methods:
First, I used the map API to create the list of documents to import. One hypothesis was that this is the bottleneck, because the list has to be built while ingesting, which takes a lot of time. So I created the list before passing it to the function for import. TL;DR: it didn't work. The Docker container shuts down, and I don't know what the problem is with it. I think Tony runs ADB on his machine; I was wondering if he encounters the same issue or not.
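Roughly, the "build the list first" variant looks like this; IndalekoObject and its to_dict here are simplified stand-ins for the real classes, and the point is only that the dictionaries are constructed before the import call rather than inside it:

```python
# Sketch of eager construction: build all dictionaries up front so the cost is
# incurred (and measurable) on its own, instead of hiding inside the import
# call via a map()/lambda. IndalekoObject is a stand-in for the real class.

class IndalekoObject:
    def __init__(self, uri: str, size: int):
        self.uri = uri
        self.size = size

    def to_dict(self) -> dict:
        # Plain dict, so it is JSON serializable for the bulk import.
        return {"URI": self.uri, "Size": self.size}

objects = [IndalekoObject(f"file:///tmp/obj{i}", i) for i in range(1_000)]

# Lazy version: the conversion work happens whenever the consumer iterates it.
lazy_docs = map(lambda o: o.to_dict(), objects)

# Eager version: the list is fully materialized here, before any import call.
docs = [o.to_dict() for o in objects]
```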
Second, when trying to use the import_bulk API from python-arango, I noticed that it fails to import all the docs. The following is my input to the function (I have to call .to_dict on all objects because they are instances of the IndalekoObjects class, and creating a dictionary from each one is what makes them JSON serializable). The error I get is:
The size of the document is 827481.
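One thing worth trying (a sketch, not tested against this dataset) is splitting the documents into smaller batches before handing them to import_bulk, so no single request carries all ~827k documents at once; the database name, collection name, and credentials below are placeholders, and to_dict is assumed to return a JSON-serializable dict:

```python
# Sketch: import the documents in fixed-size batches with python-arango, so a
# single import_bulk() call never has to carry the entire ~827k-entry list.
# Database name, collection name, and credentials are placeholders.
from arango import ArangoClient

def import_in_batches(objects, batch_size: int = 10_000) -> None:
    client = ArangoClient(hosts="http://127.0.0.1:8529")
    db = client.db("Indaleko", username="root", password="")
    collection = db.collection("Objects")

    for start in range(0, len(objects), batch_size):
        # Convert only this slice to dicts, keeping each request payload bounded.
        batch = [obj.to_dict() for obj in objects[start:start + batch_size]]
        # on_duplicate="replace" lets a re-run overwrite documents whose keys exist.
        result = collection.import_bulk(batch, on_duplicate="replace")
        print(f"batch starting at {start}: {result}")
```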