singer-io / tap-mongodb

GNU Affero General Public License v3.0

Initial collection sync runs out of memory #117

Open sophiabits opened 4 months ago

sophiabits commented 4 months ago

We're running into an issue where our initial ETL run runs out of memory while executing find({}). We get the following error:

Executor error during find command :: caused by :: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting., full error: {'ok': 0.0, 'errmsg': 'Executor error during find command :: caused by :: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting.', 'code': 292, 'codeName': 'QueryExceededMemoryLimitNoDiskUseAllowed', '$clusterTime': {'clusterTime': Timestamp(1716238865, 4), 'signature': {'hash': b"e?\xea\xa1\xea \x95W\xec\xc6'\x80\xb6\x89:e\x1a\xa6G\xe9", 'keyId': 7305414224291823618}}, 'operationTime': Timestamp(1716238865, 4)}

Is there an out-of-the-box fix for this? We can't set the allowDiskUseByDefault option on our cluster because we are running on Atlas. I guess with a code change we could either pass allowDiskUse as an option to find(), or refactor to pull documents in batches rather than listing them all at once?
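
For reference, here is a rough sketch of what those two options might look like with PyMongo. This is not the tap's actual code: the function names `find_with_disk_use` and `iter_in_batches` are hypothetical, `collection` is assumed to be a `pymongo.collection.Collection`, and I'm assuming the sort key is `_id` (I haven't checked which sort the tap applies). As far as I know, `allow_disk_use` is accepted by `find()` from PyMongo 3.11 onward and requires MongoDB 4.4+.

```python
from typing import Any, Dict, Iterator, Optional


def find_with_disk_use(collection, query: Optional[Dict[str, Any]] = None,
                       sort_key: str = "_id"):
    """Option 1: opt in to external (on-disk) sorting for this one query."""
    # allow_disk_use=True lets the server spill a large sort to temporary
    # files instead of failing at the in-memory sort limit.
    return collection.find(query or {}, sort=[(sort_key, 1)],
                           allow_disk_use=True)


def iter_in_batches(collection, batch_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Option 2: page through the collection in _id order, one batch at a time."""
    last_id = None
    while True:
        # Resume strictly after the last _id seen; the first pass matches everything.
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        # Sorting on _id can be served by the default _id index, so the
        # server should not need a large in-memory sort for each page.
        batch = list(collection.find(query, sort=[("_id", 1)], limit=batch_size))
        if not batch:
            return
        yield from batch
        last_id = batch[-1]["_id"]
```

The batched version only holds `batch_size` documents in memory on the client at a time, but it assumes `_id` values are comparable and that ordering the sync by `_id` is acceptable.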