yougov / mongo-connector

MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)
Apache License 2.0

Trouble using connector with Docker and AWS hosted ElasticSearch #576

Open malekascha opened 7 years ago

malekascha commented 7 years ago

Hello,

I'm trying to set up a search engine for my company's product with AWS hosted ElasticSearch. Our entire backend is containerized, so I intended to set up another container to run Mongo Connector. I'm basing my container off of yeasy's work here. I'm able to build the image and run the container, but I keep getting the following error:

/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py:70: UserWarning: Connecting to search-elastic-test-5d7q73lce7j53trxbq4fcsybyy.us-west-2.es.amazonaws.com using SSL with verify_certs=False is insecure.
  'Connecting to %s using SSL with verify_certs=False is insecure.' % host)

From what I can gather, this happens when the urllib3 library does not have verify_certs set to true, but when I looked through the code for elastic2_doc_manager I saw that if aws is passed into the args parameter of the config file, it should set it to true. I've been looking through the source code but I haven't been able to find the exact source of the bug. All my code, including the Dockerfile and mongo-connector config, can be found here. Thanks!

ShaneHarvey commented 7 years ago

Can you post the full stack trace, along with the version of elastic2-doc-manager you're using? If 'aws' is present, it should indeed set verify_certs=True and connection_class=RequestsHttpConnection, but you're seeing verify_certs=False and getting a warning from Urllib3HttpConnection.
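
For reference, the expected behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual elastic2_doc_manager source: the function name build_client_options is hypothetical, the 'clientOptions' key and the contents of the 'aws' entry are assumptions, and the connection class is represented by its name as a string so the snippet runs without elasticsearch installed.

```python
def build_client_options(args):
    """Illustrative only: derive Elasticsearch client options from the
    doc manager "args" section of the config file."""
    # Start from any user-supplied client options (assumed key name).
    options = dict(args.get('clientOptions', {}))
    if 'aws' in args:
        # With an 'aws' section present, certificates should be verified
        # and the requests-based connection class should be used.
        options['verify_certs'] = True
        # Real code would pass the class itself, not its name.
        options['connection_class'] = 'RequestsHttpConnection'
    return options


# Placeholder 'aws' contents; only the presence of the key matters here.
with_aws = build_client_options({'aws': {'region': 'us-west-2'}})
without_aws = build_client_options({})
print(with_aws)
print(without_aws)
```

If the warning mentions http_urllib3.py and verify_certs=False, as it does here, the options reaching the client look like the second case, which suggests the 'aws' section is not being picked up.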

malekascha commented 7 years ago

/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py:70: UserWarning: Connecting to search-elastic-test-5d7q73lce7j53trxbq4fcsybyy.us-west-2.es.amazonaws.com using SSL with verify_certs=False is insecure.
  'Connecting to %s using SSL with verify_certs=False is insecure.' % host)
2016-10-31 22:19:31,015 [INFO] mongo_connector.connector:1062 - Beginning Mongo Connector
2016-10-31 22:19:31,017 [WARNING] mongo_connector.connector:116 - MongoConnector: Can't find oplog.timestamp, attempting to create an empty progress log
2016-10-31 22:19:31,019 [INFO] mongo_connector.connector:224 - MongoConnector: Empty oplog progress file.
2016-10-31 22:19:31,088 [INFO] mongo_connector.oplog_manager:92 - OplogThread: Initializing oplog thread
2016-10-31 22:19:31,119 [INFO] mongo_connector.connector:296 - MongoConnector: Starting connection thread MongoClient(host=['mongodb.dev.goalbook.local:27017'], document_class=dict, tz_aware=False, connect=True, replicaset='singleNodeRepl')
2016-10-31 22:19:31,121 [DEBUG] mongo_connector.oplog_manager:196 - OplogThread: Run thread started
2016-10-31 22:19:31,121 [DEBUG] mongo_connector.oplog_manager:198 - OplogThread: Getting cursor
2016-10-31 22:19:31,121 [DEBUG] mongo_connector.oplog_manager:796 - OplogThread: reading last checkpoint as None
2016-10-31 22:19:31,212 [DEBUG] mongo_connector.oplog_manager:659 - OplogThread: Newest oplog entry has timestamp 1477699436.
2016-10-31 22:19:31,212 [DEBUG] mongo_connector.oplog_manager:488 - OplogThread: Dumping set of collections ['atlas']
2016-10-31 22:19:31,212 [DEBUG] mongo_connector.oplog_manager:581 - OplogThread: Using bulk upsert function for collection dump
2016-10-31 22:19:31,212 [CRITICAL] mongo_connector.oplog_manager:630 - Exception during collection dump
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/oplog_manager.py", line 583, in do_dump
    upsert_all(dm)
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/oplog_manager.py", line 567, in upsert_all
    dm.bulk_upsert(docs_to_dump(namespace), mapped_ns, long_ts)
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/util.py", line 32, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 229, in bulk_upsert
    for ok, resp in responses:
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 161, in streaming_bulk
    for bulk_actions in _chunk_actions(actions, chunk_size, max_chunk_bytes, client.transport.serializer):
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/helpers/__init__.py", line 55, in _chunk_actions
    for action, data in actions:
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 195, in docs_to_upsert
    for doc in docs:
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/oplog_manager.py", line 509, in docs_to_dump
    database, coll = namespace.split('.', 1)
ValueError: not enough values to unpack (expected 2, got 1)
2016-10-31 22:19:31,226 [ERROR] mongo_connector.oplog_manager:638 - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient(host=['mongodb.dev.goalbook.local:27017'], document_class=dict, tz_aware=False, connect=True, replicaset='singleNodeRepl'), 'local'), 'oplog.rs')
2016-10-31 22:19:31,226 [DEBUG] mongo_connector.oplog_manager:210 - OplogThread: Last entry is the one we already processed. Up to date. Sleeping.
2016-10-31 22:19:32,123 [ERROR] mongo_connector.connector:304 - MongoConnector: OplogThread <OplogThread(Thread-2, started 140155698366208)> unexpectedly stopped! Shutting down
2016-10-31 22:19:32,123 [INFO] mongo_connector.connector:362 - MongoConnector: Stopping all OplogThreads
2016-10-31 22:19:32,123 [DEBUG] mongo_connector.oplog_manager:377 - OplogThread: exiting due to join call.

I'm using the latest elastic2_doc_manager; the container runs pip install elastic2-doc-manager[aws] when building the image.

ShaneHarvey commented 7 years ago

I'm still not sure about the insecure connection warning, but your namespace of atlas is causing mongo-connector to fail. A namespace must have the form databasename.collectionname; see the namespaces include option.
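
To illustrate, here is a minimal sketch of why a bare collection name fails. The first part reproduces the split from the traceback; the helper split_namespace and the database name mydb are hypothetical and only show the expected form.

```python
def split_namespace(namespace):
    """Split 'databasename.collectionname' into its two parts,
    raising a descriptive error when either part is missing."""
    database, sep, coll = namespace.partition('.')
    if not sep or not database or not coll:
        raise ValueError(
            "namespace %r must look like 'databasename.collectionname'"
            % namespace)
    return database, coll


# The bare collection name from the log reproduces the original failure:
# split() returns a single element, so unpacking into two names fails.
try:
    database, coll = 'atlas'.split('.', 1)
except ValueError as exc:
    print('unpacking failed:', exc)

# A fully qualified namespace splits cleanly ('mydb' is a placeholder).
print(split_namespace('mydb.atlas'))
```

So the namespaces include entry needs the database prefix in front of atlas rather than the collection name alone.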

Can you test if this issue is present in elastic2-doc-manager master? Install with: pip install https://github.com/mongodb-labs/elastic2-doc-manager/archive/master.zip

malekascha commented 7 years ago

I changed it to elastic2-doc-manager master. The warning is still appearing, but there's now an unauthorized error occurring. I included my AWS keys in the file, so I'm not sure what's happening there. Stack trace:

/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py:70: UserWarning: Connecting to search-elastic-test-5d7q73lce7j53trxbq4fcsybyy.us-west-2.es.amazonaws.com using SSL with verify_certs=False is insecure.
  'Connecting to %s using SSL with verify_certs=False is insecure.' % host)
/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py:841: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
GET /_nodes/_all/clear [status:403 request:0.308s]
Fatal Exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/util.py", line 90, in wrapped
    func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/connector.py", line 1059, in main
    conf.parse_args()
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/config.py", line 118, in parse_args
    option, dict((k, values.get(k)) for k in option.cli_names))
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/connector.py", line 854, in apply_doc_managers
    dm_instances.append(DocManager(target_url, **kwargs))
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 114, in __init__
    self.elastic = Elasticsearch(hosts=url, **client_options)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 169, in __init__
    self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 126, in __init__
    self.sniff_hosts(True)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 250, in sniff_hosts
    node_info = self._get_sniff_data(initial)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 207, in _get_sniff_data
    timeout=self.sniff_timeout if not initial else None)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 109, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 113, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.AuthorizationException: TransportError(403, '{"Message":"User: anonymous is not authorized to perform: es:ESHttpGet on resource: elastic-test"}')
Traceback (most recent call last):
  File "/usr/local/bin/mongo-connector", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/util.py", line 90, in wrapped
    func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/connector.py", line 1059, in main
    conf.parse_args()
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/config.py", line 118, in parse_args
    option, dict((k, values.get(k)) for k in option.cli_names))
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/connector.py", line 854, in apply_doc_managers
    dm_instances.append(DocManager(target_url, **kwargs))
  File "/usr/local/lib/python3.6/site-packages/mongo_connector/doc_managers/elastic2_doc_manager.py", line 114, in __init__
    self.elastic = Elasticsearch(hosts=url, **client_options)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 169, in __init__
    self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 126, in __init__
    self.sniff_hosts(True)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 250, in sniff_hosts
    node_info = self._get_sniff_data(initial)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 207, in _get_sniff_data
    timeout=self.sniff_timeout if not initial else None)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 109, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 113, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.AuthorizationException: TransportError(403, '{"Message":"User: anonymous is not authorized to perform: es:ESHttpGet on resource: elastic-test"}')

ShaneHarvey commented 7 years ago

Aha! You're running into bug #516. Can you try out mongo-connector master? Install with: pip install --upgrade https://github.com/mongodb-labs/mongo-connector/archive/master.zip

Can you also run this under Python 2.7? The library we're using, requests-aws-sign, is broken on Python 3: https://github.com/jmenga/requests-aws-sign/pull/1

Thanks for finding this!

malekascha commented 7 years ago

I've made the changes you suggested, but there now seems to be an error under Python 2.7. Stack trace:

Traceback (most recent call last):
  File "/usr/local/bin/mongo-connector", line 11, in <module>
    load_entry_point('mongo-connector==2.5.0.dev0', 'console_scripts', 'mongo-connector')()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 564, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2608, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2268, in load
    return self.resolve()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2274, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python2.7/site-packages/mongo_connector/connector.py", line 28, in <module>
    from mongo_connector import config, constants, errors, util
  File "/usr/local/lib/python2.7/site-packages/mongo_connector/config.py", line 19, in <module>
    from mongo_connector import compat, errors, version
ImportError: cannot import name version

I've updated the repo I linked to earlier to reflect my current image setup.

ShaneHarvey commented 7 years ago

Looks like you're finding all of our bugs! I just fixed this problem with #582.

malekascha commented 7 years ago

Thanks for all the help, Shane! Mongo-connector is working locally for me now. My image currently installs mongo-connector from master. Do you know when the current version will be available from pip install mongo-connector?