yougov / mongo-connector

MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)
Apache License 2.0

Elasticsearch updates and deletes #421

Closed nVitius closed 8 years ago

nVitius commented 8 years ago

I can't seem to get mongo-connector to replicate my updates/deletes to Elasticsearch.

I'm using mongo-connector 2.3.0 with elastic2-doc-manager and Elasticsearch 2.1.1.

aherlihy commented 8 years ago

Hello!

I'm going to need more information from you. How are you running mongo-connector?

Are there any errors, and have you looked at the mongo-connector.log file or the elastic logs?

mkozjak commented 8 years ago

Based on the title of this issue, I'd like to ask a question not really connected to OP's problem.

Can mongo-connector be configured to ignore updates and deletes? I want to use it only to sync inserts from MongoDB to Elasticsearch (elastic2-doc-manager).

aherlihy commented 8 years ago

@mkozjak Hi, it would be better to file a separate issue, since your question is not really related to this one.

nVitius commented 8 years ago

Sure.

I'm running the connector in a docker container like so:

mongo-connector \
    --batch-size 100 \
    --auto-commit-interval=5 \
    -m mongo \
    --namespace-set voter.voter \
    -t elasticsearch \
    -d elastic2_doc_manager -v --stdout \
    --oplog-ts /data/oplog.timestamp

Given this entity:

{
  "_id": "5702bb014cf4cb1b0070dedb",
  "created": "2016-04-04T19:05:37.180Z",
  "modified": "2016-04-04T19:05:37.190Z",
  "vitals": {
    "citizen": true,
    "last": "Daniels",
    "first": "Bethany",
    "middle": "C",
    "sex": "Female",
    "dob": "1976-03-14T08:00:00.000Z",
    "created": "2016-04-04T19:05:37.166Z"
  },
  "identification": [],
  "contact": {
    "residence": [
      {
        "address": "284 Market",
        "city": "Newark",
        "state": "NJ",
        "zipcode": "07102",
        "created": "2016-04-04T19:05:37.167Z",
        "addressId": 0
      }
    ],
    "mailing": [
      {
        "address": "284 Market",
        "address2": "",
        "city": "Newark",
        "state": "NJ",
        "zipcode": "07102",
        "created": "2016-04-04T19:05:37.167Z",
        "addressId": 0
      }
    ],
    "inactiveConfirmation": [],
    "phone": [],
    "email": [
      {
        "email": "test@test.com",
        "created": "2016-04-04T19:05:37.167Z"
      }
    ]
  },
  "status": [],
  "voterId": "EJfQ5isAe"
}

I change the last name from Daniels to Daniel. This is the output I get from the logs:

2016-04-04 19:21:14,197 [DEBUG] mongo_connector.oplog_manager:188 - OplogThread: Cursor is still alive and thread is still running.
2016-04-04 19:21:14,814 [DEBUG] mongo_connector.oplog_manager:194 - OplogThread: Iterating through cursor, document number in this cursor is 0
2016-04-04 19:21:14,815 [DEBUG] mongo_connector.oplog_manager:239 - OplogThread: Operation for this entry is u
2016-04-04 19:21:14,815 [DEBUG] urllib3.util.retry:156 - Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0)
2016-04-04 19:21:14,818 [DEBUG] urllib3.connectionpool:386 - "POST /_refresh HTTP/1.1" 200 51
2016-04-04 19:21:14,818 [INFO] elasticsearch:63 - POST http://elasticsearch:9200/_refresh [status:200 request:0.003s]
2016-04-04 19:21:14,818 [DEBUG] elasticsearch:65 - > None
2016-04-04 19:21:14,818 [DEBUG] elasticsearch:66 - < {"_shards":{"total":26,"successful":13,"failed":0}}
2016-04-04 19:21:14,820 [DEBUG] urllib3.util.retry:156 - Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0)
2016-04-04 19:21:14,845 [DEBUG] urllib3.connectionpool:386 - "GET /voter/voter/5702bb014cf4cb1b0070dedb HTTP/1.1" 200 870
2016-04-04 19:21:14,846 [INFO] elasticsearch:63 - GET http://elasticsearch:9200/voter/voter/5702bb014cf4cb1b0070dedb [status:200 request:0.025s]
2016-04-04 19:21:14,846 [DEBUG] elasticsearch:65 - > None
2016-04-04 19:21:14,846 [DEBUG] elasticsearch:66 - < {"_index":"voter","_type":"voter","_id":"5702bb014cf4cb1b0070dedb","_version":1,"found":true,"_source":{"status": [], "created": "2016-04-04T19:05:37.180000", "modified": "2016-04-04T19:05:37.190000", "voterId": "EJfQ5isAe", "contact": {"residence": [{"city": "Newark", "addressId": 0, "created": "2016-04-04T19:05:37.167000", "zipcode": "07102", "state": "NJ", "address": "284 Market"}], "inactiveConfirmation": [], "mailing": [{"city": "Newark", "addressId": 0, "created": "2016-04-04T19:05:37.167000", "address2": "", "zipcode": "07102", "state": "NJ", "address": "284 Market"}], "phone": [], "email": [{"email": "test@test.com", "created": "2016-04-04T19:05:37.167000"}]}, "identification": [], "vitals": {"last": "Daniels", "created": "2016-04-04T19:05:37.166000", "dob": "1976-03-14T08:00:00", "sex": "Female", "middle": "C", "citizen": true, "first": "Bethany"}}}
2016-04-04 19:21:14,852 [DEBUG] urllib3.util.retry:156 - Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0)
2016-04-04 19:21:14,905 [DEBUG] urllib3.connectionpool:386 - "PUT /voter/voter/5702bb014cf4cb1b0070dedb?refresh=false HTTP/1.1" 200 144
2016-04-04 19:21:14,905 [INFO] elasticsearch:63 - PUT http://elasticsearch:9200/voter/voter/5702bb014cf4cb1b0070dedb?refresh=false [status:200 request:0.054s]
2016-04-04 19:21:14,906 [DEBUG] elasticsearch:65 - > {"status": [], "created": "2016-04-04T19:05:37.180Z", "modified": "2016-04-04T19:05:37.190Z", "voterId": "EJfQ5isAe", "contact": {"residence": [{"city": "Newark", "addressId": 0, "created": "2016-04-04T19:05:37.167Z", "zipcode": "07102", "state": "NJ", "address": "284 Market"}], "inactiveConfirmation": [], "mailing": [{"city": "Newark", "addressId": 0, "created": "2016-04-04T19:05:37.167Z", "address2": "", "zipcode": "07102", "state": "NJ", "address": "284 Market"}], "phone": [], "email": [{"email": "test@test.com", "created": "2016-04-04T19:05:37.167Z"}]}, "identification": [], "vitals": {"last": "Daniel", "created": "2016-04-04T19:05:37.166Z", "dob": "1976-03-14T08:00:00.000Z", "sex": "Female", "middle": "C", "citizen": true, "first": "Bethany"}}
2016-04-04 19:21:14,906 [DEBUG] elasticsearch:66 - < {"_index":"voter","_type":"voter","_id":"5702bb014cf4cb1b0070dedb","_version":2,"_shards":{"total":2,"successful":1,"failed":0},"created":false}
2016-04-04 19:21:14,906 [DEBUG] urllib3.util.retry:156 - Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0)
2016-04-04 19:21:14,926 [DEBUG] urllib3.connectionpool:386 - "PUT /mongodb_meta/mongodb_meta/5702bb014cf4cb1b0070dedb?refresh=false HTTP/1.1" 200 158
2016-04-04 19:21:14,926 [INFO] elasticsearch:63 - PUT http://elasticsearch:9200/mongodb_meta/mongodb_meta/5702bb014cf4cb1b0070dedb?refresh=false [status:200 request:0.020s]
2016-04-04 19:21:14,926 [DEBUG] elasticsearch:65 - > {"ns": "voter.voter", "_ts": 6269783268606869505}
2016-04-04 19:21:14,926 [DEBUG] elasticsearch:66 - < {"_index":"mongodb_meta","_type":"mongodb_meta","_id":"5702bb014cf4cb1b0070dedb","_version":2,"_shards":{"total":2,"successful":1,"failed":0},"created":false}
2016-04-04 19:21:14,928 [DEBUG] mongo_connector.oplog_manager:294 - OplogThread: Doc is processed.
2016-04-04 19:21:15,930 [DEBUG] mongo_connector.oplog_manager:306 - OplogThread: updating checkpoint afterprocessing new oplog entries
2016-04-04 19:21:15,930 [DEBUG] mongo_connector.oplog_manager:654 - OplogThread: oplog checkpoint updated to Timestamp(1459797674, 1)

When I look at the object in ES though, it's still the same as the original.
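
To see exactly which fields differ between the document returned by the GET and the body sent in the PUT, a small recursive diff can help. This is only a minimal sketch and not part of mongo-connector: `diff_docs` and `MISSING` are hypothetical names, and the two documents are trimmed to a few fields taken from the log above.

```python
# Sentinel for keys that exist on only one side (hypothetical helper name).
MISSING = object()


def diff_docs(old, new, prefix=""):
    """Return {dotted_path: (old_value, new_value)} for every value that differs.

    Nested dicts are walked recursively; lists and scalars are compared whole.
    """
    changes = {}
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            path = f"{prefix}.{key}" if prefix else key
            changes.update(
                diff_docs(old.get(key, MISSING), new.get(key, MISSING), path)
            )
    elif old != new:
        changes[prefix] = (old, new)
    return changes


# Trimmed copies of the GET response (_source) and the PUT body from the log.
es_source = {"voterId": "EJfQ5isAe", "vitals": {"last": "Daniels", "first": "Bethany"}}
put_body = {"voterId": "EJfQ5isAe", "vitals": {"last": "Daniel", "first": "Bethany"}}

print(diff_docs(es_source, put_body))
# {'vitals.last': ('Daniels', 'Daniel')}
```

With the full documents from the log, the date strings would also be reported, because their formats differ (for example `2016-04-04T19:05:37.180000` in the GET response versus `2016-04-04T19:05:37.180Z` in the PUT body).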

chrisamoore commented 8 years ago

:+1: I am having a similar issue.

badavis commented 8 years ago

Same issue happening here. Can we get more eyes on this?

nVitius commented 8 years ago

@aherlihy Okay, this issue is definitely related to PR #382.
I tried changing a top-level property on my object, and that update did propagate.
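
For context on why nested and top-level fields can behave differently: assuming the change was issued as a `$set`, a nested field is addressed by a dotted path such as `vitals.last`, while a top-level field uses a plain key. The sketch below shows one way such an update could be applied to a document. It is an illustration only, not mongo-connector's actual code; `apply_set` is a hypothetical helper and it does not handle array indices or operators other than `$set`.

```python
import copy


def apply_set(doc, update_spec):
    """Return a copy of doc with the $set part of update_spec applied.

    Dotted keys such as "vitals.last" are resolved through nested dicts,
    creating intermediate dicts when they are missing.
    """
    updated = copy.deepcopy(doc)
    for dotted_path, value in update_spec.get("$set", {}).items():
        *parents, leaf = dotted_path.split(".")
        target = updated
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = value
    return updated


voter = {"voterId": "EJfQ5isAe", "vitals": {"last": "Daniels", "first": "Bethany"}}

# Nested update: the key is a dotted path.
nested = apply_set(voter, {"$set": {"vitals.last": "Daniel"}})

# Top-level update: the key is a plain field name ("NEW-ID" is a made-up value).
top_level = apply_set(voter, {"$set": {"voterId": "NEW-ID"}})
```

A code path that treats `vitals.last` as a literal top-level key, or filters fields by name before resolving the dotted path, would handle the second case correctly and the first one incorrectly.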

aherlihy commented 8 years ago

@nVitius Hi, I just pushed a commit that fixes the issues with nested fields in --include-fields and adds support for --exclude-fields. If you retry with the most recent version of mongo-connector (off master), does that fix your issue?

Sorry for the delay in getting back to you!

aherlihy commented 8 years ago

Hi, I'm going to close this ticket due to inactivity but please feel free to reopen or file a new ticket if you are still having problems. Thanks!