Indexing tweets in ElasticStack: Should we be including id_str rather than numeric id?

kerchner commented 7 years ago

The JSON for a tweet as indexed in ELK looks like this:

{
  "_index": "logstash-2017.05.01",
  "_type": "tweet",
  "_id": "858891879127167000",
  "_score": null,
  "_source": {
    "sm_type": "tweet",
    "urls": [],
    "@timestamp": "2017-05-01T03:52:36.000Z",
    "hashtags": [],
    "user_id": "4925741374",
    "screen_name": "ibndiyn2",
    "@version": "1",
    "host": "myproject_logstash",
    "created_at": "Mon May 01 03:52:36 +0000 2017",
    "id": 858891879127167000,
    "text": "RT @3z0ooz: 3 Syrian friends used to sit and take a photo in the same place every year but this year they couldn't because they were killed…",
    "user_mentions": [
      "3z0ooz"
    ]
  },
  "fields": {
    "@timestamp": [
      1493610756000
    ]
  },
  "sort": [
    1493610756000
  ]
}

Should we be including id_str?
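The likely reason to prefer id_str is that tweet IDs are 64-bit integers larger than 2^53, the largest range in which an IEEE 754 double represents every integer exactly. Any consumer that parses JSON numbers as doubles (JavaScript, for example) can therefore round them, and Twitter's API documentation advises using id_str for this reason. The trailing zeros in 858891879127167000 above are consistent with that kind of rounding, though that is an inference from the output, not something confirmed in this thread. The minimal Python sketch below simulates a double-based parser with json.loads(..., parse_int=float). The tweet is hypothetical, and its ID is deliberately 2^53 + 1, the smallest positive integer a double cannot hold.

```python
import json

# Hypothetical raw tweet; 2**53 + 1 is the smallest positive integer
# that an IEEE 754 double cannot represent exactly.
RAW_TWEET = '{"id": 9007199254740993, "id_str": "9007199254740993"}'


def parse_like_double(raw):
    """Parse JSON the way a double-based parser (e.g. JavaScript) would,
    by converting every integer literal to a float."""
    return json.loads(raw, parse_int=float)


tweet = parse_like_double(RAW_TWEET)

numeric_id = int(tweet["id"])  # rounded to the nearest representable double
string_id = tweet["id_str"]    # strings pass through untouched

print(numeric_id)  # 9007199254740992
print(string_id)   # 9007199254740993
```

Python's default json.loads keeps the integer exact, so whether the numeric id is damaged depends on the consumer. The string ID survives intact in every consumer.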

kerchner commented 7 years ago

Recommend updating https://github.com/gwu-libraries/sfm-elk/blob/1.9.0/sfm_elk_loader.py#L25 to index id: .id_str (the string ID) instead of the numeric .id.
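For illustration, here is a hypothetical Python equivalent of that jq change. It is not the actual sfm_elk_loader.py code. The function name to_index_doc and the sample tweet are invented. The field paths (user.id_str, entities.hashtags[].text, entities.urls[].expanded_url, entities.user_mentions[].screen_name) are assumed to follow the Twitter v1.1 tweet format. The output keys mirror the _source fields shown above, and every ID is taken from its string twin.

```python
def to_index_doc(tweet):
    """Hypothetical sketch: build the flattened document sent to
    Elasticsearch, reading IDs from their *_str fields so that no
    double-based parser downstream can round them."""
    entities = tweet.get("entities", {})
    return {
        "sm_type": "tweet",
        "id": tweet["id_str"],               # string ID, never the numeric one
        "user_id": tweet["user"]["id_str"],
        "screen_name": tweet["user"]["screen_name"],
        "created_at": tweet["created_at"],
        "text": tweet["text"],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "user_mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
    }


# Hypothetical sample tweet with made-up values.
sample = {
    "id": 9007199254740993,
    "id_str": "9007199254740993",
    "created_at": "Mon May 01 03:52:36 +0000 2017",
    "text": "example text",
    "user": {"id_str": "12345", "screen_name": "example_user"},
    "entities": {"hashtags": [], "urls": [], "user_mentions": [{"screen_name": "someone"}]},
}

doc = to_index_doc(sample)
print(doc["id"])  # 9007199254740993
```

Storing the ID as a string also makes it natural to map as a keyword field in Elasticsearch. That is a design suggestion, not something stated in this thread.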

justinlittman commented 7 years ago

Actually, due to #832 the id is no longer extracted using jq.

Fixed by https://github.com/gwu-libraries/sfm-twitter-harvester/commit/440546008fd8ccb42c6c5ef2774c42883368b47f.

gwu-libraries/sfm-ui issue #883