To test, first modify docker-compose so that the loader Docker container is built from local code.
Load a new data set and note the 6-character hash of the new index.
Run an Elasticsearch query that filters tweets in this index by language, to confirm that "language" has been indexed and can be used as a query parameter. Suggested code, similar to the following:
# !pip install elasticsearch_dsl
from elasticsearch_dsl import Search, Q
from elasticsearch_dsl.connections import connections as es_connections
from datetime import datetime
client = es_connections.create_connection(hosts=['http://gwtweetsets-dev1.wrlc.org:9200'])
# modify index as needed in the next line
client.indices.get_mapping(index='tweets-cdb109')
# confirm in the output of the previous line that "language" is present
# try this again with different 'language' value:
s = Search.from_dict({'query': {'bool': {'filter': [{'term': {'language': {'value': 'es'}}}]}},
                      'aggs': {'top_users': {'terms': {'field': 'user_screen_name', 'size': 10}},
                               'top_hashtags': {'terms': {'field': 'hashtags', 'size': 10}},
                               'top_mentions': {'terms': {'field': 'mention_screen_names', 'size': 10}},
                               'top_urls': {'terms': {'field': 'urls', 'size': 10}},
                               'tweet_types': {'terms': {'field': 'tweet_type'}},
                               'created_at_min': {'min': {'field': 'created_at'}},
                               'created_at_max': {'max': {'field': 'created_at'}}},
                      'track_total_hits': True,
                      '_source': ['tweet',
                                  'mention_user_ids',
                                  'user_id',
                                  'mention_screen_names',
                                  'user_screen_name']})
# modify index as needed in the next line
s = s.index('tweets-cdb109')
s.execute()
results = list(s.scan())
len(results)
# Note that results length differs when language is "es" vs. "fr" vs. "en"
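The two checks above (is "language" in the mapping, and does filtering on it work) can also be sketched as small helpers that take no network access, which makes it easier to repeat the test for several language codes. This is only a sketch: the helper names field_is_mapped and language_filter_query are hypothetical, the sample mapping is made up for illustration (including the "keyword" type), and the nesting of the mapping response is assumed to follow the typeless layout used by recent Elasticsearch versions.

```python
def field_is_mapped(mapping, index, field="language"):
    """Return True if `field` appears among the mapped properties of `index`.

    Assumes `mapping` is shaped like {index: {"mappings": {"properties": {...}}}}.
    """
    properties = mapping.get(index, {}).get("mappings", {}).get("properties", {})
    return field in properties


def language_filter_query(language):
    """Build a minimal query body that filters tweets by language code."""
    return {
        "query": {"bool": {"filter": [{"term": {"language": {"value": language}}}]}},
        "track_total_hits": True,
    }


# Hypothetical mapping response, trimmed to the part being checked.
sample_mapping = {
    "tweets-cdb109": {"mappings": {"properties": {"language": {"type": "keyword"}}}}
}

print(field_is_mapped(sample_mapping, "tweets-cdb109"))
print(language_filter_query("fr"))
```

Against a live cluster, the dict returned by language_filter_query could presumably be passed to Search.from_dict in place of the longer query above, once per language code, to compare hit counts.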