gwu-libraries / TweetSets

Service for creating Twitter datasets for research and archiving.
MIT License

T47 elasticsearch upgrade #62

Closed: lwrubel closed this 3 years ago

lwrubel commented 3 years ago

To set up:

  1. After checking out the branch on each VM, move/rename existing docker-compose.yml and .env files
  2. Copy the example versions of the appropriate docker-compose and env files onto the primary node and the cluster node(s), then customize them with the hostnames, IPs, and master status. There must be two master-eligible nodes, so set MASTER=true on both.
  3. Edit the docker-compose.yml file on the primary node so that the server and loader images are built locally rather than pulled.
  4. Remove any existing server and loader images on the primary node:
       docker rmi ts_server ts_flaskrun ts_loader
       docker rmi gwul/tweetsets-server gwul/tweetsets-flaskrun gwul/tweetsets-loader
  5. Rebuild and start the containers: docker-compose up -d --build
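
For reviewers who want to script the primary-node steps, here is a minimal sketch, not part of this branch. The function names `prepare_node` and `rebuild_images`, the `.bak` backup suffix, and the `DRY_RUN` switch are all assumptions made for illustration. Steps 2 and 3 (copying and editing the example files) remain manual, so the script is split into two phases. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper for the setup steps above; defaults to a dry run.
set -eu

# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: move the existing files aside (the .bak suffix is an assumption).
prepare_node() {
    run mv docker-compose.yml docker-compose.yml.bak
    run mv .env .env.bak
}

# Steps 4 and 5: remove old images, then rebuild and start the containers.
# Run this only after the new docker-compose.yml and .env are in place.
rebuild_images() {
    # "|| true" keeps going if an image is already absent.
    run docker rmi ts_server ts_flaskrun ts_loader || true
    run docker rmi gwul/tweetsets-server gwul/tweetsets-flaskrun gwul/tweetsets-loader || true
    run docker-compose up -d --build
}

case "${1:-}" in
    prepare) prepare_node ;;
    rebuild) rebuild_images ;;
    *) echo "usage: $0 prepare|rebuild (set DRY_RUN=0 to execute)" ;;
esac
```

Saved under a hypothetical name such as `setup.sh`, this would be run as `sh setup.sh prepare`, followed by the manual edits from steps 2 and 3, then `DRY_RUN=0 sh setup.sh rebuild` on the primary node.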

Test loading a dataset as part of the review.

lwrubel commented 3 years ago

I am noticing that there are still some upgrade steps required in order for the spark container to work, including upgrading the elasticsearch-hadoop-6.2.2.jar that's in this repo and referenced in Dockerfile-loader. Will push an update to this branch, but please review other functionality in the meantime.

dolsysmith commented 3 years ago

Datasets successfully loaded with both regular loader and spark-loader. UI working as expected.