gwu-libraries / sfm-ui

Social Feed Manager user interface application.
http://gwu-libraries.github.io/sfm-ui
MIT License

Web harvester won't docker-compose scale #408

Open justinlittman opened 8 years ago

justinlittman commented 8 years ago

Every web harvester container must be paired with its own Heritrix container. This is currently done with a simple Docker link. However, this probably won't work well with docker-compose scale, because scaling won't preserve the 1:1 pairing.

Possible approaches to fixing:

  1. Move Heritrix and the web harvester into the same container.
  2. Rely on container naming conventions. For example, web_harvest_2 would know to use heritrix_2. See http://stackoverflow.com/questions/29725955/how-do-links-and-scaling-work-together-in-docker-compose.
  3. Link via an ambassador container. See https://docs.docker.com/engine/admin/ambassador_pattern_linking/.
  4. Use some sort of service discovery.
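
As a rough illustration of approach 2, the sketch below derives the paired Heritrix hostname from a harvester's container name by reusing its trailing index. The function name `paired_heritrix_host` and the `heritrix_prefix` parameter are hypothetical and not part of sfm-ui. How a container learns its own name, and whether the scaled Heritrix containers are reachable under these names, are assumptions that depend on the Docker setup.

```python
import re


def paired_heritrix_host(container_name, heritrix_prefix="heritrix"):
    """Map a scaled harvester container name to its Heritrix counterpart.

    Hypothetical helper: assumes both services are scaled to the same
    count and that names end in "_<index>", e.g. web_harvest_2.
    """
    match = re.search(r"_(\d+)$", container_name)
    if match is None:
        raise ValueError(
            "no trailing index in container name: {!r}".format(container_name)
        )
    # Reuse the harvester's index so web_harvest_2 pairs with heritrix_2.
    return "{}_{}".format(heritrix_prefix, match.group(1))


if __name__ == "__main__":
    print(paired_heritrix_host("web_harvest_2"))
```

This only works if every harvester index has a matching Heritrix index, so both services would need to be scaled together.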

justinlittman commented 7 years ago

From @dchud:

 - big heritrix crawls -- and, perhaps, not deduping, or both -- are a bottleneck. there are surely tweets with attached media and linked content in my timeline collections that will be gone by the time heritrix catches up. adding a big list of accounts up front always takes a long while, so later incremental follow-ups take a while to process. this might be a result of running user timelines every hour; will have to play with that. but a second heritrix process for recent/small batches might help.

justinlittman commented 7 years ago

@liuqingli also encountered problems with not being able to scale the web harvesters.