
nasa-jpl-memex / memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data

BSD 2-Clause "Simplified" License

121 stars 69 forks

Visualizations persist after crawls have finished #737

Closed: brittainhard closed this 8 years ago

brittainhard commented 8 years ago

Newly created crawls show the same graph as a crawl that ran previously. This may be related to #736.

ahmadia commented 8 years ago

This is an independent issue. The visualizations all use the same routing key / queue for messages and the same Bokeh server document for storage. The status updates are handled between Celery and Nutch, and should be resilient under multiple crawls. I'll see if I can reproduce.

ahmadia commented 8 years ago

Okay, I'm going to use a different Bokeh server document for each visualization. This won't fix the fact that crawl visualization messages will go haywire if you run two simultaneous crawls, but it should make crawls that run serially, one after another, behave more correctly, and we need this anyway.
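To illustrate why a shared document makes an old graph reappear, here is a minimal sketch in plain Python. It does not use the Bokeh or Celery APIs: `DocumentStore`, `run_crawl`, and the crawl names are hypothetical stand-ins for the server-side storage described above, and using "fetcher_log" as the shared document name is an assumption for illustration.

```python
from collections import defaultdict


class DocumentStore:
    """Hypothetical stand-in for server-side plot storage, keyed by document name."""

    def __init__(self):
        self._docs = defaultdict(list)

    def record(self, doc_name, point):
        # Append a data point to the named document, creating it if needed.
        self._docs[doc_name].append(point)

    def points(self, doc_name):
        # Return a copy of the points currently stored under this name.
        return list(self._docs[doc_name])


def run_crawl(store, crawl_name, points, shared_doc=None):
    """Store a crawl's points in a shared document, or in one named after the crawl."""
    doc_name = shared_doc if shared_doc is not None else crawl_name
    for point in points:
        store.record(doc_name, point)
    return doc_name


# Shared document: the second crawl's graph starts with the first crawl's data.
shared = DocumentStore()
run_crawl(shared, "crawl_a", [1, 2, 3], shared_doc="fetcher_log")
doc = run_crawl(shared, "crawl_b", [], shared_doc="fetcher_log")
stale_points = shared.points(doc)

# One document per crawl: the second crawl starts with an empty graph.
separate = DocumentStore()
run_crawl(separate, "crawl_a", [1, 2, 3])
doc = run_crawl(separate, "crawl_b", [])
fresh_points = separate.points(doc)
```

In the shared case the new crawl inherits the earlier points before it has produced any of its own, which matches the symptom reported in this issue. Note that this only models storage; it says nothing about two crawls publishing messages at the same time.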

ahmadia commented 8 years ago

I had initially coupled the Bokeh document name to the queue name when it looked like we were configuring one queue/exchange per crawl. Since we're using a routing-key approach, we need:

routing key (should be same as crawl name)
queue (can still be "fetcher_log" or maybe something more descriptive)
Bokeh document id (should be the same as crawl name)

It's possible that we'll have to further restrict this based on restrictions in acceptable routing keys/document ids, but I'll try this approach for now.
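The naming scheme above could be sketched roughly as follows. This is not the code from the project: `CrawlChannels` and `channels_for_crawl` are hypothetical names, and the set of allowed characters is an assumption, since the actual restrictions on routing keys and document ids are not stated in this thread.

```python
import re
from typing import NamedTuple


class CrawlChannels(NamedTuple):
    """The three identifiers listed above for a single crawl."""
    routing_key: str
    queue: str
    document_id: str


def channels_for_crawl(crawl_name, queue="fetcher_log"):
    """Derive the routing key and document id from the crawl name."""
    # Assumption: only letters, digits, "_" and "-" are safe in both a
    # routing key and a document id; runs of anything else become "_".
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", crawl_name.strip()).strip("_")
    if not safe:
        raise ValueError("crawl name has no usable characters")
    # The queue stays shared; the routing key and document id follow the crawl.
    return CrawlChannels(routing_key=safe, queue=queue, document_id=safe)


channels = channels_for_crawl("My Crawl #2")
```

One caveat with this kind of sanitization: two different crawl names can map to the same identifier (for example, names that differ only in punctuation), so a real implementation would probably need a uniqueness check or a unique suffix.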

ahmadia commented 8 years ago

Okay, I've fixed this as well as I can for now. In order to properly display any visualization on the second crawl we really need to get the routing working.

ahmadia commented 8 years ago

https://github.com/memex-explorer/memex-explorer/commit/ec11c2c55d76b710334769299bedb9213f3c612e