istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License
1.18k stars 324 forks source link

Scrapy-Cluster UI #129

Closed damienkilgannon closed 7 years ago

damienkilgannon commented 7 years ago

PR to merge my UI branch into IST UI branch. For discussions and collaboration.

This code has been lingering around on my computer for a while now I think its about time I share it and try and get it to a place where it can be merged in. Further work still required on testing but core pieces of the ui_service are in place.

damienkilgannon commented 7 years ago

Using Plotly for some simple visuals. Making stats requests every 15mins and caching up to 10 requests for building visuals. screen shot 2017-07-17 at 3 39 15 pm

madisonb commented 7 years ago

What is the difference between this PR and #95 ? Is this an improvement on the existing implementation and knocks off a number of items from the checklist at #116 ?

damienkilgannon commented 7 years ago

@madisonb its an improvement on #95 and deals with several items from the checklist.

coveralls commented 7 years ago

Coverage Status

Coverage remained the same at 65.844% when pulling 29963d308ffb8d9104f209834733f1b3831f1bd4 on damienkilgannon:ui into 098f36148f9477cf9d48e408cec4ae2961f792a9 on istresearch:ui.

damienkilgannon commented 7 years ago

@madisonb got the integration tests working via starting and stopping the rest service in the ./run_online_tests.py script .... but now this other problem with the kafka_montior tests ... any ideas?

madisonb commented 7 years ago

We need to ensure the docker tests are ran for the UI. I dont see any test output in the build log here near the bottom, nor do I see the container being used.

I think we need to update this line for travis to ensure the ui container has it's unit and integration test ran, the compose file is located here and it needs to have a UI service, except it should not be using a build context and instead reflect the container tag built by travis prior.

The build log for the normal ubuntu setup is also odd, but I presume that is because none of the other components are working (hence the "did not get response from kafka logs"). Given that we are only testing ui <-> rest, I think that is acceptable here.

coveralls commented 7 years ago

Coverage Status

Coverage remained the same at 65.844% when pulling d02aed488b0fd636ac24c36989a3a84cd8ac6738 on damienkilgannon:ui into 098f36148f9477cf9d48e408cec4ae2961f792a9 on istresearch:ui.

damienkilgannon commented 7 years ago

@madisonb fixed those issues in docker.sh and docker-compose.test.yml. Yeah, those warning messages in the build logs are because the kafka-monitor and redis-monitor are not running. Maybe all the services should be started in './run_online_tests.sh' for the integration tests regardless?

madisonb commented 7 years ago

I think it is fine, let me take some time to spin the new UI up and I actually think this is pretty easy to merge in. Is it Python 3 compatible out of the box? I would like to hit the check box so the other pending PR can be merged in as well.

EDIT: Meant to hit "comment" instead of "close"

coveralls commented 7 years ago

Coverage Status

Coverage remained the same at 65.844% when pulling d02aed488b0fd636ac24c36989a3a84cd8ac6738 on damienkilgannon:ui into 098f36148f9477cf9d48e408cec4ae2961f792a9 on istresearch:ui.

madisonb commented 7 years ago

@damienkilgannon I have made a number of code updates in order to get the crawl submit job working, and while I cannot get the plotly graphs to update after a single data point, I have been able to get them all working and generating graphs.

Please see this commit in particular for the updates I have made.

damienkilgannon commented 7 years ago

@madisonb ok great .... i will checkout branch now and have a look

EDIT: Apologizes for the half baked PR and thanks for fixing the crawl job submit implementation. With the plotly graphs; stats are been created in sc and collected into time buckets of 15min, 1hr, 6hr, 12hr, 1d, 1week and lifetime as you know already. So to build the ui graphs I am just making stats requests every 15min and graphing the 15min cumulative stats. The other option would be to use the lifetime cumulative stats and query that more regularly calc a delta and graph that. Those options can be decided on and perhaps we want to support more that one. So right now you would need your local cluster and ui-service up and running for longer than 15mins to see more than one data point on the graphs.

NEXT STEPS:

Following on from #116

What other features/additions/fixs would you like to see?

damienkilgannon commented 7 years ago

FYI: see below multiple 15min data points graphed. Some timing discrepancies due to the timestamp been set in the flow of the application. It is expected that each data point has a time delta of 15min and represents the total requests processed in the previous 15mins.

screen shot 2017-07-20 at 12 20 55 pm
madisonb commented 7 years ago

So I don't want to prevent us from merging this code into the dev branch just because it doesnt have everything possible from a UI perspective.

I think I would like to see more use of the stats API, in order to populate some of the various pages. Unit testing is probably what I would like to hit first (angular unit tests, or more tests for the base flask app).

Otherwise if you are good with this PR as is, we can add issues and work more PR's against that.

damienkilgannon commented 7 years ago

@madisonb it wasn't my intention to try to merge to dev just yet, I had this code sitting in my repo and wanted to merge to istresearch:ui .... so we could do as you say, create tickets for additional features and the unittests and so forth. Or we can leave it as it is now and continue working on it there doesn't matter to me? I have been some changes made on the stats (using the sliding window stats correctly) which I will push latter today/tomorrow morning. What branch should I add those on to?

madisonb commented 7 years ago

Correct I meant to say ui branch. Lets continue to make PR's against it and keep checking things off.

damienkilgannon commented 7 years ago

@madisonb so i have fixed the way the UI service was reading the stats and added to the branch istresearch/scrapy-cluster:damienkilgannon-ui ... do we want to merge that branch into istresearch/scrapy-cluster:ui, close this PR and work on the branch? orrr?