Open alagori opened 5 years ago
You can log into the container and then run the cron rake task:
docker exec -it lcbo-api_app_1 /bin/bash
rake cron
Or, using docker-compose:
docker-compose exec app rake cron
How can one know when the crawl has completed?
It takes a long time to complete. I got several errors near the end related to saving JSON to S3, but the crawl was a success. Open a Rails console and check the counts.
I followed the instructions in the README file, so my database already has the data from the January pull (i.e. count() returns values). It appears that the database is not being refreshed with the latest data, which is why I'm not sure the crawl is actually active.
FYI, I am also new to Rails and Docker.
I didn't pre-populate the data as specified in the README file, but you should be able to run the crawler in any case. He called the task cron because that's how it was set up (to run at an interval).
In this case it was triggered by the Linux OS in the Docker container. See: config/crontab.txt
It's overkill for everybody who clones the repo to do this on a daily basis, so just run it manually once in a while: docker-compose exec app rake cron
You will notice if it's running, as there is terminal output and it's very intensive on your machine.
If you look in lib/tasks/cron.rake you will see:
desc 'Run scheduled tasks'
task cron: :environment do
  Crawler.run
end
I'm guessing the Crawler is run automatically when you execute the command "docker-compose up"? I tried the command "docker-compose exec app rake cron" and got:
rake aborted!
Crawl is already running
/lcboapi/app/models/crawl.rb:47:in `init'
/lcboapi/lib/crawler.rb:5:in `init'
/lcboapi/lib/boticus/bot.rb:40:in `run'
/lcboapi/lib/tasks/cron.rake:3:in `block in
I'm getting that as well when trying to run it a second time. I think it's got something to do with the Crawler state. Give me a minute...
Run this in the Rails console: Crawl.where(state: [:init, :running, :paused])
is_active in app/models/crawl.rb checks for these states and will exit with "Crawl is already running".
Run this in the Rails console, then run the cron task:
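To make the guard described above concrete, here is a minimal plain-Ruby sketch of the idea. It is not the code from app/models/crawl.rb: the names CrawlRecord, crawl_active? and start_crawl! are made up for illustration, and only the three state names and the error message come from this thread.

```ruby
# Stand-in for the real ActiveRecord model; only the fields needed here.
# (Hypothetical: the real Crawl model has many more attributes.)
CrawlRecord = Struct.new(:id, :state)

# The states mentioned in this thread as counting as "active".
ACTIVE_STATES = %w[init running paused].freeze

# True if any crawl in the list is in one of the active states.
def crawl_active?(crawls)
  crawls.any? { |crawl| ACTIVE_STATES.include?(crawl.state.to_s) }
end

# Refuses to start a new crawl while another one is still marked active.
def start_crawl!(crawls)
  raise 'Crawl is already running' if crawl_active?(crawls)

  crawls << CrawlRecord.new(crawls.size + 1, 'init')
  crawls.last
end

# A crawl left behind in the "init" state blocks every later run.
crawls = [CrawlRecord.new(2810, 'init')]
begin
  start_crawl!(crawls)
rescue RuntimeError => e
  puts e.message # prints "Crawl is already running"
end
```

The point is that a crawl which died before leaving the "init" state keeps blocking new runs until that row is removed or its state is changed.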
Crawl.where(state: [:init, :running, :paused]).destroy_all
The second command generated some error messages - not sure if it's normal. Then running the cron task showed the same "Crawl is already running" message. By the way, really appreciate you helping out!
Below is the output from executing the commands in rails.
Loading development environment (Rails 5.2.2)
[1] pry(main)> Crawl.where(state: [:init, :running, :paused])
=> Crawl Load (2.7ms) SELECT "crawls".* FROM "crawls" WHERE "crawls"."state" IN ($1, $2, $3) [["state", "init"], ["state", "running"], ["state", "paused"]]
[#<Crawl:0x000055decdfbe9e0
  id: 2810,
  crawl_event_id: nil,
  state: "init",
  task: nil,
  total_products: 0,
  total_stores: 0,
  total_inventories: 0,
  total_product_inventory_count: 0,
  total_product_inventory_volume_in_milliliters: 0,
  total_product_inventory_price_in_cents: 0,
  total_jobs: 0,
  total_finished_jobs: 0,
  store_ids: [],
  product_ids: [],
  added_product_ids: [],
  added_store_ids: [],
  removed_product_ids: [],
  removed_store_ids: [],
  created_at: Sun, 07 Apr 2019 01:28:49 UTC +00:00,
  updated_at: Sun, 07 Apr 2019 01:28:49 UTC +00:00>]
[2] pry(main)> Crawl.where(state: [:init, :running, :paused]).destroy_all
Crawl Load (2.4ms) SELECT "crawls".* FROM "crawls" WHERE "crawls"."state" IN ($1, $2, $3) [["state", "init"], ["state", "running"], ["state", "paused"]]
(0.5ms) BEGIN
Crawl Destroy (2.0ms) DELETE FROM "crawls" WHERE "crawls"."" = $1 [["", 2810]]
(0.4ms) ROLLBACK
ActiveRecord::StatementInvalid: PG::SyntaxError: ERROR: zero-length delimited identifier at or near """"
LINE 1: DELETE FROM "crawls" WHERE "crawls"."" = $1
                                            ^
: DELETE FROM "crawls" WHERE "crawls"."" = $1
from /usr/local/bundle/gems/activerecord-5.2.2/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
Caused by PG::SyntaxError: ERROR: zero-length delimited identifier at or near """"
LINE 1: DELETE FROM "crawls" WHERE "crawls"."" = $1
                                            ^
from /usr/local/bundle/gems/activerecord-5.2.2/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
[3] pry(main)>
Try Crawl.find(2810).destroy, using any id returned by Crawl.where(state: [:init, :running, :paused]).
Or try reinstalling everything without importing the old data...
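One more option that might be worth trying, offered as an untested sketch rather than a confirmed fix. The failing statement in the output above has an empty column name where the primary key should be, and both destroy_all and destroy delete rows one at a time by primary key. ActiveRecord's delete_all instead sends a single DELETE built from the relation's WHERE clause, so it may avoid that lookup. It also skips callbacks. The helper name clear_active_crawls and the constant STUCK_STATES are hypothetical; only where and delete_all are ActiveRecord calls.

```ruby
# The states this thread reports as blocking a new crawl.
STUCK_STATES = %w[init running paused].freeze

# Hypothetical helper, not part of lcbo-api. `model` is expected to be an
# ActiveRecord class such as Crawl.
def clear_active_crawls(model)
  # delete_all issues one DELETE ... WHERE "state" IN (...) and returns the
  # number of rows removed. It does not load records or run callbacks.
  model.where(state: STUCK_STATES).delete_all
end

# In the Rails console (assumption: the Crawl model is loaded there):
#   clear_active_crawls(Crawl)
```

If this returns a number greater than zero, running docker-compose exec app rake cron again should show whether the "Crawl is already running" guard has cleared.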
Thanks for the suggestion, I'm not sure why it didn't work. I finally just deleted the db image (docker rm lcbo-api-master_app_1), then restarted (docker-compose up -d), then executed cron (docker-compose exec app rake cron),
and it's crawling finally! Yay! Thanks again for all your help.
Where did you find the db image? I'm having the same issue with my crawler @chimemeh
I believe it will be created on initialization of the Rails app or on the first crawl. What do you have so far?
There is only a brief mention of the crawler but no instructions on how to run it. If you could post the commands to run the crawler, I'd be more than happy to update the README with the information and a guide on how to use it.