vsoch / watchme

Reproducible watchers for research
https://vsoch.github.io/watchme/
Mozilla Public License 2.0

How to debug `ERROR Error running task.`? #64

Closed: abitrolly closed this issue 4 years ago

abitrolly commented 4 years ago

What is your question?


$ watchme inspect covid-belta
[watcher]
active  = true
type  = urls
protected  = on

[task-watch-belta]
url  = http://stopcovid.belta.by
active  = true
type  = urls

A --verbose flag that shows the stdout/stderr of the failed process would help.

vsoch commented 4 years ago

You're right! If there is a bug that is not related to the response of the GET call (as in this case), the error is silent. I can tell you from experience that the issue is that, by default, the watcher assumes the result can be parsed as JSON, so you need to set save_as to something else:

[watcher]
active  = true
type  = urls

[task-watch-belta]
url  = http://stopcovid.belta.by
active  = true
type  = urls
save_as = false
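A minimal Python sketch of why the default matters, assuming a save_as option where "json" means "decode the body as JSON" and any other value keeps the body as a plain string. The helper name `parse_body` and the sample HTML are hypothetical; this is not watchme's own parsing code, only an illustration of the failure mode: decoding an HTML page as JSON raises, while keeping the raw text does not.

```python
import json


def parse_body(text, save_as="json"):
    """Hypothetical helper: interpret a response body according to save_as.

    Assumption: "json" (the default) decodes the body as JSON;
    any other value returns the body unchanged as a string.
    """
    if save_as == "json":
        # Raises json.JSONDecodeError when the body is HTML, as in this issue.
        return json.loads(text)
    return text


html = "<!DOCTYPE html><html><body>stopcovid</body></html>"

try:
    parse_body(html)  # default: JSON decoding of an HTML page
except json.JSONDecodeError as err:
    print("JSON parsing failed:", err)

# With a non-JSON save_as, the raw page text is kept.
print(parse_body(html, save_as="text")[:15])
```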

And then many of the examples show that you can test-run a task. Note that my command first has the name of the watcher (tasks) and then the task name:

watchme run tasks task-watch-belta --test

And that will show the full content of the page, which is a bit messy. You could save the raw content, or you might want to look into how to apply a selector: https://vsoch.github.io/watchme/watchers/urls/#4-select-on-a-page-task
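Saving the whole page stores a lot of markup. The linked docs describe watchme's own selector task; as a rough, standard-library-only illustration of the idea (not watchme's implementation), the sketch below keeps only the text inside elements that carry a given CSS class. The helper names, the class name "stats", and the sample HTML are all made up for the example.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Hypothetical helper: collect text inside elements whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1  # entering a matching element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def select_text(html, css_class):
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.chunks


page = """<html><body>
<div class="nav">Home | About</div>
<div class="stats"><span>Cases:</span> <b>42</b></div>
<p class="stats">Updated today</p>
</body></html>"""

print(select_text(page, "stats"))
```

A real scraper would usually rely on a full CSS selector engine; the hand-rolled depth counter here only handles the single-class case.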

I will look into providing more debug information for this case.

abitrolly commented 4 years ago

The documentation says that by default the content is expected to be a string: https://github.com/vsoch/watchme/blob/master/docs/_docs/watcher-tasks/urls.md#1-get-a-url-task Perhaps that page should be updated to list the possible values of save_as.

I will try the selection scraper a bit later.

vsoch commented 4 years ago

Okay, I am going to open a PR with a new "debugging page" that shows how to get the full error output. It comes down to watchme running workers via multiprocessing, so the error is somewhat hidden from the client. If you add --serial, you can get around that. Opening soon.
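One plausible shape of the problem, sketched in Python: when a worker in a pool catches its own exception and only logs a one-line message, the client sees nothing else, while running the same function in the main process lets the traceback propagate. Everything here is hypothetical (`get_task`, `safe_call`, `run_tasks`, and the `serial`/`verbose` arguments are illustrative names, not watchme's code), and it uses a thread pool to stay self-contained where watchme uses multiprocessing. It also shows one way a worker could hand back the formatted traceback, roughly what the --verbose request above asks for.

```python
import json
import traceback
from concurrent.futures import ThreadPoolExecutor


def get_task(url):
    """Hypothetical task: pretend the server returned HTML and parse it as JSON."""
    body = "<!DOCTYPE html><html></html>"  # stand-in for the HTTP response
    return json.loads(body)


def safe_call(func, kwargs):
    """Worker wrapper: never raise across the pool boundary; report instead."""
    try:
        return {"ok": True, "result": func(**kwargs)}
    except Exception:
        # Keep the full traceback so the parent can show it on request.
        return {"ok": False, "error": traceback.format_exc()}


def run_tasks(tasks, serial=False, verbose=False):
    """tasks maps a task name to (func, kwargs)."""
    results = {}
    if serial:
        # Serial mode: exceptions propagate to the caller with a full traceback.
        for name, (func, kwargs) in tasks.items():
            results[name] = func(**kwargs)
        return results

    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(safe_call, func, kwargs)
                   for name, (func, kwargs) in tasks.items()}
        for name, future in futures.items():
            outcome = future.result()
            if outcome["ok"]:
                results[name] = outcome["result"]
            else:
                print("ERROR Error running task.")
                if verbose:
                    print(outcome["error"])
    return results


tasks = {"task-watch-belta": (get_task, {"url": "http://stopcovid.belta.by"})}
run_tasks(tasks)                # only the one-line error
run_tasks(tasks, verbose=True)  # one-line error plus the traceback
```

Calling `run_tasks(tasks, serial=True)` instead raises the `JSONDecodeError` directly, which mirrors the full traceback that --serial produces later in this thread.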

vsoch commented 4 years ago

Thanks for opening this issue, @abitrolly! I wrote you a debugging page that shows how to use the --serial flag I mentioned before.

https://vsoch.github.io/watchme/getting-started/debugging/

I was able to reproduce your error, add save_as with type "text", and then run the job successfully. Grabbing the whole page produces quite a bit of text, so I'd definitely look into the selection example when you have time later. Thanks again!

abitrolly commented 4 years ago

--serial helps. Thanks! :)

$ watchme run covid-belta task-watch-belta --serial
Found 1 contender tasks.
[task-watch-belta:1/1] |===================================| 100.0% 
Traceback (most recent call last):
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/bin/watchme", line 8, in <module>
    sys.exit(main())
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/lib/python3.7/site-packages/watchme/client/__init__.py", line 350, in main
    main(args, extras)
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/lib/python3.7/site-packages/watchme/client/run.py", line 29, in main
    show_progress=not args.no_progress)
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/lib/python3.7/site-packages/watchme/watchers/__init__.py", line 791, in run
    results = self.run_tasks(tasks, parallel, show_progress)
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/lib/python3.7/site-packages/watchme/watchers/__init__.py", line 731, in run_tasks
    results[task.name] = task.run()
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/lib/python3.7/site-packages/watchme/tasks/__init__.py", line 110, in run
    return func(**params)
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/lib/python3.7/site-packages/watchme/watchers/urls/tasks.py", line 47, in get_task
    result = parse_success_response(response, kwargs)
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/lib/python3.7/site-packages/watchme/watchers/urls/helpers.py", line 75, in parse_success_response
    result = response.json()
  File "/home/anatoli/.local/share/virtualenvs/watchme-2Ix1g0V8/lib/python3.7/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib64/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

vsoch commented 4 years ago

Great! You should be able to set save_as to text (not the default), and the bug will go away.