thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

KeyError thrown when using default config YAML #745

Closed: trevorshannon closed this issue 1 year ago

trevorshannon commented 1 year ago

With the default urlwatch config file (urlwatch.yaml), it seems that a KeyError has been thrown since commit 5385365:

➜  urlwatch git:(master) urlwatch --urls ~/urls.yaml
Traceback (most recent call last):
  File "/Users/trevorshannon/projects/urlwatch/urlwatch", line 9, in <module>
    main()
  File "/Users/trevorshannon/projects/urlwatch/lib/urlwatch/cli.py", line 112, in main
    urlwatch_command.run()
  File "/Users/trevorshannon/projects/urlwatch/lib/urlwatch/command.py", line 433, in run
    self.urlwatcher.close()
  File "/Users/trevorshannon/projects/urlwatch/lib/urlwatch/main.py", line 97, in close
    self.report.finish()
  File "/Users/trevorshannon/projects/urlwatch/lib/urlwatch/handler.py", line 217, in finish
    ReporterBase.submit_all(self, self.job_states, duration)
  File "/Users/trevorshannon/projects/urlwatch/lib/urlwatch/reporters.py", line 138, in submit_all
    if cfg['enabled']:
KeyError: 'enabled'

I think this has something to do with adding a __kind__ attribute to MarkdownReporter, TextReporter, and HtmlReporter.

This can be worked around by explicitly adding enabled: false to the text, html, and markdown reporter configuration settings.
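To make the failure mode concrete, here is a minimal Python sketch of the loop that raises in submit_all. It is a simplified model, not urlwatch's actual code. It assumes, as the traceback suggests, that each registered reporter's section is looked up in the report config and then indexed with cfg['enabled']. The fallback for missing sections, the registry list, the config keys, and the submit_all(registry, report_config) signature are all assumptions for illustration. The sketch shows that a section missing from the config is treated as disabled, while a section that exists but lacks enabled raises KeyError. It also shows that the enabled: false workaround avoids the error.

```python
def submit_all(registry, report_config):
    """Simplified model of the submit_all loop: return the reporters that would run."""
    submitted = []
    for name in registry:
        # Assumed fallback: a section missing from the config counts as disabled ...
        cfg = report_config.get(name, {'enabled': False})
        # ... but a section that exists without an 'enabled' key raises KeyError.
        if cfg['enabled']:
            submitted.append(name)
    return submitted


# Hypothetical registry: once text/html/markdown have a __kind__, they are listed too.
registry = ['text', 'html', 'markdown', 'stdout']

# Hypothetical default-style config: 'text' holds formatting options but no 'enabled'.
default_config = {
    'text': {'line_length': 75, 'details': True},
    'stdout': {'enabled': True},
}

try:
    submit_all(registry, default_config)
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'enabled'

# The workaround from the issue: set enabled: false on each affected section.
patched = dict(default_config)
for name in ('text', 'html', 'markdown'):
    patched[name] = {**patched.get(name, {}), 'enabled': False}

print(submit_all(registry, patched))  # ['stdout']
```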

FYI @ryneeverett

ryneeverett commented 1 year ago

Ah. I noticed this issue but didn't realize I had introduced it; I assumed the enabled field had always been required (as I, perhaps incorrectly, wrote in the docs in #741).

ryneeverett commented 1 year ago

You're right, it's due to adding the __kind__ attribute. It's because util.TrackSubClasses uses that attribute to exclude the base classes from the __subclasses__ attribute.
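Here is a minimal sketch of that mechanism. It uses a simplified metaclass, and the real util.TrackSubClasses may do more and differ in detail. A class is added to the shared __subclasses__ dict only if its own class body defines __kind__. So giving a former base class such as HtmlReporter a __kind__ is enough to turn it into a registered reporter. The class names mirror the thread, but their bodies are hypothetical.

```python
class TrackSubClasses(type):
    """Simplified sketch: register a subclass under its own __kind__, if it has one."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        kind = namespace.get('__kind__')
        if kind is None:
            return  # no __kind__ of its own: treated as a base class, not registered
        for base in bases:
            # Inherited attribute lookup finds the root class's registry dict.
            registry = getattr(base, '__subclasses__', None)
            if isinstance(registry, dict):
                registry[kind] = cls
                break


class ReporterBase(metaclass=TrackSubClasses):
    # Registry of kind -> class; this class attribute shadows type.__subclasses__.
    __subclasses__ = {}


class TextReporter(ReporterBase):
    pass  # no __kind__: stays out of the registry


class StdoutReporter(TextReporter):
    __kind__ = 'stdout'


class HtmlReporter(ReporterBase):
    __kind__ = 'html'  # adding __kind__ to a base reporter registers it


print(sorted(ReporterBase.__subclasses__))  # ['html', 'stdout']
```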

ryneeverett commented 1 year ago

Perhaps __base_kind__ was the solution after all, since the secondary purpose of __kind__ is to signal __subclasses__ membership. What a footgun!
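This is one hypothetical reading of the __base_kind__ idea. For brevity it uses Python's __init_subclass__ hook rather than a metaclass. Only a class's own __kind__ puts it into __subclasses__, while __base_kind__ merely names the config section a base reporter reads. The __base_kind__ attribute and the config_key() helper are assumptions for illustration, not urlwatch APIs.

```python
class ReporterBase:
    __subclasses__ = {}  # registry of kind -> class (shadows type.__subclasses__)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        kind = cls.__dict__.get('__kind__')
        # Only an own __kind__ registers a class; __base_kind__ never does.
        if kind is not None:
            ReporterBase.__subclasses__[kind] = cls

    @classmethod
    def config_key(cls):
        """Hypothetical helper: the config section this class reads."""
        return cls.__dict__.get('__kind__') or cls.__dict__.get('__base_kind__')


class TextReporter(ReporterBase):
    __base_kind__ = 'text'  # has a config section, but is not a reporter itself


class StdoutReporter(TextReporter):
    __kind__ = 'stdout'


print(sorted(ReporterBase.__subclasses__))  # ['stdout']
print(TextReporter.config_key())            # text
```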

thp commented 1 year ago

Maybe util.TrackSubClasses should be changed to not look at __kind__ but use __mro__ or something?
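Here is a sketch of a tracking scheme that ignores __kind__ entirely, assuming membership is defined by inheritance alone. Instead of reading __mro__ directly, it walks Python's built-in subclass tree via type.__subclasses__. The method is called on type explicitly, because a class attribute named __subclasses__ would otherwise shadow it. Every class the walk returns has the base in its __mro__. The class names are hypothetical. Note that HelperFilter, an intermediate class without a __kind__, is now included. That is exactly the kind of class the __kind__ check used to leave out.

```python
def all_subclasses(base):
    """Return every class inheriting from base (directly or indirectly), without duplicates."""
    found = []
    stack = [base]
    while stack:
        # type.__subclasses__ is called explicitly because a class attribute
        # named __subclasses__ (a registry dict) would otherwise shadow it.
        for sub in type.__subclasses__(stack.pop()):
            if sub not in found:
                found.append(sub)
                stack.append(sub)
    return found


class FilterBase:
    __subclasses__ = {}  # registry dict, as in the thread; shadows the built-in method


class HelperFilter(FilterBase):
    pass  # hypothetical intermediate base without a __kind__


class GrepFilter(HelperFilter):
    __kind__ = 'grep'


subs = all_subclasses(FilterBase)
print([cls.__name__ for cls in subs])                  # ['HelperFilter', 'GrepFilter']
print(all(FilterBase in cls.__mro__ for cls in subs))  # True
```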

ryneeverett commented 1 year ago

That was my first thought, but looking at how it is used in filters.py, mro alone seems likely to be insufficient: not all classes inheriting from FilterBase have a __kind__, and several methods seem to iterate over __subclasses__, so their behavior would change if all the classes without a __kind__ were included.

thp commented 1 year ago

I mean, adding __kind__ to all the subclasses should be relatively easy to do if this provides a rather practical fix?

ryneeverett commented 1 year ago

If that worked, it would be great, but it appears to me that __kind__ serves a dual purpose, one of which is to distinguish classes that have a __kind__ from those that don't. Hopefully I'm wrong.

ryneeverett commented 1 year ago

OK, from a quick visual scan of the code, maybe the only such case is FilterBase.auto_process. In that function, if we were to use the mro approach, filters would include classes that don't currently have a __kind__ attribute. If that's not acceptable, we could filter out the classes that don't have a __kind__ attribute. There might be other cases that require similar care.
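Here is a minimal sketch of that compromise. Its auto_process is a simplified stand-in for FilterBase.auto_process, not the real signature. All subclasses are tracked by inheritance, and auto_process skips any class that does not define its own __kind__. The class names, the match()/apply() methods, and the only_kinded flag are hypothetical. The sketch checks vars(cls) rather than hasattr(cls, '__kind__'), because hasattr would also accept a class that merely inherits a __kind__ from its parent.

```python
class FilterBase:
    _tracked = []  # hypothetical: every subclass, found through inheritance alone

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        FilterBase._tracked.append(cls)

    @classmethod
    def match(cls, data):
        return False  # by default a filter is not auto-applied

    @classmethod
    def apply(cls, data):
        return data

    @classmethod
    def auto_process(cls, data, only_kinded=True):
        for filtercls in FilterBase._tracked:
            # Skip classes that do not define their own __kind__ (helper bases).
            if only_kinded and '__kind__' not in vars(filtercls):
                continue
            if filtercls.match(data):
                data = filtercls.apply(data)
        return data


class UpperHelper(FilterBase):
    """Hypothetical helper base without __kind__ that matches everything."""

    @classmethod
    def match(cls, data):
        return True

    @classmethod
    def apply(cls, data):
        return data.upper()


class StripFilter(FilterBase):
    __kind__ = 'strip'

    @classmethod
    def match(cls, data):
        return data != data.strip()

    @classmethod
    def apply(cls, data):
        return data.strip()


print(repr(FilterBase.auto_process('  hello  ')))                     # 'hello'
print(repr(FilterBase.auto_process('  hello  ', only_kinded=False)))  # 'HELLO'
```

With only_kinded=False, the helper class is applied as well. That is the behavior change the mro approach would introduce without the extra check.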