Hmmm... It's either YAML or some legacy code; I'll look into it.
Can you please run the job with '--verbose' and paste at least the top lines (with version and system info) and the full traceback and error message?
I didn't realize on mobile that you had posted the complete job. The error I got is:
    selected_elems = root.xpath(self.expression, namespaces=self.namespaces)
  File "src\lxml\etree.pyx", line 1597, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Undefined namespace prefix
I don't know XML, but a Google search led me to https://stackoverflow.com/questions/44188237/python-parse-xml-feed-error-xpathevalerror-undefined-namespace-prefix, which indicates that you probably need to define namespaces. The namespaces sub-directive is documented at https://webchanges.readthedocs.io/en/stable/filters.html#css-and-xpath.
Since I don't know XML I can't really help you further, but I hope this points you in the right direction!
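For anyone landing here with the same error, here is a minimal sketch of why the prefix fails and how a prefix map fixes it. It uses the standard library's xml.etree.ElementTree instead of lxml, so the exception type differs from the XPathEvalError in the traceback (ElementTree raises SyntaxError), and it does not support text() or the | union operator. The feed fragment is hand-written for illustration and is not the real SwimSwam feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written RSS fragment (an assumption, not the real feed).
FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Time standards released</title>
      <description>Short summary</description>
      <content:encoded>Full article body</content:encoded>
    </item>
  </channel>
</rss>
"""

root = ET.fromstring(FEED)

# Without a prefix map, the "content:" prefix in the path cannot be resolved.
try:
    root.findall(".//item/content:encoded")
    error = None
except SyntaxError as exc:
    error = str(exc)

# Mapping the prefix to the namespace URI declared in the feed resolves it.
NAMESPACES = {"content": "http://purl.org/rss/1.0/modules/content/"}
bodies = [el.text for el in root.findall(".//item/content:encoded", NAMESPACES)]

print(error)
print(bodies)
```

The prefix itself is only a local alias; what has to match the feed is the namespace URI it maps to.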
That was it... thanks for pointing me in the right direction.
Working job below:
name: 21_SwimSwam News - Time Standard
url: "https://swimswam.com/feed/#1"
filter:
  - xpath:
      method: xml
      path: '//item/title/text()|//item/description/text()|//item/content:encoded/text()'
      namespaces:
        content: http://purl.org/rss/1.0/modules/content/
      exclude: 'a'
  - keep_lines_containing:
      re: '(?i)usa\sswimming|time\sstandard'
  - html2text: re
additions_only: true
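As a rough illustration of what the keep_lines_containing pattern in this job lets through, here is a small sketch. It assumes the filter tests each line with search semantics (a match anywhere in the line), which is not confirmed in this thread, and the sample lines are made up.

```python
import re

# Pattern copied from the job; (?i) makes the whole pattern case-insensitive.
PATTERN = re.compile(r"(?i)usa\sswimming|time\sstandard")

# Hypothetical sample lines, invented for illustration.
lines = [
    "USA Swimming announces new cuts",
    "2024 Time Standard update",
    "College dual meet recap",
    "usaswimming.org link",
]

# Keep only lines where the pattern matches somewhere in the line.
kept = [line for line in lines if PATTERN.search(line)]
print(kept)
```

Note that \s requires exactly one whitespace character, so "usaswimming" written without a space would be dropped.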
Yeah, Google's pretty powerful! Glad it worked out.
Not until you have the magic word and an example. Thanks again.
On Apr 7, 2022, at 7:05 PM, Mike Borsetti wrote:
> Yeah, Google's pretty powerful! Glad it worked out.
Not sure if it's a bug in me or in the software.
I have a job:
According to the docs, I should be able to add this line to the path, but I get an error:
//item/content:encoded/text()
path: '//item/title/text()|//item/description/text()|//item/content:encoded/text()'
https://webchanges.readthedocs.io/en/stable/filters.html?highlight=RSS#using-css-and-xpath-filters-with-xml-and-exclusions
It seems I am having an issue with the colon.