Closed nitinthewiz closed 5 years ago
Alternatively, I've realized that the GenericAuthorExtractor does an oddly good job of extracting the name, but I haven't found a single example where it is used inside a custom extractor as a mix-and-match, with some fields customized and others left to the generic extractor.
Is that even possible? Is it possible for me to say -
title: {
  selectors: ['h1'],
},
author: {
  selectors: GenericAuthorExtractor
},
Update: I see fallback is an option in the test.js files. Perhaps that's the answer to my woes. 🙄
Closing as primary issue is resolved, though I'd still like to know if author can have transforms. I noticed that format and timezone were added to date_published.
I'm trying to build a custom parser for this news site - https://timesofindia.indiatimes.com/india/china-snubs-imran-says-resolve-jk-bilaterally/articleshow/71496416.cms
The author byline section has the date with it, so I thought of using the following transform -
But the split doesn't seem to work. I'm wondering whether author even supports transforms. I have noticed that clean is used in some custom parsers, but I don't know if transforms are available for author.
I'm running the code from the master branch on OSX.
In the browser, for the fixture, I can do the following -
$('div.byline').innerText.split('|')[0].trim()
and it seems to work. So just curious.
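For reference, here is the string handling from that browser snippet pulled out into a plain function. This is only a minimal sketch of the split logic: `authorFromByline` is a hypothetical helper name, the sample byline text is made up, and nothing here confirms whether Mercury Parser will run such a function for the author field.

```javascript
// Hypothetical helper, not part of Mercury Parser.
// Takes the full byline text and returns the part before the first '|'.
function authorFromByline(bylineText) {
  // split('|') returns the whole string as the only element when there is
  // no '|', so a byline without a date passes through unchanged (trimmed).
  return bylineText.split('|')[0].trim();
}

// Made-up byline text, for illustration only.
console.log(authorFromByline('Jane Doe | Oct 9, 2019'));
```

Keeping the logic in a standalone function like this makes it easy to check against the fixture text first, separately from the question of where the parser would call it.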