Open raphael10-collab opened 3 years ago
How to clone the video portion of the HTML page in order to extract and keep it intact?
For example, from this URL: https://abcnews.go.com/Politics/arizona-gov-doug-ducey-signs-law-purge-voters/story?id=77606533&cid=clicksource_4380645_1_heads_hero_live_hero_image
I would like to keep the video streaming.
I tried to modify the abcnews.go.com extractor in this way:
export const AbcnewsGoComExtractor = {
  domain: 'abcnews.go.com',
  title: {
    selectors: ['.article-header h1'],
  },
  author: {
    selectors: ['.authors'],
    clean: ['.author-overlay', '.by-text'],
  },
  date_published: {
    selectors: ['.timestamp'],
    timezone: 'America/New_York',
  },
  lead_image_url: {
    selectors: [['meta[name="og:image"]', 'value']],
  },
  video: {
    selectors: [
      'inline-video-wrapper',
      'video',
    ]
  },
  content: {
    defaultCleaner: false,
    selectors: [
      '.article-copy',
      '#player-api',
      'inline-video-wrapper',
      'video',
    ],
    // Is there anything that is in the result that shouldn't be?
    // The clean selectors will remove anything that matches from
    // the result
    clean: [],
  },
};
But this is the output:
I also tried adding this transform, but it doesn't work either:
'div.inline-content': $node => {
  if ($node.has('img,iframe,video').length > 0) {
    return $node;
  }
},
OS: Ubuntu 18.04
Are there any updates on this?
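One thing that may be worth checking in the attempt above: in a flat `selectors` list the entries are tried in order and the first one that matches is used, so once `.article-copy` matches, the later entries are never consulted. Also, `'inline-video-wrapper'` without a leading dot is a CSS type selector (an element named `inline-video-wrapper`), not a class selector. The sketch below is only a guess at a fix, not a confirmed solution. It assumes the video container carries the class `inline-video-wrapper` (taken from the attempt above, not verified against the live page) and that a nested array in `content.selectors` is treated as a multi-match group, as I understand the parser's custom extractor docs to describe.

```javascript
// Sketch only: the class names are assumptions copied from the attempt
// above and have not been checked against the live abcnews.go.com markup.
const AbcnewsGoComExtractor = {
  domain: 'abcnews.go.com',

  content: {
    // Keep the default cleaner off so media nodes are not stripped.
    defaultCleaner: false,

    selectors: [
      // Nested array: a multi-match group. Every selector in the group
      // must match, and the matched nodes are combined into the content.
      ['.article-copy', '.inline-video-wrapper'],

      // Fallback for pages without a video wrapper: article text only.
      '.article-copy',
    ],

    clean: [],
  },
};
```

If the player markup is injected by client-side JavaScript, the fetched HTML will not contain a `video` element at all, and no selector can recover it, since the parser works on the static HTML and does not execute scripts. Viewing the page source (not the live DOM in dev tools) should show which case applies.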