mbo-s closed this issue 6 years ago
At the moment, the plaintext for a remote job is always fetched with a remote GET request:
https://github.com/yawik/SimpleImport/blob/master/src/CrawlerProcessor/JobProcessor.php#L175-L187
This does not work if the remote site loads the job content via JavaScript or uses an iframe.
If the remote data contains the needed templateValues (see http://scrapy-docs.yawik.org/build/html/guidelines/format.html), use them; otherwise fall back to the remote fetch.
"templateValues": {
  "description": "<p>We're a good company<\/p>",
  "tasks": "<b>Your Tasks<\/b><ul><li>Task 1<\/li><li>Task2<\/li><\/ul>",
  "requirements": "<b>Qualifications<\/b><ul><li>requirement 1<\/li><li>requirement 2<\/li><\/ul>",
  "benefits": "<b>We offer<\/b><ul><li>offer 1<\/li><li>offer 2<\/li><\/ul>",
  "html": "<p>complete html<\/p>"
}
Something like:
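$data = $importData['templateValues'] ?? [];
// Concatenate the individual sections; used only when no complete "html" value is given.
$sections = ($data['description'] ?? '') . ($data['tasks'] ?? '') . ($data['requirements'] ?? '') . ($data['benefits'] ?? '');
if (!empty($data['html'])) $plainText = prettify($data['html']);
elseif ($sections !== '') $plainText = prettify($sections);
else $plainText = remotefetch($url);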
where prettify($html) should remove all HTML tags.
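As a rough illustration of the idea above, here is a minimal sketch of what prettify() and the fallback logic could look like. It is not the SimpleImport implementation: the body of prettify(), the helper name plainTextFor(), and the $remoteFetch callable (standing in for the existing remote GET) are all assumptions made for this example. Tags are replaced with a space rather than simply stripped, so that text from adjacent elements such as list items does not run together.

```php
<?php

/**
 * Hypothetical prettify(): reduce an HTML fragment to plain text.
 */
function prettify(string $html): string
{
    // Replace every tag with a space so "<li>a</li><li>b</li>" becomes "a b", not "ab".
    $text = preg_replace('/<[^>]*>/', ' ', $html) ?? '';
    // Decode entities such as &amp; or &#39; into their characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text) ?? '');
}

/**
 * Hypothetical helper: prefer templateValues, fall back to the remote fetch.
 * $remoteFetch is an assumed callable that takes the job URL and returns plaintext.
 */
function plainTextFor(array $importData, string $url, callable $remoteFetch): string
{
    $data = $importData['templateValues'] ?? [];

    // 1. A complete "html" value wins.
    if (!empty($data['html'])) {
        return prettify($data['html']);
    }

    // 2. Otherwise join the individual sections, if any of them has content.
    $sections = '';
    foreach (['description', 'tasks', 'requirements', 'benefits'] as $key) {
        $sections .= ' ' . ($data[$key] ?? '');
    }
    $plainText = prettify($sections);

    // 3. Nothing usable in the import data: use the remote fetch as before.
    return $plainText !== '' ? $plainText : $remoteFetch($url);
}
```

Note that this sketch checks for emptiness after the tags have been removed, so sections that contain only markup also trigger the remote fetch. Whether that is the desired behaviour is a design choice for the actual implementation.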
Hi @fedys,
do you have time to take a look at this issue?
Hi @cbleek,
sadly, I don't. I am too busy these days.