Open dynabler opened 8 months ago
Unfortunately I cannot replicate this issue. Could you add a bit more detail, or a script that reproduces it?
In any case, the message you're seeing is only a warning, so it does not interrupt the scraping process.
> Unfortunately I cannot replicate this issue. Could you add a bit more detail, or a script that reproduces it?

It's a very difficult warning to reproduce, because it only occurs when two specific things happen: the original URL changes AND the old URL redirects to the new one. If the original URL changes but does not redirect, flyscrape just stops because of the 404. Here is an example:
export const config = {
    urls: [
        ...range("https://www.example.com/83743/category/amazing-movies.html?page={}", 1, 29),
    ], // the URL at the time the script was written
    follow: [
        ".item-title > a",
    ],
    cache: "file",
    depth: 1,
    rate: 60,
    output: {
        file: "amazing_movies.json",
        format: "json"
    },
    headers: {
        "User-Agent": ""
    }
};

// Expands the "{}" placeholder into one URL per page number, from..to inclusive.
function range(url, from, to) {
    return Array.from({ length: to - from + 1 }).map((_, i) => url.replace("{}", i + from));
}

export default function({ doc, absoluteURL }) {
    const title = doc.find('h1');
    const price = doc.find('.product_main > .price_color');
    const stock = doc.find('.product_main > .availability');
    return {
        title: title.text(),
        price: price.text(),
        stock: stock.text().trim()
    };
}
A few days later the URL changed AND the old URL redirected to the new page:
export const config = {
    urls: [
        ...range("https://www.example.com/83743/category/stunning-movies.html?page={}", 1, 29),
    ], // the URL has changed; the old /amazing-movies/ now redirects to /stunning-movies/
    follow: [
        ".item-title > a",
    ],
    cache: "file",
    depth: 1,
    rate: 60,
    output: {
        file: "amazing-movies.json",
        format: "json"
    },
    headers: {
        "User-Agent": ""
    }
};

// Expands the "{}" placeholder into one URL per page number, from..to inclusive.
function range(url, from, to) {
    return Array.from({ length: to - from + 1 }).map((_, i) => url.replace("{}", i + from));
}

export default function({ doc, absoluteURL }) {
    const title = doc.find('h1');
    const price = doc.find('.product_main > .price_color');
    const stock = doc.find('.product_main > .availability');
    return {
        title: title.text(),
        price: price.text(),
        stock: stock.text().trim()
    };
}
> In any case, the message you're seeing is only a warning, so it does not interrupt the scraping process.

Confirmed. It doesn't interrupt the scraping process. But it keeps going, scraping null. I was thinking of something like #34, where 301 and 302 (and maybe 307) responses are detected and the user is warned.
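The kind of check suggested here could be sketched roughly as follows. This is a minimal illustration in plain JavaScript, not flyscrape's actual implementation: the names `REDIRECT_STATUSES` and `redirectWarning` are hypothetical, and it assumes the scraper can see the response status code and the `Location` header.

```javascript
// Hypothetical helper, not part of flyscrape.
// The redirect status codes mentioned in this thread.
const REDIRECT_STATUSES = new Set([301, 302, 307]);

// Returns a warning string if the response is a redirect, otherwise null.
// `location` is the value of the Location header and may be relative.
function redirectWarning(requestedUrl, status, location) {
    if (!REDIRECT_STATUSES.has(status)) {
        return null;
    }
    // Resolve a relative Location against the URL that was requested.
    const target = location ? new URL(location, requestedUrl).href : "(unknown)";
    return `WARNING: ${requestedUrl} answered ${status} and redirects to ${target}`;
}

console.log(redirectWarning("https://example.com/shoes", 301, "/amazing-shoes"));
```

With a message like this, the user would learn that the URL in the script is stale instead of only seeing null results.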
Got this error when running Flyscrape
cache: failed to insert cache key "GET https://example.com/shoes": UNIQUE constraint failed: cache.key
The error was caused by a change in the URL. When I wrote the script it was https://example.com/shoes, but in the meantime it changed to https://example.com/amazing-shoes.
Perhaps it could output a clearer message, in the style of this one:
"WARNING": "Forgot to call text(), html() or attr()?"