Closed by elreymon 1 year ago
@elreymon Yes, that's true, because I use the requestListSources Input Schema. So the correct property looks like this:
{
  "startUrl": [
    {
      "url": "https://www.idealista.com/venta-viviendas/madrid/barrio-de-salamanca/castellana/con-precio-hasta_500000,con-solo-pisos,apartamentos,aticos,de-dos-dormitorios,de-tres-dormitorios,de-cuatro-cinco-habitaciones-o-mas,ascensor,ultimas-plantas,plantas-intermedias/"
    }
  ]
}
But when I check your URL in a browser, Idealista doesn't return any items.
I used the Input Schema as you said:
{
  "maxItems": 3,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "ES"
  },
  "startUrl": [
    {
      "url": "https://www.idealista.com/venta-viviendas/madrid/barrio-de-salamanca/castellana/con-precio-hasta_500000,con-solo-pisos,apartamentos,aticos,de-dos-dormitorios,de-tres-dormitorios,de-cuatro-cinco-habitaciones-o-mas,ascensor,ultimas-plantas,plantas-intermedias/"
    }
  ]
}
But no results were obtained:
2022-11-03T15:21:50.868Z INFO Starting the crawl.
2022-11-03T15:21:50.961Z INFO CheerioCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.7,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
2022-11-03T15:21:52.534Z INFO CheerioCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2022-11-03T15:21:52.771Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":0,"retryHistogram":[],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":1903}
2022-11-03T15:21:52.772Z INFO Crawl finished.
Hello, I tried opening this URL in the browser and the Idealista site gives me zero results, so getting no results from the actor is expected.
That's true. Sorry for the invalid example.
Here is a JSON input with a URL that returns 35 elements, but the crawler doesn't retrieve any results.
{
  "maxItems": 33,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "ES"
  },
  "startUrl": [
    {
      "url": "https://www.idealista.com/venta-viviendas/madrid/barrio-de-salamanca/castellana/con-precio-hasta_500000/"
    }
  ]
}
Additionally, the actor can't browse through paginated results, can it? I mean, to scrape 10 pages, should I provide the actor with an array of 10 URLs?
Hello, I'm sorry, it was a typo in my code. Now it is working. Here is an example run: https://console.apify.com/view/runs/QaZsf3saQRDpql0SH
Let me know if you want to add some more features or if you have any ideas for upgrades. Thanks
Example:
run_input = {
    "district": "Fuencarral, Madrid",
    "maxItems": 3,
}
Obtained: "apify_client._errors.ApifyApiError: Input is not valid: Items in input.startUrl at positions [0] do not contain valid URLs"
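In case it helps to narrow this down, here is a minimal sketch of a local pre-check for the `startUrl` shape used earlier in this thread (a list of objects with a `url` key). The helper name `invalid_start_url_positions` is hypothetical and not part of `apify_client`, and the actor's real validation rules are an assumption on my side; the sketch only reports positions whose `url` is not an absolute http(s) URL.

```python
from typing import List
from urllib.parse import urlparse


def invalid_start_url_positions(run_input: dict) -> List[int]:
    """Return positions in run_input["startUrl"] whose "url" is not an absolute http(s) URL.

    Hypothetical helper: it mirrors the wording of the error message,
    not the actor's actual validation logic.
    """
    bad = []
    for position, item in enumerate(run_input.get("startUrl", [])):
        url = item.get("url") if isinstance(item, dict) else None
        if not isinstance(url, str):
            bad.append(position)
            continue
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(position)
    return bad


# Example input: the first URL is taken from this thread, the second is deliberately empty.
run_input = {
    "maxItems": 3,
    "startUrl": [
        {"url": "https://www.idealista.com/venta-viviendas/madrid/barrio-de-salamanca/castellana/con-precio-hasta_500000/"},
        {"url": ""},
    ],
}

print(invalid_start_url_positions(run_input))
```

An input without a `startUrl` key, like the `district` example above, yields an empty list here, so if the error still appears the invalid entry is presumably being filled in somewhere other than the submitted `run_input`.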