Closed glemiere closed 7 years ago
If all of your ids are sequential, then we already have you covered.
You can create what we call a generated url:
To start creating a generated url, first press the down arrow beside the START URLS section.
When the dropdown opens, click on Add generation url.
This will bring you to the create generation url window.
This window allows you to build up a url from different pieces.
Here is an example of a generated url that will start with all listing pages on the site using a range.
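To make the idea of building a url from pieces concrete, here is a minimal Python sketch. It is not Portia's own code: the function name `expand_generated_url`, the `("fixed" | "range" | "list", value)` fragment tuples, and the `example.com` address are all hypothetical and chosen only for illustration.

```python
from itertools import product


def expand_generated_url(fragments):
    """Yield every url that can be built from a list of fragments.

    Hypothetical fragment format (an assumption, not Portia's schema):
      ("fixed", "text")          -> a literal piece of the url
      ("range", (start, stop))   -> every integer from start to stop, inclusive
      ("list", ["a", "b"])       -> each listed value in turn
    """
    choices = []
    for kind, value in fragments:
        if kind == "fixed":
            choices.append([value])
        elif kind == "range":
            start, stop = value
            choices.append(range(start, stop + 1))
        elif kind == "list":
            choices.append(list(value))
        else:
            raise ValueError(f"unknown fragment type: {kind!r}")

    # product() walks every combination of the pieces; the joined urls
    # are yielded one at a time instead of being collected into a list.
    for combo in product(*choices):
        yield "".join(str(part) for part in combo)


# Example: three listing pages described by one fixed piece and one range.
for url in expand_generated_url([
    ("fixed", "http://example.com/listing?id="),
    ("range", (1, 3)),
]):
    print(url)
```

The point of the sketch is that a short description (a prefix plus a range) stands in for a long list of nearly identical links.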
If a generated url doesn't work for you, why not try a feed url?
You can start creating a feed url in the same way that you would make a generated url.
Press the down arrow beside the START URLS section and select Add feed url.
This will open the create feed url window.
All you need to do is provide a link to a file containing urls structured like this. Whenever your spider starts, it will download the file and start with the urls it contains. This means that you can update the start urls for your spider without modifying the spider itself.
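As a rough illustration of what happens with a feed, the Python sketch below downloads a file and turns it into a list of start urls. It assumes the file is plain text with one url per line; that layout, and the helper names `parse_feed` and `load_feed`, are assumptions made for this example and not taken from Portia itself.

```python
from urllib.request import urlopen


def parse_feed(text):
    """Return the urls found in a feed file.

    Assumes one url per line; blank lines and surrounding
    whitespace are ignored.
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            urls.append(line)
    return urls


def load_feed(feed_url):
    """Download the feed file and return its urls (hypothetical helper)."""
    with urlopen(feed_url) as response:
        return parse_feed(response.read().decode("utf-8"))


# Parsing can be tried without any network access:
sample = "http://example.com/a\n\n  http://example.com/b  \n"
print(parse_feed(sample))
```

Because the list lives in the file and not in the spider, editing the file is enough to change where the next crawl starts.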
Hopefully this helps. As a rule, if you need more than 10 start urls, you should use generated or feed urls instead.
Interesting, my interface doesn't look like that. I probably didn't install it correctly.
@glemiere That is the previous Portia version. For a more up-to-date version, check out the develop branch.
That was my mistake! Awesome, thank you ;-)
Hi! I really love Portia, this software has very good potential! However, one small detail messes everything up.
When you add URLs, each url generates one button. But creating 200k buttons at once eats up the video memory and crashes the browser, and as you can guess, being able to crawl more than 10 pages is clearly vital. This change would be pretty quick to make: just don't append any buttons.
It would also be great if you could add a quick way to parameterize urls, like being able to say "id from 1 to 9999999" instead of giving a huge list of almost identical links. But fix that browser crashing problem first ;)
Thank you!