Closed: hubitor closed this issue 5 years ago
Can you post a more detailed question? Maybe this script will help you: https://github.com/howie6879/ruia/blob/master/examples/topics_examples/hacker_news_spider.py
If you want to save the results in MongoDB, there is an example:
click here
I've never used asyncio or any of Python's asynchronous libraries before, and from what I've read, the normal (blocking) libraries cannot be used directly with asyncio; that's why I'm asking. I don't have any code to post yet. I'm just investigating the options.
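For context on that point: blocking libraries can still be used from asyncio, they just have to be pushed off the event loop. A minimal sketch of the pattern, using `asyncio.to_thread` (Python 3.9+) and a simulated blocking call instead of a real HTTP library, so it runs without network access — the function names and URLs are illustrative, not from ruia:

```python
import asyncio
import time

def blocking_fetch(url: str) -> str:
    # Stand-in for a blocking HTTP call (e.g. requests.get);
    # time.sleep simulates network latency.
    time.sleep(0.1)
    return f"<html>{url}</html>"

async def fetch(url: str) -> str:
    # asyncio.to_thread runs the blocking call in a worker thread,
    # so the event loop stays free to run other fetches concurrently.
    return await asyncio.to_thread(blocking_fetch, url)

async def main() -> list:
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    # gather schedules all fetches at once instead of one after another.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(main())
```

Natively async HTTP clients such as aiohttp avoid the thread hop entirely, but this shows that "normal" libraries are not ruled out.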
Can you post a more detailed question? Maybe this script will help you
What do you mean by this? Would it be possible to put a few thousand URLs in the start_urls list? Would it be possible, or make sense, to combine Celery with ruia?
So is there currently support only for MongoDB?
If there are many links to crawl, I suggest discovering and yielding them in the parse function instead of listing them all in start_urls.
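The idea above — seed a few start URLs and enqueue further links as they are discovered during parsing — can be sketched framework-independently with an `asyncio.Queue` and a fixed pool of workers. This is not ruia's actual API; the link graph and function names are made up so the example runs without network access:

```python
import asyncio

# Hypothetical link graph standing in for real pages (no network needed).
PAGES = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/b": [],
    "https://example.com/c": [],
}

async def fetch_links(url):
    # Stand-in for fetching a page and parsing the links out of it.
    await asyncio.sleep(0)
    return PAGES.get(url, [])

async def crawl(start_urls, num_workers=10):
    seen = set(start_urls)
    queue = asyncio.Queue()
    for url in start_urls:
        queue.put_nowait(url)

    async def worker():
        # The number of workers bounds how many pages are fetched at once.
        while True:
            url = await queue.get()
            try:
                for link in await fetch_links(url):
                    if link not in seen:       # crawl each URL only once
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(num_workers)]
    await queue.join()                          # wait until the frontier is empty
    for w in workers:
        w.cancel()
    return seen

crawled = asyncio.run(crawl(["https://example.com/"]))
```

The point is that start_urls only holds the seeds; the bulk of the URLs flow through the queue as parsing discovers them.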
Motor_ruia is a plugin that I wrote. For other databases, just use a supported third-party async library; for example, for MySQL you could consider aiomysql. For more libraries, click here
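To illustrate the "save scraped items to a database" step without requiring a running MySQL or MongoDB server, here is a sketch using stdlib sqlite3 pushed off the event loop with `asyncio.to_thread`. The table schema and items are invented for the example; with a natively async driver like aiomysql or Motor you would await the database calls directly instead of using a thread:

```python
import asyncio
import sqlite3

def save_items(db_path, items):
    # Blocking sqlite3 calls; run them off the event loop via asyncio.to_thread.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results (url TEXT PRIMARY KEY, title TEXT)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO results (url, title) VALUES (?, ?)", items
        )
        conn.commit()
        # Return the row count so the caller can see what was stored.
        return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
    finally:
        conn.close()

async def scrape_and_save(db_path):
    # Hypothetical scraped items; in a real spider these would come from parsing.
    items = [
        ("https://example.com/1", "Page one"),
        ("https://example.com/2", "Page two"),
    ]
    return await asyncio.to_thread(save_items, db_path, items)

rows = asyncio.run(scrape_and_save(":memory:"))
```

The same structure applies to any backend: collect items in the async pipeline, then hand them to whichever driver your database supports.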
Shouldn't it be up to the user whether it makes sense to combine Celery with ruia? Do you think it makes sense to combine Celery and asyncio?
If you are interested in asynchronous programming, I think asyncio is a good choice.
OK, thanks. I'll look into it.
What are the possible options for scraping multiple websites, e.g. from a list or a file, and saving the results in a database?