Closed jvanz closed 3 years ago
Further comments....
By the way, there is an ongoing discussion about topic 2
Hey guys, what is the status of this issue? Need any help with data vis? Would love to contribute!
I'm configuring a production server. It's almost done, and I'm running the spiders to test that it's working. For version 0.9, our first goal is a simple UI allowing the user to download gazettes from any city on a single web page, making access to the data as easy as possible. But I'm not working on that right now. I was wondering whether Digital Ocean Spaces already has some kind of simple interface; I need to check that. For the next version, I'm interested in creating an API to filter the documents you're interested in. But first we need to have these files somewhere...
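The filtering API could start as simply as a city/date-range query over the gazette metadata. A minimal sketch of that logic (the field names `territory_id`, `date`, and `url` here are assumptions for illustration, not the project's actual schema):

```python
from datetime import date

# Hypothetical in-memory stand-in for the gazette metadata store;
# in production this would be a database query instead.
GAZETTES = [
    {"territory_id": "3550308", "date": date(2020, 1, 10), "url": "https://example.com/a.pdf"},
    {"territory_id": "4106902", "date": date(2020, 2, 5), "url": "https://example.com/b.pdf"},
]

def filter_gazettes(territory_id=None, since=None, until=None):
    """Return gazettes matching the given city code and date range."""
    results = []
    for g in GAZETTES:
        if territory_id and g["territory_id"] != territory_id:
            continue  # wrong city
        if since and g["date"] < since:
            continue  # before the requested range
        if until and g["date"] > until:
            continue  # after the requested range
        results.append(g)
    return results
```

An HTTP layer would just map query parameters onto these arguments; the interesting part is agreeing on the filter fields first.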
We do not have any ideas yet about a more complex data visualization tool, but any suggestion is welcome. I opened a very naive PR some time ago creating a very simple data processing pipeline with a graph database, but nothing that should run in production.
@arturmesquitab I think you can start thinking about the UI to show the documents to the users. What do you think? It's just a suggestion. You can work on whatever you want. ;)
If you accept my suggestion, we can discuss this in issue #161
@jvanz what about scrapinghub as a partner to run the spiders? We know a guy there :) but not sure if they would sponsor something.
Not fetching files already downloaded or even requesting pages that we already saw is doable on Scrapy.
What needs to be automated?
What kind of UX wizard are you looking for? UX is not the same as UI :thinking: and I can do some tricks in both
> @jvanz what about scrapinghub as a partner to run the spiders? We know a guy there :) but not sure if they would sponsor something.
Yes, we are working on that.
> Not fetching files already downloaded or even requesting pages that we already saw is doable on Scrapy.
Yes, there is an ongoing discussion. See #247
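For reference, the skip-already-seen behaviour works roughly like Scrapy's request fingerprinting: hash each URL and drop any request whose fingerprint was already recorded. A simplified, self-contained sketch (not Scrapy's actual classes):

```python
import hashlib

class SeenFilter:
    """Toy version of Scrapy-style duplicate filtering:
    remember a fingerprint per URL and skip repeats."""

    def __init__(self):
        self.seen = set()

    def fingerprint(self, url):
        # Scrapy also hashes method/body; a URL hash is enough here.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def should_fetch(self, url):
        fp = self.fingerprint(url)
        if fp in self.seen:
            return False  # already requested; skip the download
        self.seen.add(fp)
        return True
```

In a real deployment the `seen` set must be persisted between runs (e.g. with Scrapy's `JOBDIR` setting or the scrapy-deltafetch middleware) so restarting a spider doesn't re-download everything.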
> What needs to be automated?
Right now, I'm considering automating the deployment. There are simple scripts that I've used for that during my tests. I don't consider them production-ready, but they save me a lot of time.
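Those scripts could be wrapped in something as small as this (a hypothetical sketch; the commands, image names, and hosts are placeholders, not the project's real scripts):

```python
import subprocess

# Placeholder deploy steps; every name and host here is an assumption.
DEPLOY_STEPS = [
    ["docker", "build", "-t", "gazettes-spiders", "."],
    ["docker", "push", "registry.example/gazettes-spiders"],
    ["ssh", "deploy@gazettes.example", "docker-compose", "up", "-d"],
]

def deploy(steps, dry_run=True):
    """Run each step in order; with dry_run, only report the commands,
    so the plan can be inspected without touching a server."""
    planned = []
    for cmd in steps:
        planned.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return planned
```

Even a wrapper this thin gives a single entry point to later move into CI.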
> What kind of UX wizard are you looking for? UX is not the same as UI :thinking: and I can do some tricks in both
Nothing too complicated. I'm not spending time on that; I'm focused on the API now. But I know that OKBR (@sergiomario) is thinking about it.
@jvanz do you think we could close this? Some cool stuff that is not implemented yet were being discussed here but I think the main idea is done :)
> @jvanz do you think we could close this? Some cool stuff that is not implemented yet were being discussed here but I think the main idea is done :)
Sure. Closing now...
Over the past few days I've been discussing with @sergiomario about running the spiders in production and making the scraped files available on a central web page. The first version does not need to be too fancy. The idea is to run the spiders on a server/cluster, store the files, and build a simple web page allowing the user to search and read the scraped files.
As Serenata de Amor already runs on Digital Ocean, I think we can continue with the same provider. All we need for the first version is a server/k8s cluster, PostgreSQL, and file storage. We can address all of these needs with the DO products available.
To achieve this goal, we see the following issues that need to be addressed:
@sergiomario, am I forgetting something?