skillachie / news-corpus-builder

Automatic News Corpus Builder
MIT License
40 stars, 20 forks

Export to csv files instead of database #2

Closed: ghost closed this issue 7 years ago

ghost commented 8 years ago

Is there a way to export the tables as CSV files instead of a database? Are you planning to add more news sites like Yahoo News, Bing News, Reuters, etc.?

skillachie commented 8 years ago

> Is there a way to export the tables as CSV files instead of database?

Right now the contents of each article can be saved as files in a local directory.

Instead of ex = NewsCorpusGenerator(corpus_dir,'sqlite'), you would specify ex = NewsCorpusGenerator(corpus_dir)

We can add a method that exports the results in the database to CSV, but you can already do that independently of the module, since you have access to the SQLite file. You can also use http://sqlitebrowser.org/
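A minimal sketch of that do-it-yourself export, using only Python's standard library. It is not part of the module: the function name export_tables_to_csv is hypothetical, and since the table names the module creates are not listed here, the sketch simply dumps every user table it finds in the SQLite file.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write each user table in the SQLite file to out_dir/<table>.csv.

    Hypothetical helper, not part of news-corpus-builder. Returns the
    list of CSV paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        # Skip SQLite's own bookkeeping tables (e.g. sqlite_sequence).
        tables = [name for name in names if not name.startswith("sqlite_")]
        for table in tables:
            # Quote the table name as an identifier before interpolating it.
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            target = out / f"{table}.csv"
            with target.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                # Header row taken from the column names of the query result.
                writer.writerow([col[0] for col in cursor.description])
                writer.writerows(cursor)
            written.append(target)
    finally:
        conn.close()
    return written
```

Calling export_tables_to_csv with the path to the SQLite file in your corpus directory and an output folder of your choice should leave one CSV file per table in that folder.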

> Are you planning to add more news sites like yahoo news, bing news, reuters etc

I started with that in mind. The plan was to add a variety of news sources, but at the time I had to focus on the main project, which used the generated corpus.

Bing News

https://datamarket.azure.com/dataset/5BA839F1-12CE-4CCE-BF57-A49D98D29A44

Yahoo News

Most of Yahoo's search APIs are now discontinued, and we would need to figure out the pagination used to retrieve additional results.

https://developer.yahoo.com/search/news/V1/newsSearch.html

https://developer.yahoo.com/boss/search/

Other Sources

We can add any other source, as long as it can be queried programmatically.

I did not push to integrate them in the current release for the above reasons. However, I would love to see additional sources added, and I am open to a pull request if you are interested.

umesharya1973 commented 7 years ago

Dear sir, I am not very familiar with Python, but I know how to open a Jupyter notebook. Can you please tell me how to download your code and run it in Jupyter or any text editor (or suggest a video)? I am a corpus researcher, totally new to programming, and finding myself helpless. Thanks in advance.

ghost commented 7 years ago

Hello umesh, I am not the programmer for this project, and this is not the right place to ask such a basic question. Without a basic understanding of programming, it would be very difficult for you to use this module. I would recommend that you learn the basics of programming first. https://www.learnpython.org/ is a good place to learn the basics of Python; there is also an Android app: https://play.google.com/store/apps/details?id=com.sololearn.python

Hope this helped