kotartemiy / newscatcher

Programmatically collect normalized news from (almost) any website.
https://newscatcherapi.com/
MIT License
2.94k stars, 284 forks

RSS feeds #3

Closed. vgoklani closed this issue 4 years ago

vgoklani commented 4 years ago

Would it be possible to store the URLs in a compressed flat file? These SQL things are just convoluted and hard to work with. I would like to just open the list of feeds and see what you have, but it takes like 20 steps to get inside...

aeros281 commented 4 years ago

At the very least, please provide the SQL data in a text format so it can be properly tracked in git.

kotartemiy commented 4 years ago

  1. Download the SQLite file.
  2. Load it as a pandas DataFrame:

        import sqlite3
        import pandas as pd

        # PATH_TO_DB_FILE is the path to the SQLite file downloaded in step 1
        db = sqlite3.connect(PATH_TO_DB_FILE, isolation_level=None)
        newscatcher_package_df_from_sql = pd.read_sql('select * from rss_table', db)

  3. Save it in whatever format you want (CSV, TXT, Excel, etc.), for example with DataFrame.to_csv.
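
The three steps above can be put together in one small function. This is only a sketch: the table name `rss_table` comes from the snippet above, while the function name `export_feeds_to_csv` and the example file paths are made up for illustration. Giving the output path a `.gz` suffix also covers the "compressed flat file" request, because pandas infers gzip compression from the extension.

```python
import sqlite3

import pandas as pd


def export_feeds_to_csv(db_path, csv_path, table="rss_table"):
    """Dump one SQLite table to a CSV file and return the number of rows written.

    The function name and arguments are illustrative; only the default table
    name is taken from the thread.
    """
    con = sqlite3.connect(db_path)
    try:
        # A table name cannot be passed as a bound parameter, so it is
        # interpolated here; only use trusted table names.
        df = pd.read_sql(f"select * from {table}", con)
    finally:
        con.close()
    # compression="infer" gzips the output when csv_path ends in ".gz"
    df.to_csv(csv_path, index=False, compression="infer")
    return len(df)
```

Usage would look like `export_feeds_to_csv("path/to/package.db", "feeds.csv.gz")`, where both paths are placeholders for wherever the downloaded file lives and wherever the export should go.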

phiresky commented 4 years ago

> These SQL things are just convoluted and hard to work with. I would like to just open the list of feeds and see what you have, but it takes like 20 steps to get inside...

Just use DB Browser for SQLite (sqlitebrowser); it has JSON / CSV export options.

Alternatively, here's a 15-line script that converts the DB to JSON in your browser:

https://gistcdn.githack.com/phiresky/a6a514b8392d500e46186f1bbe570e8d/raw/f330591705478e4d11c5674c5c92bb64cdf5a2a6/newscatcher-json.html
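
For anyone who would rather do the same conversion offline, a rough Python equivalent needs only the standard library. This is not the linked script, just a sketch of the same idea; the function name `table_to_json` is hypothetical, and `rss_table` is the table name used earlier in this thread.

```python
import json
import sqlite3


def table_to_json(db_path, table="rss_table"):
    """Return every row of one SQLite table as a JSON array of objects."""
    con = sqlite3.connect(db_path)
    # sqlite3.Row exposes column names, so each row can be turned into a dict
    con.row_factory = sqlite3.Row
    try:
        # The table name is interpolated, so only pass trusted names
        rows = [dict(row) for row in con.execute(f"select * from {table}")]
    finally:
        con.close()
    return json.dumps(rows, indent=2)
```

The returned string can be written to a plain `.json` file, which diffs cleanly in git.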
