medialab / hyphe

Websites crawler with built-in exploration and control web interface
http://hyphe.medialab.sciences-po.fr/demo/
GNU Affero General Public License v3.0

Allow Hyphe to crawl and extract links from Google Docs/Google Spreadsheets #389

Closed: ladelentes closed this issue 3 years ago

ladelentes commented 4 years ago

For example, from a spreadsheet such as this one: https://docs.google.com/spreadsheets/d/1_fdu4kO1axEOkqJ9nVtwTAiIn4fSdX-HBkrCBwjKfIQ/edit#gid=0

boogheta commented 4 years ago

The best way to do this is to export the spreadsheet as a CSV, then load it through Hyphe's IMPORT menu. If you'd like such a page to be crawled directly, that can only be addressed by the JavaScript crawler, which is in development (but on standby at the moment) and will be much slower, cf. https://github.com/medialab/hyphe/pull/288
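For anyone who wants to script the CSV route, here is a minimal sketch in Python. It is not part of Hyphe: the helper names `csv_export_url` and `extract_urls` are made up for illustration, and it assumes the sheet is publicly shared and that Google Sheets serves a CSV at the commonly used `/export?format=csv&gid=...` path, which may change.

```python
import csv
import io
import re
from urllib.parse import urlparse, parse_qs

# Assumption: the sheet id is the path segment right after /spreadsheets/d/
SHEET_ID_RE = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")
# Loose pattern for http(s) links inside a cell; not a full URL validator.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def csv_export_url(edit_url):
    """Turn a Google Sheets "edit" URL into a CSV export URL (hypothetical helper)."""
    match = SHEET_ID_RE.search(edit_url)
    if match is None:
        raise ValueError("not a Google Sheets URL: %r" % edit_url)
    # The tab id usually sits in the fragment (#gid=0); fall back to the first tab.
    gid = parse_qs(urlparse(edit_url).fragment).get("gid", ["0"])[0]
    # Assumption: this export path is the commonly used one, not a documented API.
    return "https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s" % (
        match.group(1),
        gid,
    )


def extract_urls(csv_text):
    """Return the distinct http(s) URLs found in any cell, in first-seen order."""
    seen = []
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            for url in URL_RE.findall(cell):
                url = url.rstrip(".,;)")  # drop trailing punctuation from prose cells
                if url not in seen:
                    seen.append(url)
    return seen
```

You would then download the text at `csv_export_url(...)` (for instance with `urllib.request.urlopen`), pass it to `extract_urls`, and paste the resulting list into Hyphe's IMPORT menu. Keeping the download separate from the parsing makes it easy to check the link extraction on a local CSV first.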

ladelentes commented 4 years ago

I was thinking of the case where a spreadsheet is embedded in a webpage that the crawler visits, such as http://feminicidiouruguay.net/otros-sitios. But I understand this is still in development, so I'll try a workaround. Thank you!