Status: Open. salus-sage opened this issue 7 years ago.
Structure added to the sheet.
These are sources with RSS; why should they go on the sheet?
@salus-sage True. The Economic Times provides ~156 links on this page: http://economictimes.indiatimes.com/rss.cms
Since we are dealing with a huge URL list, I would prefer a script over manual crawling. (For reference, doing it by hand took ~2.15 hr just to crawl, copy-paste, and format the links for ET's 156 URLs in sources_v3, which I feel should no longer be necessary!)
On top of that, if we already have a script doing all the crawling/generating, why not update the script instead of editing the sources directly? This is a more reliable method anyway: individually pasted URLs may change in the future, and the sources file may go stale.
What do you think?
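A minimal sketch of the script-based approach, assuming the ET index page (http://economictimes.indiatimes.com/rss.cms) lists its feeds as ordinary `<a href>` links that contain `/rssfeeds/` and end in `.cms`. The class and function names are hypothetical, and the real page markup may differ or change over time.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

INDEX_URL = "http://economictimes.indiatimes.com/rss.cms"


class FeedLinkCollector(HTMLParser):
    """Collects <a href> values that look like ET RSS feed URLs.

    Assumption (not confirmed by ET): feed links contain '/rssfeeds/'
    and end in '.cms'.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the index page URL.
        url = urljoin(self.base_url, href)
        # Keep only feed-like URLs, dropping duplicates but preserving page order.
        if "/rssfeeds/" in url and url.endswith(".cms") and url not in self.links:
            self.links.append(url)


def extract_feed_links(html, base_url=INDEX_URL):
    """Return unique feed URLs found in the given HTML, in page order."""
    collector = FeedLinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch_feed_links(index_url=INDEX_URL):
    """Download the index page and extract feed URLs (needs network access)."""
    with urlopen(index_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_feed_links(html, index_url)
```

Re-running `fetch_feed_links()` whenever the list is needed keeps it in step with the page, instead of maintaining a hand-pasted copy in the sources file.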
But the crawlers are meant to generate RSS, not just to scrape links. For that, remember that V used an online service? I don't want more links in the crawler list until we crack the regex.
(Sent by email in reply to Khushpreet's comment of 10 Feb 2017, 10:18 a.m., on janastu/IIHS-TNUSSP-feed#16.)
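To make the distinction concrete: a crawler that "generates RSS" turns scraped article links into a feed document, rather than only collecting link URLs. This is a minimal sketch, assuming the crawler has already produced (title, link) pairs for a site without its own feed; `build_rss` is a hypothetical name and this is not the project's actual crawler.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Build a minimal RSS 2.0 document from (title, link) pairs.

    'items' is assumed to come from a crawler that has already scraped
    article titles and URLs from a site that has no feed of its own.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Generated feed for {channel_title}"
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    # ET.tostring escapes special characters such as '&' in text nodes.
    return ET.tostring(rss, encoding="unicode")
```

A link scraper stops at the list of URLs; the RSS-generating step is what lets a feed reader (or newsrack) consume a site that does not publish feeds.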
In total, about 149 categories are available, but the configuration on newsrack is only partial. We can expect this source to show up in the missed-articles category during testing.
Valid RSS URLs can be found by viewing the page source of each category. Test case: I'm configuring politics-and-nation to test it out (http://economictimes.indiatimes.com/news/politics-and-nation/rssfeeds/1052732854.cms). The key pattern missing from the config is /category/sub-cat/rssfeeds/id.cms.
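A minimal sketch of matching the /category/sub-cat/rssfeeds/id.cms pattern with a regular expression, assuming category and sub-category are lowercase hyphenated slugs and the id is numeric, as in the politics-and-nation example. `FEED_URL_RE` and `parse_feed_url` are hypothetical names; this is plain Python `re`, not newsrack's configuration syntax.

```python
import re

# Assumed shape of an ET category feed URL:
#   http(s)://economictimes.indiatimes.com/<category>/<sub-cat>/rssfeeds/<id>.cms
FEED_URL_RE = re.compile(
    r"^https?://economictimes\.indiatimes\.com"
    r"/(?P<category>[a-z0-9-]+)"
    r"/(?P<subcat>[a-z0-9-]+)"
    r"/rssfeeds/(?P<id>\d+)\.cms$"
)


def parse_feed_url(url):
    """Return (category, subcat, id) for a matching feed URL, else None."""
    m = FEED_URL_RE.match(url)
    if m is None:
        return None
    return m.group("category"), m.group("subcat"), m.group("id")
```

For the test-case URL above, `parse_feed_url` returns `("news", "politics-and-nation", "1052732854")`, while the index page URL (rss.cms) and ordinary article URLs do not match. Feeds nested at a different depth would need a looser pattern.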