holwech / NewsScraper

Scraping several sites at the same time #1

Open racindustries opened 6 years ago

racindustries commented 6 years ago

When I run the code, only the first news website in the JSON list seems to be downloaded and parsed. Do you have any suggestions?

ghost commented 6 years ago

@racindustries Same here. I've looked around for a solution but haven't found one. Were you able to work around this issue?

Susmithap3 commented 6 years ago

Same here, I also need help.

ivanovishado commented 6 years ago

Can any of you please share the code that you're using? I used the code from this repo and it worked fine with the JSON list I provided.

ghost commented 6 years ago

It's been a while on my end, but I used the exact same code from holwech and only changed the news sites.

ivanovishado commented 6 years ago

@Civmwa can you please share the JSON list you used to see if I can reproduce the error?

ghost commented 6 years ago

@ivanovishado

{
  "The Standard": {
    "link": "https://www.standardmedia.co.ke/business"
  },
  "bbc": {
    "rss": "http://feeds.bbci.co.uk/news/rss.xml",
    "link": "http://www.bbc.com/"
  },
  "theguardian": {
    "rss": "https://www.theguardian.com/uk/rss",
    "link": "https://www.theguardian.com/international"
  },
  "breitbart": {
    "link": "http://www.breitbart.com/"
  },
  "infowars": {
    "link": "https://www.infowars.com/"
  },
  "foxnews": {
    "link": "http://www.foxnews.com/"
  },
  "nbcnews": {
    "link": "http://www.nbcnews.com/"
  },
  "washingtonpost": {
    "rss": "http://feeds.washingtonpost.com/rss/world",
    "link": "https://www.washingtonpost.com/"
  }
}
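One quick way to rule out a configuration problem is to load the list and confirm that every site in it is actually read. The sketch below is only an illustration: it assumes each entry is handled through its "rss" feed when one is present and through its "link" otherwise, and the function name `describe_sites` and the shortened `sample` are hypothetical, not taken from the repo.

```python
import json


def describe_sites(config_text):
    """Return (name, mode, url) for every site in a JSON site list.

    Assumption: an entry with an "rss" key is read through its feed,
    and any other entry is crawled starting from its "link".
    """
    sites = json.loads(config_text)
    rows = []
    for name, entry in sites.items():
        if "rss" in entry:
            rows.append((name, "rss", entry["rss"]))
        else:
            rows.append((name, "link", entry["link"]))
    return rows


# Shortened copy of the list above, just to exercise the helper.
sample = """
{
  "The Standard": {"link": "https://www.standardmedia.co.ke/business"},
  "bbc": {"rss": "http://feeds.bbci.co.uk/news/rss.xml", "link": "http://www.bbc.com/"}
}
"""

for name, mode, url in describe_sites(sample):
    print(f"{name}: {mode} -> {url}")
```

If the number of printed rows matches the number of entries in the file, the list itself is being parsed in full and the problem lies elsewhere.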

ivanovishado commented 6 years ago

@Civmwa NewsScraper.py worked fine for me; here's the output file (https://pastebin.com/ndLPb7QL) as proof.

Tested on Windows 10 with Python 3.6.2.

ghost commented 6 years ago

Hi Ivan, I'm not entirely sure what happened between when I sent it to you and now, but I ran it and it works. LOL. One small issue, though: how would I get it to print a summary of the article?

ivanovishado commented 6 years ago

> Not entirely sure what happened between when I sent it to you and now, but I ran it and it works. LOL.

@Civmwa lol

> how would I get to print a summary of the article?

Add content.nlp() right after content.parse(); the summary is then available as the attribute content.summary. Keep in mind that nlp() adds some processing time and the summary won't be perfect.
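As a rough sketch of that call order, assuming `content` is an `Article` from the newspaper library (as the `content.parse()` call suggests): the helper names `summarize` and `summarize_url` are hypothetical and not part of NewsScraper.py.

```python
def summarize(content):
    """Download, parse, and summarize an article-like object.

    `content` is expected to behave like newspaper's Article:
    download(), parse(), and nlp() methods plus a `summary` attribute.
    """
    content.download()
    content.parse()
    content.nlp()  # extra processing step; fills in content.summary
    return content.summary


def summarize_url(url):
    """Hypothetical helper: build an Article for `url` and summarize it."""
    # Third-party dependency, imported lazily: pip install newspaper3k
    from newspaper import Article

    return summarize(Article(url))
```

Calling `print(summarize_url(some_article_url))` would then print the summary. Note that nlp() depends on NLTK tokenizer data, so a one-time `nltk.download('punkt')` may be needed before it runs.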

ghost commented 6 years ago

Thanks, Ivan. Much appreciated.

ivanovishado commented 6 years ago

@Civmwa You're welcome. I believe this issue can be closed now, @racindustries.

ghost commented 6 years ago

Yes.
