stanford-oval / storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
http://storm.genie.stanford.edu
MIT License

bing search error. #133

Open MyraBaba opened 1 month ago

MyraBaba commented 1 month ago

Hi,

With the Bing retriever, the report can't be generated, and there are many 403 errors even though the sites are working.

python examples/run_storm_wiki_gpt.py \
    --output-dir ../storm/frontend/demo_light/DEMO_WORKING_DIR \
    --retriever bing \
    --do-research \
    --do-generate-outline \
    --do-generate-article \
    --do-polish-article --search-top-k 200 --retrieve-top-k 200
Topic: how is the war stress and political situation in Middle East now ?
root : ERROR : Error occurs when searching query current political situation in Middle East: 'webPages'
root : ERROR : Error occurs when searching query recent conflicts in Middle East: 'webPages'
Error while requesting URL('https://www.nytimes.com/live/2024/08/03/world/israel-hamas-iran-hezbollah-gaza') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/08/03/world/israel-hamas-iran-hezbollah-gaza'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
Error while requesting URL('https://www.nytimes.com/2024/08/05/world/middleeast/israel-hamas-iran-retaliation.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/05/world/middleeast/israel-hamas-iran-retaliation.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
Error while requesting URL('https://www.nytimes.com/live/2024/08/02/world/israel-hamas-iran-hezbollah-gaza') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/08/02/world/israel-hamas-iran-hezbollah-gaza'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
Error while requesting URL('https://www.nytimes.com/live/2024/05/23/world/israel-gaza-war-hamas') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/05/23/world/israel-gaza-war-hamas'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
Error while requesting 
URL('https://www.reuters.com/world/middle-east/us-personnel-hurt-attack-against-base-iraq-officials-say-2024-08-05/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/us-personnel-hurt-attack-against-base-iraq-officials-say-2024-08-05/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/live/2024/02/20/world/israel-hamas-war-gaza-news') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/02/20/world/israel-hamas-war-gaza-news'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/05/06/world/israel-gaza-war-hamas') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/05/06/world/israel-gaza-war-hamas'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/03/08/world/israel-hamas-war-gaza-news') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/03/08/world/israel-hamas-war-gaza-news'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/biden-voices-hope-iran-will-stand-down-is-uncertain-2024-08-03/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/biden-voices-hope-iran-will-stand-down-is-uncertain-2024-08-03/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/2024/08/05/world/middleeast/iraq-us-troops-iran-attack.html') - HTTPStatusError("Client error '403 Forbidden' for url 
'https://www.nytimes.com/2024/08/05/world/middleeast/iraq-us-troops-iran-attack.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/2024/08/06/world/middleeast/lebanon-hezbollah-israel.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/06/world/middleeast/lebanon-hezbollah-israel.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/2024/08/05/world/middleeast/iran-israel-attack-strikes-why.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/05/world/middleeast/iran-israel-attack-strikes-why.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/us-expresses-concern-over-escalating-middle-east-conflict-risk-2024-07-31/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/us-expresses-concern-over-escalating-middle-east-conflict-risk-2024-07-31/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/2024/08/01/world/middleeast/middle-east-israel-iran-hezbollah.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/01/world/middleeast/middle-east-israel-iran-hezbollah.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/08/02/world/israel-hamas-iran-hezbollah-gaza') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/08/02/world/israel-hamas-iran-hezbollah-gaza'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting 
URL('https://www.nytimes.com/live/2024/08/03/world/israel-hamas-iran-hezbollah-gaza') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/08/03/world/israel-hamas-iran-hezbollah-gaza'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/2024/08/05/world/middleeast/iran-israel-attack-strikes-why.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/05/world/middleeast/iran-israel-attack-strikes-why.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/us-personnel-hurt-attack-against-base-iraq-officials-say-2024-08-05/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/us-personnel-hurt-attack-against-base-iraq-officials-say-2024-08-05/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.politico.com/news/2024/07/29/us-war-worries-middle-east-00171680') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.politico.com/news/2024/07/29/us-war-worries-middle-east-00171680'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/2024/07/31/world/middleeast/iran-lebanon-israel-war-assassination.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/07/31/world/middleeast/iran-lebanon-israel-war-assassination.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.politico.com/newsletters/national-security-daily/2024/08/05/two-possible-scenarios-for-an-iran-attack-against-israel-00172660') - HTTPStatusError("Client error '403 Forbidden' 
for url 'https://www.politico.com/newsletters/national-security-daily/2024/08/05/two-possible-scenarios-for-an-iran-attack-against-israel-00172660'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/02/02/world/us-iran-strikes-middle-east-news') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/02/02/world/us-iran-strikes-middle-east-news'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://thehill.com/newsletters/defense-national-security/4812642-us-seeks-to-limit-chances-of-larger-middle-east-war/') - HTTPStatusError("Client error '403 Forbidden' for url 'https://thehill.com/newsletters/defense-national-security/4812642-us-seeks-to-limit-chances-of-larger-middle-east-war/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/05/06/world/israel-gaza-war-hamas') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/05/06/world/israel-gaza-war-hamas'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/04/29/world/israel-gaza-war-hamas') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/04/29/world/israel-gaza-war-hamas'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.tandfonline.com/doi/full/10.1080/19448953.2021.1888251') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.tandfonline.com/doi/full/10.1080/19448953.2021.1888251'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting 
URL('https://www.ohchr.org/en/statements/2024/08/un-human-rights-chief-risk-wider-conflict-middle-east') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.ohchr.org/en/statements/2024/08/un-human-rights-chief-risk-wider-conflict-middle-east'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/2024/08/06/world/middleeast/lebanon-hezbollah-israel.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/06/world/middleeast/lebanon-hezbollah-israel.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/08/03/world/israel-hamas-iran-hezbollah-gaza') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/08/03/world/israel-hamas-iran-hezbollah-gaza'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/is-hezbollah-israel-conflict-about-spiral-2024-07-28/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/is-hezbollah-israel-conflict-about-spiral-2024-07-28/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/live/2024/08/02/world/israel-hamas-iran-hezbollah-gaza') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/08/02/world/israel-hamas-iran-hezbollah-gaza'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/'\nFor more information check: 
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/2024/07/31/world/middleeast/iran-lebanon-israel-war-assassination.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/07/31/world/middleeast/iran-lebanon-israel-war-assassination.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.politico.com/news/middle-east') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.politico.com/news/middle-east'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/g7-nations-urge-de-escalation-middle-east-amid-threat-broader-conflict-2024-08-05/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/g7-nations-urge-de-escalation-middle-east-amid-threat-broader-conflict-2024-08-05/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/2023/06/13/world/middleeast/egypt-opposition-talks.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2023/06/13/world/middleeast/egypt-opposition-talks.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.washingtonpost.com/world/2024/04/12/israel-hamas-war-news-gaza-palestine/') - ReadTimeout('The read operation timed out') Error while requesting URL('https://www.reuters.com/world/middle-east/dont-bomb-beirut-us-leads-push-rein-israels-response-2024-07-29/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/dont-bomb-beirut-us-leads-push-rein-israels-response-2024-07-29/'\nFor more information check: 
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/live/2024/03/13/world/israel-hamas-war-gaza-news') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/03/13/world/israel-hamas-war-gaza-news'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/03/20/world/israel-hamas-war-gaza-news') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/03/20/world/israel-hamas-war-gaza-news'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.arabnews.com/middleeast') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.arabnews.com/middleeast'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://onlinelibrary.wiley.com/doi/full/10.1111/1758-5899.12829') - HTTPStatusError("Client error '403 Forbidden' for url 'https://onlinelibrary.wiley.com/doi/full/10.1111/1758-5899.12829'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.wsj.com/world/middle-east') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.wsj.com/world/middle-east'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://academic.oup.com/ia/article-abstract/98/2/689/6530475') - HTTPStatusError("Client error '403 Forbidden' for url 'https://academic.oup.com/ia/article-abstract/98/2/689/6530475'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.washingtonpost.com/world/2024/08/02/haniyeh-israel-ceasefire-middle-east/') - ReadTimeout('The read operation timed out') 
Error while requesting URL('https://www.reuters.com/world/middle-east/middle-eastern-stocks-slump-us-recession-fears-regional-tensions-2024-08-05/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/middle-eastern-stocks-slump-us-recession-fears-regional-tensions-2024-08-05/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/2024/08/01/world/middleeast/middle-east-israel-iran-hezbollah.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/01/world/middleeast/middle-east-israel-iran-hezbollah.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/israel-palestinian-dispute-hinges-statehood-land-jerusalem-refugees-2023-10-10/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/israel-palestinian-dispute-hinges-statehood-land-jerusalem-refugees-2023-10-10/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.reuters.com/world/middle-east/pentagon-tells-israel-it-will-adjust-us-troops-middle-east-2024-08-02/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/pentagon-tells-israel-it-will-adjust-us-troops-middle-east-2024-08-02/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.wsj.com/world/middle-east/a-guide-to-the-middle-easts-growing-conflicts-in-six-maps-2ea0c0da') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.wsj.com/world/middle-east/a-guide-to-the-middle-easts-growing-conflicts-in-six-maps-2ea0c0da'\nFor more information check: 
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://crsreports.congress.gov/product/pdf/IF/IF11726/1') - HTTPStatusError("Client error '403 Forbidden' for url 'https://crsreports.congress.gov/product/pdf/IF/IF11726/1'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/2021/05/12/world/middleeast/israeli-palestinian-conflict-gaza-hamas.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2021/05/12/world/middleeast/israeli-palestinian-conflict-gaza-hamas.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/hamas-chief-ismail-haniyeh-killed-iran-hamas-says-statement-2024-07-31/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/hamas-chief-ismail-haniyeh-killed-iran-hamas-says-statement-2024-07-31/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/2024/07/30/world/middleeast/us-iran-iraq.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/07/30/world/middleeast/us-iran-iraq.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://news.un.org/en/story/2023/12/1145182') - ReadTimeout('The read operation timed out') Error while requesting URL('https://www.washingtonpost.com/world/middle-east/') - ReadTimeout('The read operation timed out') Error while requesting URL('https://www.nytimes.com/live/2024/08/02/world/israel-hamas-iran-hezbollah-gaza') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/08/02/world/israel-hamas-iran-hezbollah-gaza'\nFor more information check: 
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/2024/07/30/world/middleeast/us-iran-iraq.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/07/30/world/middleeast/us-iran-iraq.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/live/2024/08/03/world/israel-hamas-iran-hezbollah-gaza') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/08/03/world/israel-hamas-iran-hezbollah-gaza'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.wsj.com/world/middle-east/iran-warns-pilots-to-avoid-airspace-as-middle-east-awaits-attack-0682f78e') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.wsj.com/world/middle-east/iran-warns-pilots-to-avoid-airspace-as-middle-east-awaits-attack-0682f78e'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.usnews.com/news/world/articles/2024-07-31/us-expresses-concern-over-escalating-middle-east-conflict-risk') - ReadTimeout('The read operation timed out') Error while requesting URL('https://www.nytimes.com/2024/08/05/world/middleeast/iran-israel-attack-strikes-why.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/05/world/middleeast/iran-israel-attack-strikes-why.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting 
URL('https://www.nytimes.com/2024/08/01/world/middleeast/middle-east-israel-iran-hezbollah.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/08/01/world/middleeast/middle-east-israel-iran-hezbollah.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.tandfonline.com/doi/full/10.1080/19448953.2021.1888251') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.tandfonline.com/doi/full/10.1080/19448953.2021.1888251'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.fpri.org/article/2024/03/the-realignment-of-the-middle-east/') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.fpri.org/article/2024/03/the-realignment-of-the-middle-east/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.nytimes.com/2024/07/31/world/middleeast/iran-lebanon-israel-war-assassination.html') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/2024/07/31/world/middleeast/iran-lebanon-israel-war-assassination.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/killing-hamas-leader-intended-prolong-gaza-conflict-abbas-tells-ria-news-agency-2024-08-05/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/killing-hamas-leader-intended-prolong-gaza-conflict-abbas-tells-ria-news-agency-2024-08-05/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.reuters.com/world/middle-east/hamas-chief-ismail-haniyeh-killed-iran-hamas-says-statement-2024-07-31/') - HTTPStatusError("Client error '401 HTTP Forbidden' 
for url 'https://www.reuters.com/world/middle-east/hamas-chief-ismail-haniyeh-killed-iran-hamas-says-statement-2024-07-31/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.nytimes.com/live/2024/02/02/world/us-iran-strikes-middle-east-news') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.nytimes.com/live/2024/02/02/world/us-iran-strikes-middle-east-news'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.reuters.com/world/middle-east/israel-palestinian-dispute-hinges-statehood-land-jerusalem-refugees-2023-10-10/') - HTTPStatusError("Client error '401 HTTP Forbidden' for url 'https://www.reuters.com/world/middle-east/israel-palestinian-dispute-hinges-statehood-land-jerusalem-refugees-2023-10-10/'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401") Error while requesting URL('https://www.chathamhouse.org/2024/05/beware-middle-easts-forgotten-wars') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.chathamhouse.org/2024/05/beware-middle-easts-forgotten-wars'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403") Error while requesting URL('https://www.wsj.com/world/middle-east/a-guide-to-the-middle-easts-growing-conflicts-in-six-maps-2ea0c0da') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.wsj.com/world/middle-east/a-guide-to-the-middle-easts-growing-conflicts-in-six-maps-2ea0c0da'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
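
A log this long is easier to read once it is grouped by site. The following is a minimal sketch, assuming the log has been pasted into a string; `summarize_fetch_errors` and the two regular expressions are hypothetical helpers written for this thread, not part of STORM.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches entries such as:
#   Error while requesting URL('<url>') - HTTPStatusError("Client error '403 ...
STATUS_RE = re.compile(
    r"""Error while requesting\s+URL\('([^']+)'\)\s+-\s+HTTPStatusError\("Client error '(\d{3})"""
)
# Matches entries such as:
#   Error while requesting URL('<url>') - ReadTimeout('The read operation timed out')
TIMEOUT_RE = re.compile(
    r"""Error while requesting\s+URL\('([^']+)'\)\s+-\s+ReadTimeout"""
)


def summarize_fetch_errors(log_text: str) -> Counter:
    """Count failed page requests per (host, reason) pair in a pasted log."""
    counts: Counter = Counter()
    for url, status in STATUS_RE.findall(log_text):
        counts[(urlparse(url).netloc, status)] += 1
    for url in TIMEOUT_RE.findall(log_text):
        counts[(urlparse(url).netloc, "timeout")] += 1
    return counts
```

Calling `summarize_fetch_errors(log_text).most_common()` lists the hosts that reject requests most often, which helps show whether the failures are tied to particular sites or spread evenly across all of them.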

_run_conversation
    conv = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py", line 259, in run_conv
    return conv_simulator(
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/dspy/primitives/program.py", line 26, in __call__
    return self.forward(*args, **kwargs)
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py", line 55, in forward
    expert_output = self.topic_expert(topic=topic, question=user_utterance, ground_truth_url=ground_truth_url)
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/dspy/primitives/program.py", line 26, in __call__
    return self.forward(*args, **kwargs)
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py", line 174, in forward
    searched_results: List[StormInformation] = self.retriever.retrieve(list(set(queries)),
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/retriever.py", line 244, in retrieve
    retrieved_data_list = self._rm(query_or_queries=query, exclude_urls=exclude_urls)
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/dspy/retrieve/retrieve.py", line 30, in __call__
    return self.forward(*args, **kwargs)
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/knowledge_storm/rm.py", line 158, in forward
    valid_url_to_snippets = self.webpage_helper.urls_to_snippets(list(url_to_results.keys()))
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/knowledge_storm/utils.py", line 405, in urls_to_snippets
    articles = self.urls_to_articles(urls)
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/knowledge_storm/utils.py", line 393, in urls_to_articles
    article_text = extract(
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/trafilatura/core.py", line 322, in extract
    options = Extractor(
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/trafilatura/settings.py", line 86, in __init__
    self._set_format(output_format)
  File "/home/bc/Projects/ODS/stormv2/venvStormv2/lib/python3.10/site-packages/trafilatura/settings.py", line 112, in _set_format
    raise AttributeError(f"Cannot set format, must be one of: {', '.join(sorted(SUPPORTED_FORMATS))}")
AttributeError: Cannot set format, must be one of: csv, html, json, markdown, python, txt, xml, xmltei

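The traceback above ends with trafilatura rejecting the requested output format, which is a separate failure from the 403 and 401 responses. As a rough sketch that does not call trafilatura at all, a format name can be checked against the list printed in that error message before it is passed on. The function name `check_output_format` and the `text` to `txt` alias are assumptions made for illustration; the log does not show which format name was actually requested.

```python
# Format names copied from the AttributeError message in the traceback.
SUPPORTED_FORMATS = {"csv", "html", "json", "markdown", "python", "txt", "xml", "xmltei"}

# Hypothetical alias table: the log does not show which name the caller passed.
ALIASES = {"text": "txt"}


def check_output_format(name: str) -> str:
    """Return a format name from SUPPORTED_FORMATS, or raise ValueError."""
    normalized = name.strip().lower()
    normalized = ALIASES.get(normalized, normalized)
    if normalized not in SUPPORTED_FORMATS:
        allowed = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(f"Unsupported output format {name!r}; expected one of: {allowed}")
    return normalized
```

Comparing the installed trafilatura version with the one the project was tested against is probably the more direct check; the sketch only makes the failing condition explicit.
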
Yucheng-Jiang commented 1 month ago

This doesn't seem to be an issue with Bing search itself, since Bing search only returns the top-ranked URLs. Content parsing is handled by the web page extraction step (the urls_to_articles function in knowledge_storm/utils.py). The errors are most likely due to network issues. Try another network or Wi-Fi, and configure a VPN if needed.

ColtonBehannon commented 1 month ago

I'm experiencing the same thing. Setting the retriever to You works, but Bing throws many 403s.

Yucheng-Jiang commented 2 weeks ago

The You.com API returns web page content directly, while Bing search returns only URLs, which are then fetched by web scraping.
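
To make the difference concrete, here is a minimal sketch of the URL-then-scrape path, where every search hit needs a second request that can fail on its own. `collect_pages` and the `fetch` callable are hypothetical names used only for illustration; this is not the code in knowledge_storm.

```python
from typing import Callable, Dict, List


def collect_pages(urls: List[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Fetch each URL; keep the pages that load and skip the ones that fail."""
    pages: Dict[str, str] = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # e.g. a 403 response or a read timeout
            print(f"Error while requesting {url}: {exc}")
    return pages
```

With a search API that returns page content directly, there is no per-URL `fetch` step, so there is nothing for the target sites to reject.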

I will close this issue if no further questions are posted by the end of this week.