Closed: FeLoe closed this issue 6 years ago.
We can't seem to reproduce the error; it appears to work fine. @FeLoe, is this still an issue on your system? If so, could you post more info on the bug? Otherwise, please close the issue. Thanks!
It also works on the server and in @mariekevh's VirtualBox VM.
Hm, whenever I run the scraper (myinca.rssscrapers.volkskrant(save = True)), I get this:
ValueError Traceback (most recent call last)
<ipython-input-5-08f10522bb1c> in <module>()
----> 1 myinca.rssscrapers.volkskrant(save = True)
~/inca_test/inca/inca/__main__.py in endpoint(*args, **kwargs)
255 else:
256 def endpoint(*args, **kwargs):
--> 257 return method(*args, **kwargs)
258 return endpoint
259
~/inca_test/inca/inca/core/document_class.py in runwrap(self, action, *args, **kwargs)
37 '''
38 if action == 'run':
---> 39 return self.run(*args, **kwargs)
40
41 if action == 'delay':
~/inca_test/inca/inca/core/scraper_class.py in run(self, save, *args, **kwargs)
74 logger.info("Started scraping")
75 if save == True:
---> 76 for doc in self.get(save, *args, **kwargs):
77 if type(doc)==dict:
78 doc = self._add_metadata(doc)
~/inca_test/inca/inca/scrapers/rss_scraper.py in get(self, save, **kwargs)
86 # do not want to look something up in the database. We therefore also retrieve it in
87 # that case.
---> 88 if save==False or check_exists(_id)[0]==False:
89 try:
90 req=urllib2.Request(link, headers={'User-Agent' : "Wget/1.9"})
~/inca_test/inca/inca/core/database.py in check_exists(document_id)
61 index = elastic_index
62 try:
---> 63 retrieved = client.get(elastic_index,doc_type='_all', id=document_id)
64 logger.debug('elastic_index {index} - document [{document_id}] found, return document'.format(**locals()))
65 return True, retrieved
/usr/local/lib/python3.5/dist-packages/elasticsearch/client/utils.py in _wrapped(*args, **kwargs)
74 if p in kwargs:
75 params[p] = kwargs.pop(p)
---> 76 return func(*args, params=params, **kwargs)
77 return _wrapped
78 return _wrapper
/usr/local/lib/python3.5/dist-packages/elasticsearch/client/__init__.py in get(self, index, doc_type, id, params)
407 for param in (index, doc_type, id):
408 if param in SKIP_IN_PATH:
--> 409 raise ValueError("Empty value passed for a required argument.")
410 return self.transport.perform_request('GET', _make_path(index,
411 doc_type, id), params=params)
ValueError: Empty value passed for a required argument.
This never happens with the other scrapers. Or am I doing something wrong?
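The error itself comes from elasticsearch-py's own argument check in the last frame of the traceback: `get()` refuses to build a request if `index`, `doc_type` or `id` is an "empty" value. The minimal sketch below mirrors that check locally, assuming the empty values are the ones in the library's `SKIP_IN_PATH` tuple (`None`, `''`, `b''`, `[]`, `()`); `check_required` and the index name `"inca"` are hypothetical. Since `check_exists` hard-codes `doc_type='_all'`, the empty value is most likely `document_id` (or possibly an unset `elastic_index`).

```python
# Local copy of the values elasticsearch-py treats as "missing"
# (assumed to match SKIP_IN_PATH in elasticsearch/client/utils.py).
SKIP_IN_PATH = (None, "", b"", [], ())


def check_required(index, doc_type, id):
    """Hypothetical stand-in for the check at the top of Elasticsearch.get().

    Returns a simplified request path when every argument is non-empty.
    """
    for param in (index, doc_type, id):
        if param in SKIP_IN_PATH:
            raise ValueError("Empty value passed for a required argument.")
    return "/".join((index, doc_type, id))


print(check_required("inca", "_all", "abc123"))  # inca/_all/abc123

try:
    check_required("inca", "_all", "")  # an empty id, as the scraper may be passing
except ValueError as err:
    print(err)  # Empty value passed for a required argument.
```

If this reading is right, the problem is not in Elasticsearch itself but in the `_id` the Volkskrant scraper hands to `check_exists`.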
We tried it with save=False; could that be why it works? (I don't have my laptop with me right now, so I can't test it.)
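That would explain it: line 88 of `rss_scraper.py` reads `if save==False or check_exists(_id)[0]==False:`, and Python's `or` stops evaluating as soon as its left operand is true. With `save=False`, `check_exists` is never called, so an empty `_id` never reaches Elasticsearch. The sketch below demonstrates this with a hypothetical stub `check_exists` that rejects empty ids the way the real lookup does; `should_fetch` and the stub are illustrative names, not INCA code.

```python
calls = []  # ids the stub lookup was asked about


def check_exists(document_id):
    """Hypothetical stub: rejects empty ids like the real Elasticsearch lookup."""
    calls.append(document_id)
    if document_id in (None, ""):
        raise ValueError("Empty value passed for a required argument.")
    return False, None  # pretend the document is not stored yet


def should_fetch(save, _id):
    # Same shape as line 88 of rss_scraper.py: `or` short-circuits,
    # so check_exists() is skipped whenever save == False.
    return save == False or check_exists(_id)[0] == False


print(should_fetch(False, ""))  # True: the empty id is never looked up
print(calls)                    # []

try:
    should_fetch(True, "")
except ValueError as err:
    print(err)                  # Empty value passed for a required argument.
```

So `save=False` hides the bug rather than avoiding it; only `save=True` exercises the database lookup.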
Well, with save = False it always worked for me ;) The traceback also points to an Elasticsearch problem, which I don't understand: shouldn't that be independent of the Volkskrant scraper?
Running the Volkskrant scraper throws an error; this needs to be fixed.
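A minimal sketch of one way to make this failure easier to diagnose, under stated assumptions: a hypothetical variant of `check_exists` that rejects empty ids up front with a message naming the actual culprit, instead of letting elasticsearch-py raise its generic error deep in the stack. It takes the client as a parameter so it runs without a cluster; `FakeClient` is a hypothetical dict-backed stand-in for the Elasticsearch client, and the `(found, document)` return shape follows the `return True, retrieved` visible in the traceback. The underlying fix would still be to find out why the Volkskrant feed yields an empty `_id`.

```python
class FakeClient:
    """Hypothetical stand-in for the Elasticsearch client, backed by a dict."""

    def __init__(self, docs):
        self.docs = docs

    def get(self, index, doc_type, id):
        if id not in self.docs:
            # The real client raises elasticsearch.exceptions.NotFoundError here.
            raise KeyError(id)
        return self.docs[id]


def check_exists(client, elastic_index, document_id):
    """Hypothetical variant of INCA's check_exists() with an explicit id check."""
    if not document_id:
        # Fail early with a message that names the real culprit, instead of
        # elasticsearch-py's generic "Empty value passed ..." error.
        # (str.format rather than an f-string, to stay Python 3.5 compatible.)
        raise ValueError(
            "check_exists() got an empty document_id ({!r}); the scraper "
            "produced no id for this item".format(document_id)
        )
    try:
        return True, client.get(elastic_index, doc_type="_all", id=document_id)
    except KeyError:
        return False, None


client = FakeClient({"abc123": {"title": "example"}})
print(check_exists(client, "inca", "abc123"))  # (True, {'title': 'example'})
print(check_exists(client, "inca", "zzz"))     # (False, None)
```

Raising instead of silently returning `(False, None)` is a deliberate choice here: returning "not found" would let the scraper go on to fetch and store an item without a usable id.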