vbalbp opened 5 years ago
The spider is working just fine, both the normal and the single spiders. The tests are failing, though, because the new adaptation completely breaks what was there. Apart from that, the functional cds and arxiv tests fail because of the removal of:

```python
# Allow duplicate requests
DUPEFILTER_CLASS = "scrapy.dupefilters.BaseDupeFilter"
```
However, since we harvest the proceedings page as well as the paper, the proceedings record was requested once per record in a single run, even though it is the same proceedings for every record (the usual case when harvesting by sets, since sets are conferences). With that line removed, Scrapy falls back to its default duplicate filter, so the proceedings record is harvested only once per run.
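To illustrate the behavior difference, here is a minimal, self-contained sketch (not hepcrawl's or Scrapy's actual code; the URLs and class bodies are invented for illustration). `BaseDupeFilter` mimics `scrapy.dupefilters.BaseDupeFilter`, which never marks a request as seen, while `RFPDupeFilter` mimics the default fingerprint-based filter that Scrapy uses when the setting is removed:

```python
import hashlib

class BaseDupeFilter:
    """Sketch of scrapy.dupefilters.BaseDupeFilter: filters nothing."""
    def request_seen(self, url: str) -> bool:
        return False

class RFPDupeFilter(BaseDupeFilter):
    """Sketch of Scrapy's default filter: remembers request fingerprints."""
    def __init__(self) -> None:
        self.seen: set[str] = set()

    def request_seen(self, url: str) -> bool:
        fp = hashlib.sha1(url.encode()).hexdigest()
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False

# Two records in the same set both link to the same proceedings page
# (hypothetical URLs):
urls = [
    "http://example.org/record/1",
    "http://example.org/proceedings/C16",
    "http://example.org/record/2",
    "http://example.org/proceedings/C16",
]

# With BaseDupeFilter, every request goes through, so the proceedings
# page is fetched once per record.
allowed = [u for u in urls if not BaseDupeFilter().request_seen(u)]

# With the default filter, the duplicate proceedings request is dropped.
default = RFPDupeFilter()
fetched = [u for u in urls if not default.request_seen(u)]
```

In this sketch `fetched` contains the proceedings URL only once, which matches the change described above: dropping the `DUPEFILTER_CLASS` override lets the duplicate proceedings requests be filtered out.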
Signed-off-by: Victor Balbuena vbalbp@gmail.com