rgarner / cma-tna-crawlers

Scraping old cases from TNA for CMA, no TLAs.

Mergers crawler: no case for ASSET #20

Closed: rgarner closed this 9 years ago

rgarner commented 9 years ago

See if we can work out why we're dropping 9 assets: how they're referred to, and why they're not being associated with cases:

WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/AI-accepting-undertakings.pdf
WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/Ambassador-Theatre.pdf
WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/Arriva-Northumbria.pdf
WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/Co-op-PSW.pdf
WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/CGL-GA-Taylor.pdf
WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/Go-North-East.pdf
WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/LSE-Turquoise.pdf
WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/Kopper-Cindus.pdf
WARNING: no case for ASSET http://www.oft.gov.uk/shared_oft/mergers_ea02/2010/Teacrate.pdf
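To confirm the count and pull the orphaned asset URLs out of a crawl log, here is a minimal Python sketch. It assumes the warnings are written one per line in exactly the format shown above. The function name `orphaned_assets` is hypothetical and not part of the crawler.

```python
import re
from typing import List

# Matches the warning format shown in the log above (assumed to be stable).
WARNING_RE = re.compile(r"^WARNING: no case for ASSET (\S+)$")


def orphaned_assets(log_lines: List[str]) -> List[str]:
    """Hypothetical helper: return the asset URLs from 'no case for ASSET' warnings."""
    urls = []
    for line in log_lines:
        match = WARNING_RE.match(line.strip())
        if match:
            urls.append(match.group(1))
    return urls
```

Running this over the crawl output and checking `len(...)` gives a quick way to see whether a fix actually brings the count of orphaned assets down from 9.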
rgarner commented 9 years ago

Dumping the referer chain for these assets reveals that they're not really cases; they're decisions for 2009 cases. That in itself is OK, but it does mean this bug is now about the 9 merger "cases" left in the case store that aren't cases at all, just decisions. We need to ignore these during the crawl.
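A minimal Python sketch of the two steps described here: walking an asset's referer chain back towards the crawl root, and dropping pages that turn out to be decisions rather than cases. It assumes the crawler records, for each URL, the page it was found on (the hypothetical `referers` dict). Because the rule for telling a decision apart from a case isn't stated in this thread, that test is left as a caller-supplied predicate. All names here (`referer_chain`, `cases_to_keep`, `is_decision`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def referer_chain(url: str, referers: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical helper: follow referer links from url back to the root.

    `referers` maps each crawled URL to the page it was found on (None for
    the crawl root). Stops at the root, at an unknown URL, or on a cycle.
    """
    chain = [url]
    seen = {url}
    current = referers.get(url)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = referers.get(current)
    return chain


def cases_to_keep(candidate_cases: List[str],
                  is_decision: Callable[[str], bool]) -> List[str]:
    """Hypothetical filter: drop pages that are decisions on earlier cases."""
    return [url for url in candidate_cases if not is_decision(url)]
```

Taking `is_decision` as a parameter keeps the detection rule, whatever it turns out to be, separate from the crawl logic, so it can be tightened later without touching the traversal.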