rgarner / cma-tna-crawlers

Scraping old cases from TNA for CMA, no TLAs.

Some newer merger cases have no body copy *or* PDF #29

Closed: rgarner closed this issue 9 years ago

rgarner commented 9 years ago

EDIT: I have found a section in an email trail that states:

> For 'found not to qualify' there is no PDF. So we will just need the web text on that 2nd page (and the summary title if possible - as above)

These cases run counter to the stated convention that newer cases (2010-2014) always have a decision PDF.

For example (not exhaustive):

http://webarchive.nationalarchives.gov.uk/20140402142426/http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/capita-northgatearinso
http://webarchive.nationalarchives.gov.uk/20140402142426/http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/carefusion

In these cases, the generated JSON will only contain a summary based on the title of the case, and the body copy detail will be lost. Is this acceptable?

Example JSON:

```json
{
  "title": "Carefusion /  Rowa Automatisierungssysteme GmbH",
  "original_url": "http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/carefusion",
  "case_type": "mergers",
  "original_urls": [
    "http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/carefusion"
  ],
  "summary": "OFT closed case: Completed acquisition by Carefusion Corporation of Rowa Automatisierungssysteme GmbH",
  "opened_date": null,
  "closed_date": "2011-12-10",
  "outcome_type": "mergers-phase-1-found-not-to-qualify",
  "market_sector": "healthcare-and-medical-equipment",
  "modified_by_sheet": true,
  "case_state": "closed"
}
```

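If the web text from the case page is wanted for these cases, one option is to fall back to it whenever no decision PDF is found. Below is a minimal Python sketch, assuming the case page HTML has already been fetched and its PDF links already collected. `add_body_fallback`, `html_to_text` and the `body` field are hypothetical names for illustration, not part of this crawler.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML fragment, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script>/<style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the fragment's visible text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def add_body_fallback(case, page_html, pdf_urls):
    """Hypothetical: attach the page text as 'body' when the case has no PDF.

    Per the email quoted above, 'found not to qualify' cases are the ones
    expected to take this path, since they have no decision PDF.
    """
    if pdf_urls:
        return case  # the decision PDF carries the detail
    with_body = dict(case)  # copy so the caller's dict is left untouched
    with_body["body"] = html_to_text(page_html)
    return with_body
```

On real archived OFT pages this would also pick up navigation and footer text; restricting extraction to the page's main content element is left out of the sketch.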
rgarner commented 9 years ago

There are 66 of these:

http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/asda-harwich
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/asda2
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/assa-abloy
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/edr
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/go-ahead-group
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/goahead-konectbus
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/Gurit
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/hotelplan
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/intercontinentalexchange
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/menswear
http://www.oft.gov.uk/OFTwork/mergers/decisions/2010/scottish-midland
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/Asda_KingsLynn
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/Autobar
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/bsb
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/capita-group
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/capita-northgatearinso
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/capita
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/carefusion
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/Cavendish2
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/Greencore
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/hypercom
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/INRIX
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/john-zink
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/lonza-group
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/sims-group
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/tesco-stores
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/vivendi
http://www.oft.gov.uk/OFTwork/mergers/decisions/2011/walstead
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/accraply
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/Burlington
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/Christie
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/civica
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/experian-garlik
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/gauselmann
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/gb-oils
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/glassolutions
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/moneysupermarket
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/PHS2
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/phs4
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/Ryder
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/velti
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/viridor
http://www.oft.gov.uk/OFTwork/mergers/decisions/2012/wartsila
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/airport-partners
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/arriva-midlands-north
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/avis
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/Capita
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/Costain
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/dfds
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/extra-msa
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/global-energy
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/IQE
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/mizkan
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/qep
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/sportech
http://www.oft.gov.uk/OFTwork/mergers/decisions/2013/Turbomeca
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/alliance-medical
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/iress
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/marlowe-holdings
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/omnicell
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/pathology
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/Rentokil_Initial
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/Ridgeway
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/shire
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/StagecoachLimited
http://www.oft.gov.uk/OFTwork/mergers/decisions/2014/Transitionsoptical
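To sanity-check the count, the list can be tallied by the year segment in each URL. This is a small Python sketch; `count_by_year` is a hypothetical helper, and it assumes every URL follows the `.../mergers/decisions/<year>/<slug>` pattern seen in the list.

```python
import re
from collections import Counter

# Captures the four-digit year in URLs like .../mergers/decisions/2011/carefusion
DECISION_YEAR = re.compile(r"/mergers/decisions/(\d{4})/")


def count_by_year(urls):
    """Count decision URLs per year; URLs without a year segment are ignored."""
    counts = Counter()
    for url in urls:
        match = DECISION_YEAR.search(url)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the list above split on whitespace should give 11, 17, 15, 13 and 10 cases for 2010 through 2014, which sums to the 66 stated.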

rgarner commented 9 years ago

Fixed in 77ca75d00a08c1bf317eb09c5ef7ee75fd33dbad