Closed by rgarner 9 years ago
You say you did this for CC previously. Assume you mean for the open CC cases brought across in March '14? Don't suppose it's easy to point to an example open CC case you scraped?
Sorry @adammaddison, @davidmann, overlooked your question. Yes, the March '14 CC cases.
Yes, here's a mergers case, Anglo-American Lafarge, that was consolidated from several URLs starting here. The left-hand menu was crawled, and the constituent parts were reassembled in an order requested by CMA.
Here, I am currently reassembling the links in the order they were encountered on the page. Most cases have only one section, so multi-section ordering is an edge case that affects 25 cases in mergers:
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2003/sibelco
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2003/stena
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2004/firstgroup2
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2005/britannic
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2005/national
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2005/william-hill
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2006/aggregate
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2006/boots
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2006/coop2
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2007/Co-op
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2007/Flybe
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2007/Inchcape
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2007/lloyds
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2007/Tesco
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2008/AirFrance
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2008/coop-somerfield
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2008/Diageo
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2008/Dunfermline
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2008/globalradio
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2008/Home
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2008/SRCL-Cliniserve
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2009/Aggregate
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2009/Co-operative2
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2009/cooperative2
http://www.oft.gov.uk/OFTwork/mergers/Mergers_Cases/2009/sports-direct-inter
and a similarly low number for markets:
http://www.oft.gov.uk/OFTwork/markets-work/references/aggregates-MIR
http://www.oft.gov.uk/OFTwork/markets-work/references/airports
http://www.oft.gov.uk/OFTwork/markets-work/references/bus-services
http://www.oft.gov.uk/OFTwork/markets-work/references/classified-directory
http://www.oft.gov.uk/OFTwork/markets-work/references/extended-warranties
http://www.oft.gov.uk/OFTwork/markets-work/references/payment
http://www.oft.gov.uk/OFTwork/markets-work/references/store-cards1
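For anyone following along, recording links in the order they are encountered on a page could look roughly like the sketch below. It is not the scraper's actual code: the `LinkCollector` class, the `links_in_page_order` helper and the sample markup are all hypothetical, and it uses only Python's standard-library HTML parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only the first occurrence so page order is preserved.
        if href and href not in self.links:
            self.links.append(href)


def links_in_page_order(html):
    """Return the unique links of a page in the order they appear."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Made-up markup standing in for a case page.
sample = (
    '<ul><li><a href="/case/a">A</a></li>'
    '<li><a href="/case/b">B</a></li>'
    '<li><a href="/case/a">A again</a></li>'
    '<li><a href="/case/c">C</a></li></ul>'
)
page_links = links_in_page_order(sample)
```

Duplicates are dropped on first sight, so a link repeated further down the page does not change the recorded order.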
All markets and many old mergers cases need a body generator.
2002-2009 mergers cases were HTML- (not PDF-) based. While we collect that HTML, we don't do anything with it yet. We need to come up with a strategy to populate the body of the single document that each case will become, similar to what we did for CC previously.
Is it OK to note the order of the links on the case page, then generate a single body with appropriate headers for them later (example)? Should that order be most recent first, or chronological, as it was on the old site?
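To make the question concrete, here is a minimal sketch of what such a body generator might do, assuming each crawled sub-page is stored with its link text, its position on the case page, and its converted content. The `Section` structure, the `build_body` function and the Markdown-style `##` headers are illustrative assumptions, not existing code. The `newest_first` flag simply reverses page order, which only gives "most recent first" if the old site listed links chronologically.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One crawled sub-page of a case (hypothetical structure)."""
    title: str     # link text from the case page
    position: int  # order in which the link appeared on the page
    body: str      # converted content of the linked page


def build_body(sections, newest_first=False):
    """Join sections into one body, each under its own header."""
    ordered = sorted(sections, key=lambda s: s.position, reverse=newest_first)
    return "\n\n".join(f"## {s.title}\n\n{s.body.strip()}" for s in ordered)


# Made-up sections for illustration only.
sections = [
    Section("Reference", 0, "Text of the reference page."),
    Section("Decision", 1, "Text of the decision page."),
]
print(build_body(sections))
```

Storing the position at crawl time keeps the ordering decision open: the same data can be rendered in either direction once the preferred order is agreed.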