opensanctions / crawler-planning

Task tracking for the crawlers we're working on
https://github.com/orgs/opensanctions/projects/2

FCC Covered List #163

pudo closed this issue 2 months ago

pudo commented 2 months ago

Data URL

https://www.fcc.gov/supplychain/coveredlist

Publisher

FCC

Publisher country/territory code

US

Type of data

Debarment (Entities banned from participating in public contracting)

Coverage region

region:Global

Can you tell us more?

We'll need to do a bit of a manual parse on this, I reckon? Or just pick out the bold text? Should be a quick hit.

This is a suggestion or request
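A minimal sketch of the "just pick out the bold text" idea, using only the Python standard library. It assumes the entity names are wrapped in `<strong>` or `<b>` tags on the page. The sample markup is made up for illustration and is not copied from the FCC page.

```python
from html.parser import HTMLParser


class BoldTextParser(HTMLParser):
    """Collect the text content of every <strong>/<b> element."""

    BOLD_TAGS = {"strong", "b"}

    def __init__(self):
        super().__init__()
        self._depth = 0  # > 0 while inside a bold element (handles nesting)
        self._buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so no case folding is needed.
        if tag in self.BOLD_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse internal whitespace and keep non-empty names.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.results.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_bold_text(html: str) -> list[str]:
    parser = BoldTextParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup, for illustration only.
sample = "<p><strong>Example Corp</strong>, including its subsidiaries</p><p><b>Other Ltd</b></p>"
print(extract_bold_text(sample))  # ['Example Corp', 'Other Ltd']
```

This yields names only: any relationship described in the surrounding free text is lost.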

jbothma commented 2 months ago

Seeing that there's a subsidiary relationship in there as free text, I think we should just do a manual extraction: https://docs.google.com/spreadsheets/d/1ElYqFpWcG_OclTkyR28mpaRN_Z05TMM6M5-Tn0u3-U0/edit#gid=0

I'd suggest making the subsidiary names semicolon-delimited.

Also, use https://zavod.opensanctions.org/helpers/#zavod.helpers.assert_dom_hash to alert us automatically when the page body changes.
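The idea behind `assert_dom_hash` is to fingerprint the relevant part of the page and raise an alert when that fingerprint changes. The stdlib sketch below illustrates the idea only; it is not zavod's implementation, and the text-only hashing, whitespace normalisation and SHA-1 choice are assumptions. A real crawler should use the zavod helper linked above.

```python
import hashlib
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Gather all text nodes from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_fingerprint(html_fragment: str) -> str:
    """SHA-1 of the fragment's text, with whitespace collapsed (assumed scheme)."""
    collector = _TextCollector()
    collector.feed(html_fragment)
    collector.close()
    normalised = " ".join(" ".join(collector.parts).split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def content_changed(html_fragment: str, expected_hash: str) -> bool:
    """True if the fragment no longer matches the stored fingerprint."""
    return text_fingerprint(html_fragment) != expected_hash


stored = text_fingerprint("<div><p>Covered list</p></div>")
print(content_changed("<div><p>Covered   list</p></div>", stored))  # False: whitespace only
print(content_changed("<div><p>Covered list, updated</p></div>", stored))  # True
```

Hashing text only means markup-only edits won't trigger an alert, while any wording change, including in the footnotes, will.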

I think it'd be great to quote those two footnotes in the description.

jbothma commented 2 months ago

CSV link https://docs.google.com/spreadsheets/d/e/2PACX-1vQCtBj1fAWXKlV5yhN38V66umTej12IlkQGzXGWC5LR7RCEPHlLMaBwqWpey6oHkShbyYYgRGm_0AbO/pub?output=csv
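A sketch of how the published CSV might be consumed, with the semicolon-delimited subsidiaries split into a list. The column names `name` and `subsidiaries` are hypothetical, and the real sheet's headers may differ. `parse_rows` takes CSV text so it can be tested offline, and `fetch_csv` downloads the URL above with `urllib.request`.

```python
import csv
import io
import urllib.request

CSV_URL = "https://docs.google.com/spreadsheets/d/e/2PACX-1vQCtBj1fAWXKlV5yhN38V66umTej12IlkQGzXGWC5LR7RCEPHlLMaBwqWpey6oHkShbyYYgRGm_0AbO/pub?output=csv"


def fetch_csv(url: str = CSV_URL) -> str:
    with urllib.request.urlopen(url) as response:
        # "utf-8-sig" also strips a leading byte-order mark if one is present.
        return response.read().decode("utf-8-sig")


def split_names(value: str) -> list[str]:
    """Split a semicolon-delimited cell, dropping blanks and stray spaces."""
    return [part.strip() for part in value.split(";") if part.strip()]


def parse_rows(csv_text: str) -> list[dict]:
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "name": row["name"].strip(),  # hypothetical column name
            "subsidiaries": split_names(row.get("subsidiaries") or ""),
        }
        for row in reader
    ]


# Offline sample; "Example Holdings" and its subsidiaries are made up.
sample = (
    "name,subsidiaries\n"
    "Pacific Networks Corp,ComNet (USA) LLC\n"
    "Example Holdings,Alpha Ltd; Beta Ltd\n"
)
for row in parse_rows(sample):
    print(row)
```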

bgmello commented 2 months ago

Do we have to do it manually? I think we could just filter on the tags.

jbothma commented 2 months ago

It'd be really nice to have the subsidiary relationship in there. By the time we've written logic robust enough to say programmatically that Pacific Networks Corp owns ComNet (USA) LLC, in a way that might also apply to the next one or two instances, we could already have had it in production.
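Once the relationship is captured by hand, emitting it is straightforward. The sketch below turns a parent name plus its subsidiaries into owner/asset pairs with deterministic IDs. The `make_id` helper and its `fcc-covered-` prefix are hypothetical, not zavod's own ID helper; in the actual crawler these pairs would presumably become FollowTheMoney `Ownership` entities.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnershipLink:
    owner_id: str
    asset_id: str
    owner_name: str
    asset_name: str


def make_id(*parts: str) -> str:
    """Hypothetical stable ID: a short SHA-1 of the joined parts."""
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
    return f"fcc-covered-{digest[:12]}"


def ownership_links(parent: str, subsidiaries: list[str]) -> list[OwnershipLink]:
    """One link per subsidiary, all pointing back to the same parent ID."""
    parent_id = make_id(parent)
    return [
        OwnershipLink(parent_id, make_id(sub), parent, sub)
        for sub in subsidiaries
    ]


links = ownership_links("Pacific Networks Corp", ["ComNet (USA) LLC"])
for link in links:
    print(f"{link.owner_name} -> {link.asset_name}")
# Pacific Networks Corp -> ComNet (USA) LLC
```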

bgmello commented 2 months ago