openva / crump

A parser for the Virginia State Corporation Commission's business registration records.
https://vabusinesses.org/
MIT License

Optionally convert industry codes to SIC & NAICS codes #103

Open waldoj opened 9 years ago

waldoj commented 9 years ago

It's nice that Virginia's doing their own thing, but some people might want standard codes. Map SCC's codes to SIC codes.

slott56 commented 9 years ago

http://www.naics.com/

You probably want both SIC and NAICS as more widely used alternatives to the SCC Industry Codes.

Having inspected the data available from SCC, NAICS and SIC, I can say it's an ugly problem. The NAICS-SIC bridge is documented at the most detailed level of each scheme. See http://www.naics.com/naicswp2014/wp-content/uploads/2014/10/NAICS-to-SIC-Crosswalk.pdf for details on the mapping.

The SCC IC codes are not detailed enough; they only match the higher levels of the NAICS and SIC structures. The bad news is that the mapping must be done manually and I am not an economist; the good news is that there are only 44 codes to be mapped.
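To illustrate the granularity gap: SIC codes are four digits long, and the first two digits name the major group, which is the level the SCC codes can realistically be matched against. A minimal sketch, assuming a hypothetical helper `sic_major_group` (not part of crump):

```python
def sic_major_group(sic_code):
    """Return the two-digit SIC major group for a four-digit SIC code."""
    # Codes are handled as strings so that leading zeros survive (e.g. '0711').
    code = str(sic_code).zfill(4)
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a four-digit SIC code, got {sic_code!r}")
    return code[:2]


# SIC 4911 is "Electric Services"; its major group, 49, is the level at
# which an SCC code such as 11 (ELECTRIC) can be matched.
print(sic_major_group("4911"))  # 49
print(sic_major_group(711))     # 07
```

The detailed crosswalk works on the full four-digit codes, so it cannot be applied to the SCC codes directly; only the major groups are usable here.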

waldoj commented 9 years ago

Great! Thanks for that, @slott56. :)

slott56 commented 9 years ago

Something like this?

| SCC Code | SCC Description | SIC Code | SIC Description |
|---|---|---|---|
| 0 | GENERAL | 99 | Non-Classifiable Establishments |
| 11 | ELECTRIC | 49 | Electric, Gas, & Sanitary Services |
| 12 | TELEPHONE | 48 | Communications |
| 13 | GAS | 29 | Petroleum & Coal Products |
| 14 | WATER | 49 | Electric, Gas, & Sanitary Services |
| 15 | WATER-SEWER | 49 | Electric, Gas, & Sanitary Services |
| 16 | SEWER | 49 | Electric, Gas, & Sanitary Services |
| 18 | RADIO COMMON CARRIER | 48 | Communications |
| 20 | BANKS AND CREDIT UNIONS | 60 | Depository Institutions |
| 22 | FEDERAL BANKS | 60 | Depository Institutions |
| 23 | BANK W/LIMITED CERTIFICATE | 60 | Depository Institutions |
| 25 | BANK & INSURANCE AGENT | | |
| 26 | MRTG/INSURANCE | 63 | Insurance Carriers |
| 30 | INSURANCE COMPANIES | 63 | Insurance Carriers |
| 31 | INSURANCE CONSULTANT | 64 | Insurance Agents, Brokers, & Service |
| 35 | INSURANCE AGENCIES | 64 | Insurance Agents, Brokers, & Service |
| 36 | MORTGAGE CO | 61 | Nondepository Institutions |
| 40 | OTHER FINANCIAL INSTITUTIONS | 62 | Security & Commodity Brokers |
| 41 | INSURANCE REGULATED ENTITIES | 63 | Insurance Carriers |
| 42 | INS COMPANY & INS AGENCY | 64 | Insurance Agents, Brokers, & Service |
| 43 | INS COMPANY & INS REG ENTITY | 64 | Insurance Agents, Brokers, & Service |
| 50 | OTHER PUBLIC SERVICE | | |
| 51 | RAILWAYS | 40 | Railroad Transportation |
| 60 | OTHER CHARITABLE INSTITUTIONS | 83 | Social Services |
| 61 | FOUNDATIONS | 67 | Holding & Other Investment Offices |
| 62 | VOLUNTEER RESCUE SQUADS/FIRE DEPTS | 92 | Justice, Public Order, & Safety |
| 63 | RELIGIOUS CEMETERIES | 65 | Real Estate |
| 64 | ORPHANAGES | 83 | Social Services |
| 65 | CHURCHES AND RELIGIOUS DENOMINATIONS | 86 | Membership Organizations |
| 66 | ANIMAL AND CHILDREN'S WELFARE | | |
| 67 | SPECIAL HOSPITALS AND COLLEGES | 80 | Health Services |
| 68 | HISTORICAL/LITERARY SOCIETIES | 84 | Museums, Botanical, Zoological Gardens |
| 69 | ARTS AND SCIENCES | 82 | Educational Services |
| 70 | OTHER PROFESSIONAL COMPANIES | 87 | Engineering & Management Services |
| 71 | DOCTORS | 80 | Health Services |
| 72 | LAWYERS/ATTORNEYS | 81 | Legal Services |
| 73 | ARCHITECTS | 87 | Engineering & Management Services |
| 74 | AUDIOLOGIST | 80 | Health Services |
| 75 | SPEECH PATHOLOGIST | 80 | Health Services |
| 76 | CLINICAL NURSE SPECIALIST | 80 | Health Services |
| 77 | HOUSING COOPERATIVES | 87 | Engineering & Management Services |
| 80 | AGRICULTURAL COOPERATIVES | 7 | Agricultural Services |
| 81 | OTHER COOPERATIVES | 89 | Services, Not Elsewhere Classified |
| 95 | BENEFIT CORPORATIONS | 83 | Social Services |

waldoj commented 9 years ago

Ha! Yeah, maybe something a little like that. ;) Thank you, @slott56, it's really kind of you to take the time to put this together. You've done the hard work—now I have the easy task of integrating this list. I appreciate it!

slott56 commented 9 years ago

This may be more helpful.

- {scc_code: '00', scc_description: GENERAL, sic_code: '99', sic_description: Non-Classifiable
    Establishments}
- {scc_code: '11', scc_description: ELECTRIC, sic_code: '49', sic_description: 'Electric,
    Gas, & Sanitary Services'}
- {scc_code: '12', scc_description: TELEPHONE, sic_code: '48', sic_description: Communications}
- {scc_code: '13', scc_description: GAS, sic_code: '29', sic_description: Petroleum
    & Coal Products}
- {scc_code: '14', scc_description: WATER, sic_code: '49', sic_description: 'Electric,
    Gas, & Sanitary Services'}
- {scc_code: '15', scc_description: WATER-SEWER, sic_code: '49', sic_description: 'Electric,
    Gas, & Sanitary Services'}
- {scc_code: '16', scc_description: SEWER, sic_code: '49', sic_description: 'Electric,
    Gas, & Sanitary Services'}
- {scc_code: '18', scc_description: RADIO COMMON CARRIER, sic_code: '48', sic_description: Communications}
- {scc_code: '20', scc_description: BANKS AND CREDIT UNIONS, sic_code: '60', sic_description: Depository
    Institutions}
- {scc_code: '22', scc_description: FEDERAL BANKS, sic_code: '60', sic_description: Depository
    Institutions}
- {scc_code: '23', scc_description: BANK W/LIMITED CERTIFICATE, sic_code: '60', sic_description: Depository
    Institutions}
- {scc_code: '25', scc_description: BANK & INSURANCE AGENT}
- {scc_code: '26', scc_description: MRTG/INSURANCE, sic_code: '63', sic_description: Insurance
    Carriers}
- {scc_code: '30', scc_description: INSURANCE COMPANIES, sic_code: '63', sic_description: Insurance
    Carriers}
- {scc_code: '31', scc_description: INSURANCE CONSULTANT, sic_code: '64', sic_description: 'Insurance
    Agents, Brokers, & Service'}
- {scc_code: '35', scc_description: INSURANCE AGENCIES, sic_code: '64', sic_description: 'Insurance
    Agents, Brokers, & Service'}
- {scc_code: '36', scc_description: MORTGAGE CO, sic_code: '61', sic_description: Nondepository
    Institutions}
- {scc_code: '40', scc_description: OTHER FINANCIAL INSTITUTIONS, sic_code: '62',
  sic_description: Security & Commodity Brokers}
- {scc_code: '41', scc_description: INSURANCE REGULATED ENTITIES, sic_code: '63',
  sic_description: Insurance Carriers}
- {scc_code: '42', scc_description: INS COMPANY & INS AGENCY, sic_code: '64', sic_description: 'Insurance
    Agents, Brokers, & Service'}
- {scc_code: '43', scc_description: INS COMPANY & INS REG ENTITY, sic_code: '64',
  sic_description: 'Insurance Agents, Brokers, & Service'}
- {scc_code: '50', scc_description: OTHER PUBLIC SERVICE}
- {scc_code: '51', scc_description: RAILWAYS, sic_code: '40', sic_description: Railroad
    Transportation}
- {scc_code: '60', scc_description: OTHER CHARITABLE INSTITUTIONS, sic_code: '83',
  sic_description: Social Services}
- {scc_code: '61', scc_description: FOUNDATIONS, sic_code: '67', sic_description: Holding
    & Other Investment Offices}
- {scc_code: '62', scc_description: VOLUNTEER RESCUE SQUADS/FIRE DEPTS, sic_code: '92',
  sic_description: 'Justice, Public Order, & Safety'}
- {scc_code: '63', scc_description: RELIGIOUS CEMETERIES, sic_code: '65', sic_description: Real
    Estate}
- {scc_code: '64', scc_description: ORPHANAGES, sic_code: '83', sic_description: Social
    Services}
- {scc_code: '65', scc_description: CHURCHES AND RELIGIOUS DENOMINATIONS, sic_code: '86',
  sic_description: Membership Organizations}
- {scc_code: '66', scc_description: ANIMAL AND CHILDREN'S WELFARE}
- {scc_code: '67', scc_description: SPECIAL HOSPITALS AND COLLEGES, sic_code: '80',
  sic_description: Health Services}
- {scc_code: '68', scc_description: HISTORICAL/LITERARY SOCIETIES, sic_code: '84',
  sic_description: 'Museums, Botanical, Zoological Gardens'}
- {scc_code: '69', scc_description: ARTS AND SCIENCES, sic_code: '82', sic_description: Educational
    Services}
- {scc_code: '70', scc_description: OTHER PROFESSIONAL COMPANIES, sic_code: '87',
  sic_description: Engineering & Management Services}
- {scc_code: '71', scc_description: DOCTORS, sic_code: '80', sic_description: Health
    Services}
- {scc_code: '72', scc_description: LAWYERS/ATTORNEYS, sic_code: '81', sic_description: Legal
    Services}
- {scc_code: '73', scc_description: ARCHITECTS, sic_code: '87', sic_description: Engineering
    & Management Services}
- {scc_code: '74', scc_description: AUDIOLOGIST, sic_code: '80', sic_description: Health
    Services}
- {scc_code: '75', scc_description: SPEECH PATHOLOGIST, sic_code: '80', sic_description: Health
    Services}
- {scc_code: '76', scc_description: CLINICAL NURSE SPECIALIST, sic_code: '80', sic_description: Health
    Services}
- {scc_code: '77', scc_description: HOUSING COOPERATIVES, sic_code: '87', sic_description: Engineering
    & Management Services}
- {scc_code: '80', scc_description: AGRICULTURAL COOPERATIVES, sic_code: '7', sic_description: Agricultural
    Services}
- {scc_code: '81', scc_description: OTHER COOPERATIVES, sic_code: '89', sic_description: 'Services,
    Not Elsewhere Classified'}
- {scc_code: '95', scc_description: BENEFIT CORPORATIONS, sic_code: '83', sic_description: Social
    Services}

This might be "scc_to_sic.yaml". Not sure how this is best represented in the existing code base.
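Three of the SCC codes in the list (25, 50 and 66) have no SIC match yet, so whatever consumes the file has to tolerate records without a `sic_code`. A quick way to surface those gaps, sketched on a three-record excerpt written as plain dicts (the shape a YAML loader would produce); `unmapped_codes` is a hypothetical helper name:

```python
# A three-record excerpt of the list above, in the shape a YAML loader
# would return it: one dict per SCC code.
records = [
    {"scc_code": "23", "scc_description": "BANK W/LIMITED CERTIFICATE",
     "sic_code": "60", "sic_description": "Depository Institutions"},
    {"scc_code": "25", "scc_description": "BANK & INSURANCE AGENT"},
    {"scc_code": "26", "scc_description": "MRTG/INSURANCE",
     "sic_code": "63", "sic_description": "Insurance Carriers"},
]


def unmapped_codes(records):
    """Return the SCC codes that have no sic_code assigned."""
    return [r["scc_code"] for r in records if "sic_code" not in r]


print(unmapped_codes(records))  # ['25']
```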

slott56 commented 9 years ago

And this. Not sure where you want to offer this or how the results should be packaged.

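import yaml


class File_Map:
    """
    Look up records from a YAML list of mappings by one key column.

    >>> with open("scc_to_sic.yaml") as source:
    ...     scc_to_sic = File_Map(source, 'scc_code')
    >>> scc_to_sic('10')
    {'scc_code': '10'}
    >>> sorted(scc_to_sic('11').items())
    [('scc_code', '11'), ('scc_description', 'ELECTRIC'), ('sic_code', '49'), ('sic_description', 'Electric, Gas, & Sanitary Services')]
    """

    def __init__(self, yaml_file, key_column):
        # safe_load needs no explicit Loader argument and does not
        # construct arbitrary Python objects from the file.
        self.raw = yaml.safe_load(yaml_file)
        # Index the records by the chosen key column for direct lookup.
        self.mapping = {r[key_column]: r for r in self.raw}
        self.key_column = key_column

    def __call__(self, key, default=None):
        # Unknown keys return the caller's default, or a stub record
        # that holds only the key itself.
        if default is None:
            default = {self.key_column: key}
        return self.mapping.get(key, default)
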
waldoj commented 9 years ago

Good Lord, man—there will be nothing left for me to do! ;) I think I'm just going to transform this data if the --transform option is invoked. I'd considered making it optional, within --transform, but I just can't see any reason why it shouldn't be transformed along with everything else. Thanks so much for this!
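One way the `--transform` step might apply the mapping is sketched below. Everything named here is an assumption for illustration: the record field `industry_code`, the function `add_sic_fields`, and the two-entry `SCC_TO_SIC` excerpt are hypothetical and do not reflect crump's actual schema or code.

```python
# Hypothetical two-entry excerpt of the SCC-to-SIC mapping, keyed by SCC code.
SCC_TO_SIC = {
    "11": {"sic_code": "49",
           "sic_description": "Electric, Gas, & Sanitary Services"},
    "72": {"sic_code": "81", "sic_description": "Legal Services"},
}


def add_sic_fields(record, mapping=SCC_TO_SIC, field="industry_code"):
    """Return a copy of record with SIC fields added when a mapping exists."""
    enriched = dict(record)
    code = record.get(field)
    if code is None:
        return enriched  # no industry code on this record; leave it as-is
    # Pad to two digits so that '0' and '00' refer to the same SCC code.
    match = mapping.get(str(code).zfill(2))
    if match:
        enriched.update(match)
    return enriched


# A made-up record, for illustration only.
record = {"name": "EXAMPLE LAW OFFICE PLC", "industry_code": "72"}
print(add_sic_fields(record)["sic_description"])  # Legal Services
```

Records whose SCC code has no SIC match pass through unchanged, which keeps the transformation safe to run unconditionally.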