open-contracting / standard

Documentation of the Open Contracting Data Standard (OCDS)
http://standard.open-contracting.org/

Parties: Organization classifications (e.g. women-owned businesses, legal entity type) #711

Closed jpmckinney closed 4 years ago

jpmckinney commented 6 years ago

Motivation

Some procurement policies create obligations to track and/or promote the participation of women-owned businesses in contracting processes.

Many other classifications exist; women-owned businesses is just used as a common example.

Discussion

Small or Medium Enterprise (SME) is another classification that is frequently the subject of procurement policies. There is a community extension to distinguish SMEs: https://github.com/open-contracting/ocds_partyDetails_scale_extension

For 'women-owned' and other ownership-related classifications, company registers rarely if ever track this information as structured data. All known cases of structured data are from procurement systems.

The US DATA Act Information Model Schema (DAIMS) tags awards with different facts ("Procurement Award Specific Characteristics"), like whether it's "Minority Owned Business" or "Woman Owned Business" but also whether it's "Airport Authority" or "Clinger-Cohen Act Planning Compliance". See this diagram from this page.

If the system were storing demographic facts about the organization's control in a separate field, we could create an extension with narrowly defined fields. However, given that known systems store various facts in a single field, we should create an extension with broadly defined fields, closer to "tags." The extension could have an open codelist to allow a standard way of expressing e.g. "woman-owned business".

Proposal

Given that the classifications of organizations may be about ownership, incorporation, sector, location, number of employees, etc., and given that these classifications intersect ("Small Agricultural Cooperative"), I propose reusing the Classification object. The object is currently only used for Item classifications. Its scheme field uses an itemClassificationScheme codelist, but otherwise the object is generic. As I have no reason to believe that organizations have a 'primary' classification, I propose naming the field classifications instead of following the pattern of a primary classification and then additionalClassifications. I also recommend deprecating the Party Scale extension as its use case will be duplicated by this extension.

Example:

{
  "name": "Acme Inc.",
  "classifications": [{
    "id": "sme",
    "description": "petite et moyenne entreprise"
  }, {
    "id": "women-owned",
    "description": "entreprise appartenant à des femmes"
  }]
}

In terms of existing schemes:

We could model these classifications as an array of strings, but it seems likely that more robust classification schemes will emerge as structured data, for which an array of Classification objects is better suited.

Reference

For reference, the full list of Procurement Award Specific Characteristics in DAIMS is:

  • 1862 Land Grant College
  • 1890 Land Grant College
  • 1994 Land Grant College
  • 8a Program Participant
  • A-76 FAIR Act Action
  • Airport Authority
  • Alaskan Native Owned Corporation or Firm
  • Alaskan Native Servicing Institution
  • American Indian Owned Business
  • Asian Pacific American Owned Business
  • Black American Owned Business
  • City Local Government
  • Clinger-Cohen Act Planning Compliance
  • Commercial Item Acquisition Procedures
  • Community Developed Corporation Owned Firm
  • Community Development Corporation
  • Council of Governments
  • Country of Product or Service Origin
  • County Local Government
  • Commercial Item Test Program
  • Consolidated Contract
  • Contingency Humanitarian or Peacekeeping Operation
  • Contract Bundling
  • Contract Financing
  • Contracting Officer's Determination of Business Size
  • Contracts
  • Corporate Entity Not Tax Exempt
  • Corporate Entity Tax Exempt
  • Cost Accounting Standards Clause
  • Cost or Pricing Data
  • Davis Bacon Act
  • DoD Claimant Program Code
  • Domestic Shelter
  • DoT Certified Disadvantaged Business Enterprise
  • Economically Disadvantaged Women Owned Small Business
  • Educational Institution
  • Emerging Small Business
  • EPA-Designated Products
  • Evaluated Preference
  • Extent Completed
  • Fair Opportunity Limited Sources
  • Fed Biz Opps
  • Federal Agency
  • Federal Action Obligation
  • Federally Funded Research and Development Corp
  • For Profit Organization
  • Foreign Funding
  • Foreign Government
  • Foreign Owned and Located
  • Foundation
  • Government Furnished Equipment GFE and Government Furnished Property GFP Grants
  • Hispanic American Owned Business
  • Hispanic Servicing Institution
  • Historically Black College or University
  • Historically Underutilized Business Zone HUBZone Firm
  • Hospital Flag
  • Housing Authorities Public/Trial
  • Indian Tribe Federally Recognized
  • Information Technology Commercial Item Category
  • Interagency Contracting Authority
  • Interagency Contracting Category
  • Inter-Municipal Local Government
  • International Organization
  • Interstate Entity
  • Joint Venture Economically Disadvantaged Women Owned Small Business
  • Joint Venture Women Owned Small Business
  • Labor Surplus Area Firm
  • Limited Liability Corporation
  • Local Area Set Aside
  • Local Government Owned
  • Major Program
  • Manufacturer of Goods
  • Minority Institution
  • Minority Owned Business
  • Multi Year Contract
  • Municipality Local Government
  • National Interest Action
  • Native American Owned Business
  • Native Hawaiian Owned Business
  • Native Hawaiian Servicing Institution
  • Nonprofit Organization
  • Number of Actions
  • Number of Offers Received
  • Other Minority Owned Business
  • Other Not For Profit Organization
  • Other Statutory Authority
  • Other than Full and Open Competition
  • Partnership or Limited Liability Partnership
  • Performance-Based Service Acquisition
  • Place of Manufacture
  • Planning Commission
  • Port Authority
  • Price Evaluation Adjustment Preference Percent Difference
  • Private University or College
  • Product or Service Code
  • Program Acronym
  • DOD Acquisition Program
  • Purchase Card as Payment Method
  • Receives Contracts and Grants
  • Recovered Materials/Sustainability
  • Research
  • SAM Exception
  • SBA Certified 8 a Joint Venture
  • School District Local Government
  • School of Forestry
  • Sea Transportation
  • Self-Certified Small Disadvantaged Business
  • Service Contract Act
  • Service Disabled Veteran Owned Business
  • Small Agricultural Cooperative
  • Small Business Competitiveness Demonstration Program
  • Small Disadvantaged Business
  • Sole Proprietorship
  • Solicitation Identifier
  • Solicitation Procedures
  • State Controlled Institution of Higher Learning
  • Subchapter S Corporation
  • Subcontinent Asian Asian - Indian American Owned Business
  • Subcontracting Plan
  • The AbilityOne Program
  • Township Local Government
  • Transaction Number
  • Transit Authority
  • Tribal College
  • Tribally Owned Business
  • Type Set Aside
  • U.S. Federal Government
  • U.S. Government Entity
  • U.S. State Government
  • U.S. Tribal Government
  • Veteran Owned Business
  • Veterinary College
  • Veterinary Hospital
  • Walsh Healey Act
  • Woman Owned Business
  • Women Owned Small Business
  • U.S. Local Government
  • Domestic or Foreign Entity
  • Undefinitized Action

timgdavies commented 6 years ago

Based on discussions around #369 and the creation of the parties/details container in OCDS 1.1 we have been on the path of introducing specific extensions for particular collections of classifications, rather than a generic classifications object.

I believe that the Party Details extension approach makes it easier for us to surface, document, display and validate the different classification schemes that might be applied to organisations, and makes analysis in spreadsheet software easier when concepts are in distinct fields (rather than split into sub-tables), as well as supporting cases where the detail may be more complex than a single classification.

When we consider party details, we need to consider two use cases for organisational classification, and whether these need a local or a community extension.

(1) Monitoring implementation of domestic policy

In this case, it is important to have classifications with definitions that directly match the domestic definitions. Publishers will want to use their local categories, rather than mapping to some generic set.

Policy-related 'flags' are often quite idiosyncratic, and tricky to map onto some global set with any universal applicability. These are well suited to creation of local extensions (e.g. DAIMSClassification).

(2) Cross-cutting analysis

In these cases, generic categories are used for analysis across countries - such as analysis based on size of entity (micro, sme, large). Whilst each country might have their own category definitions and thresholds, they are likely to have a one-to-one or many-to-one mapping to the generic categories, close enough to support analytical re-use of the data.
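The many-to-one mapping described here can be sketched in a few lines of Python. This is only an illustration: the local category names are hypothetical placeholders rather than any publisher's real codes, and only the generic codes (micro, sme, large) come from the discussion above.

```python
# Hypothetical local size categories mapped onto the generic codes named above.
# The local names are illustrative only; each country would supply its own.
LOCAL_TO_GENERIC = {
    "microempresa": "micro",
    "pequena_empresa": "sme",
    "mediana_empresa": "sme",
    "gran_empresa": "large",
}


def to_generic_scale(local_category):
    """Return the generic category for a local one, or None if unmapped."""
    return LOCAL_TO_GENERIC.get(local_category)


def count_by_generic_scale(local_categories):
    """Count organizations per generic category, skipping unmapped values."""
    counts = {}
    for local in local_categories:
        generic = to_generic_scale(local)
        if generic is not None:
            counts[generic] = counts.get(generic, 0) + 1
    return counts
```

Two local categories collapsing into "sme" is the many-to-one case; a cross-country analysis would then work only with the generic values.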


So, in the case of any categorisation like 'women-owned businesses' I think we need to look at the use case for the data, and identify:

Sometimes, we may find we want to encourage publishers to use both a local extension and a community extension. For example, an organisation tagged 'Women Owned Small Business' in a US dataset might (depending on exact definitions of the concepts here) be able to map to:

{
   "parties":[{
       "id":"123456",
       "details":{
           "daimsClassification":["Women Owned Small Business"],
           "scale":"sme",
           "femaleOwnership":"full"
       }
   }]
}

thus serving both domestic and cross-cutting analytical use-cases.

jpmckinney commented 6 years ago

I anticipate that a classifications field will still be desirable, as we (and the broader community) can't or won't anticipate every possible classification and/or have the desire or capacity to design an extension for each – and I suspect that the additional difficulty when working with naively flattened data (instead of bespoke flattened data which can compensate for the issues described) will be acceptable for such cases.

For the specific case of women-owned businesses, if additional detail like full, majority, partial, etc. ownership has sufficient supply and/or demand, then we have a strong case for adding a new field under details. Otherwise, it depends whether the use cases are sufficiently common to motivate a separate field.

timgdavies commented 6 years ago

The challenges I can see with a generic classifications field are that, without an extension being created to describe each code and support validation we may end up in a situation where:

A publisher still has the option of including an additional field, which the validator will report on, but not fail the data for.

We don't currently validate classifications against their schemes.

We considered some sort of generic flags or classifications array instead of the parties/details model during discussions around 1.1 (and also during early 1.0 development), but have set down the path of asking for extensions. I'm cautious therefore about going back to classifications.

For the specific case of women-owned businesses, if additional detail like full, majority, partial, etc. ownership has sufficient supply and/or demand, then we have a strong case for adding a new field under details. Otherwise, it depends whether the use cases are sufficiently common to motivate a separate field.

Yes - this will be an area where we might need some research / thinking in each case.

jpmckinney commented 6 years ago

Both structures require documentation and validation support…

CoVE doesn't presently have logic for validating classification objects, but that's a problem we can solve. CoVE presently reports "Additional Codelist Values" and "Additional Fields". It's conceivable to report additional classification id values. A simplistic example: We have a codelist CSV file for classification schemes, which includes a column with a URL to a codelist; CoVE then compares the id values for classification objects with that scheme to its codelist. Another column can specify whether the scheme is 'closed' or 'open'; if closed, CoVE errors if an id value matches no code.
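A rough Python sketch of that simplistic example follows. It is not CoVE code: the column names (Code, Codelist, Type), the scheme names and the in-memory codelists are all assumptions made for illustration, and fetching a codelist from its URL is replaced by a dictionary lookup.

```python
import csv
import io

# Hypothetical scheme codelist: one row per classification scheme, with a
# pointer to its codelist and whether that codelist is open or closed.
SCHEMES_CSV = """Code,Codelist,Type
EXAMPLE_CLOSED,example_closed.csv,closed
EXAMPLE_OPEN,example_open.csv,open
"""

# Stand-in for fetching each scheme's codelist from its URL.
CODELISTS = {
    "example_closed.csv": {"sme", "women-owned"},
    "example_open.csv": {"cooperative"},
}


def load_schemes(text):
    """Index the scheme codelist rows by scheme code."""
    return {row["Code"]: row for row in csv.DictReader(io.StringIO(text))}


def check_classification(classification, schemes, codelists):
    """Return 'ok', 'error', 'additional' or 'unknown-scheme' for one object."""
    scheme = schemes.get(classification.get("scheme"))
    if scheme is None:
        return "unknown-scheme"
    codes = codelists[scheme["Codelist"]]
    if classification.get("id") in codes:
        return "ok"
    # Closed schemes fail on unlisted ids; open schemes only report them.
    return "error" if scheme["Type"] == "closed" else "additional"
```

The 'additional' result mirrors how "Additional Codelist Values" are reported today: flagged for the publisher's attention without failing the data.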

Having a specific field for every concept (instead of using generic building blocks) is not a panacea, either (sparse flattened data, collecting related fields, complexity of having more fields and extensions, etc.). Both generic and specific structures can be the better option depending on the circumstances. Once we have more use cases relating to women-owned businesses, we can consider which is best.

Also, I can't find any discussion in GitHub issues about flags/classifications instead of details, but perhaps we have some log somewhere to review.

jpmckinney commented 5 years ago

This is proposed under .details for EU context in https://github.com/open-contracting-extensions/european-union/issues/10

jpmckinney commented 5 years ago

Noting earlier discussions here: https://github.com/open-contracting/standard/issues/181#issuecomment-230097048

jpmckinney commented 5 years ago

Colombia tracks:

cc @yolile

jpmckinney commented 5 years ago

Regarding organization type, the EU uses the less ambiguous term "legal form": https://github.com/eForms/eForms/issues/216

LindseyAM commented 5 years ago

Which of the OCDS publishers specify women owned or small businesses?

Hera is leading some upcoming research about empowering women to participate in public procurement through open contracting. It would be great to understand which of our partners are reporting whether women owned or women led businesses are participating as bidders.

cc @yolile @pindec @herahussain

jpmckinney commented 5 years ago

This is being worked on in CRM issue 4882, and this issue will be updated with an answer when ready.

jpmckinney commented 5 years ago

Dhangadhi has parties/details/femaleChaired. Colombia intends to publish gender data for suppliers. Paraguay intends to publish SME status of suppliers.

LindseyAM commented 5 years ago

Thanks James! Sounds like Colombia and Paraguay would be good places to include in our gender research project and possibly Nepal too if we want to include a more LDC context @herahussain

romifz commented 4 years ago

Dominican Republic will include SME and women-owned business classifications in their OCDS data. Both are expressed as flags in their source systems.

romifz commented 4 years ago

Could we consider a mixed approach? Like having a classifications or flags array of strings for the classifications we expect to be compared across publishers (SME and women-owned so far) and instruct publishers to use details to define extensions for more specific classifications?

jpmckinney commented 4 years ago

@romifz Yes, it's looking like there will be a mixed approach. For this issue, we still need to figure out what the common, comparable approach should be.

jpmckinney commented 4 years ago

Re-organizing some points from the discussion:

We will never be able to standardize every way of classifying organizations. We have a few options (which can be mixed):

  1. Add specific fields (like party scale), where we have sufficient evidence to offer a standardized option
  2. Recommend that publishers use additional fields / local extensions where there is no standardized option
  3. Collect all non-standardized options into one field (like organization classification)

We should continue to do (1). For the concepts mentioned so far:

I think we should recommend (3), using the organizationClassification extension. Publishers would set a scheme. In the EU profile, we prefix TED_ to the XML element. The schemes are:

The guidance to publishers would be similar to how organization identifier schemes are constructed, e.g. with an ISO-3166-1 code as the first part of the scheme. Publishers would be encouraged to list all schemes and classification codes in their publication policy. As concepts become standardized, we can recommend that publishers use the new fields, as in (1) – though they likely ought to continue to publish the existing schemes, for any users that rely on them.

If we were to recommend (2), then users would end up seeing many additional fields and/or local extensions. They would have to reconcile which of these relate to organizational classification, and which relate to other details about the organization. Once concepts are standardized, users would see both the standardized field and the local field at the same level of hierarchy, which will likely cause confusion or at least raise questions.

Publishers can still choose to do (2) if they have good reasons, but I think (3) is better, because it collects all non-standardized organizational classification concepts in one place, and because, once standardized, there is a clear segmentation between the standardized and non-standardized concepts.
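To illustrate the segmentation that (3) gives users, here is a small Python sketch. The placement of classifications under details, the XX-EXAMPLE scheme (XX standing in for a country code) and the id values are assumptions for this sketch, not published data.

```python
from collections import defaultdict

# Illustrative party. The placement of classifications under details and the
# scheme/id values are assumptions for this sketch, not published data.
party = {
    "id": "123456",
    "details": {
        "scale": "sme",  # a standardized field, as in option (1)
        "classifications": [  # non-standardized concepts, as in option (3)
            {"scheme": "XX-EXAMPLE", "id": "women-owned"},
            {"scheme": "XX-EXAMPLE", "id": "cooperative"},
            {"scheme": "COFOG", "id": "03"},
        ],
    },
}


def classification_ids_by_scheme(party):
    """Group a party's classification ids by scheme."""
    grouped = defaultdict(list)
    for item in party.get("details", {}).get("classifications", []):
        grouped[item.get("scheme")].append(item.get("id"))
    return dict(grouped)
```

A user only has to look in one array to find every non-standardized classification, and can tell them apart from standardized fields such as scale by where they sit in the data.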

romifz commented 4 years ago

Publishers would set a scheme. In the EU profile, we prefix TED_ to the XML element. The schemes are:

  • COFOG (a standardized scheme)

In the case of known classification schemes, I think a codelist should be provided. Otherwise, two publishers may choose different codes for the same scheme.

jpmckinney commented 4 years ago

We haven't done this in the past because:

  1. Some external schemes are licensed such that we would violate copyright by providing a codelist.
  2. Even if we provide a codelist, the Data Review Tool doesn't have a mechanism for optionally applying a codelist to an id field based on the value of a scheme field.

We could provide a codelist in cases where we wouldn't violate intellectual property rights, but the value of providing that codelist is diminished until/unless we update the Data Review Tool (or some other tool) to be able to use that codelist.

Or are you thinking that some publishers would use the codelist to guide their implementation?

romifz commented 4 years ago

Sorry, that's not what I meant, "codelist" may not be the right term.

What I mean is that a list of "names to use" in the scheme field should be provided for known classification schemes, such as COFOG, TED_CA_TYPE, etc. I think this can take the form of an open codelist.

From what I've seen, publishers struggle to understand the goal of the identifier block for organization identifiers, especially the scheme field and how to build a short "code" for the organization registry they are using. Because of this I think guidance should be clear on how to define the value of the scheme field, and that names for known classification schemes should be provided if possible.

jpmckinney commented 4 years ago

Ah, I understand.

The Classification block presently has an open itemClassificationScheme.csv codelist for its scheme field. This, of course, only makes sense when the Classification block is used in the context of items. (The definition of the scheme field has "… For line item classifications, this uses the open itemClassificationScheme codelist.")

The Classification block was intended to be used in other contexts, but it only anticipated needing a codelist in the context of items.

I think the solution here is to do something similar to the documentType codelist. That list is used for Document blocks in the context of tender, awards, etc. However, some document types only make sense in specific contexts (as noted by that codelist's Section column).

So, for 1.1.5, I would propose adding a Context (or similar) column to itemClassificationScheme.csv, and then add schemes for organization classifications.

Then, in 1.2.0, we can rename itemClassificationScheme.csv to classificationScheme.csv. (Technically, this is allowed in 1.1.5, but I prefer to do it in a minor release in case anyone hard-coded the filename of the CSV.) I'll create a new issue for all this.
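As a sketch of how a tool might use the proposed column, assuming it is named Context and takes values like "item" and "organization" (both assumptions, as are the two example rows), in Python:

```python
import csv
import io

# Hypothetical excerpt of the codelist with the proposed Context column.
# The rows and the "item"/"organization" values are illustrative assumptions.
CODELIST_CSV = """Code,Title,Context
CPV,EC Common Procurement Vocabulary,item
COFOG,Classification of the Functions of Government,organization
"""


def schemes_for_context(text, context):
    """Return the scheme codes whose Context column matches `context`."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["Code"] for row in reader if row["Context"] == context]
```

Filtering on the column would let the same codelist serve both item and organization classifications without suggesting item schemes for organizations.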

LindseyAM commented 4 years ago

Last week @pindec and I discussed a simple option like an open code list to allow publishers to identify special characteristics of bidders and contractors according to their own policy needs, with the option of a details field to give more details if needed.

Also, FYI: according to this blog post (https://www.opengovpartnership.org/stories/faces-of-open-government-jen-bretana/), South Cotabato publishes 'gender tagging of women owned businesses joining biddings or awarded with contracts' (but I don't have more details about how this is published).

jpmckinney commented 4 years ago

Regarding adding the codes mentioned above to the codelists:

Sometime after we authored the EU profile, the EU created these controlled vocabularies:

They don't yet seem to have one for the transport categories at https://op.europa.eu/en/web/eu-vocabularies/authority-tables. I can't find any list in the regulation or from a quick search of EUR-Lex, so for TED_CATEGORY we might go with:

The standard forms confusingly refer to these as "areas covered", but the legislation uses "areas" to mean geographic areas, whereas the forms are clearly about services.

We can prefix the titles with "EC" like we do for CPV, to scope to the European Commission, and use an abbreviated/edited description from those pages' About tab, so:

It's most important to include in the codelist the link to the Source, so users can follow through.

For COFOG, the https://unstats.un.org/unsd/classifications/Econ/ link you found is better than anything I'd seen so far (it has both data files with codes and titles, and PDFs with long descriptions of each code). So, I recommend that for the source. The title/description can be (based on this):

jpmckinney commented 4 years ago

Now in Explorer: https://extensions.open-contracting.org/en/extensions/organizationClassification/master/

I need to review this issue to make sure everything is covered either by #990 or the new extension. Ideally, separate issues can be created for any follow-up, so that this wide-ranging issue can be closed.

jpmckinney commented 4 years ago

The most relevant comment is https://github.com/open-contracting/standard/issues/711#issuecomment-574764526, which is now documented by #990.

The comments about changes to itemClassificationScheme.csv are continued in #982.

jpmckinney commented 3 years ago

Related to legal form (but at a less granular level), MAPS defines "Civil society organisation (CSO)" as:

The multitude of associations around which society voluntarily organises itself and which represent a wide range of interests and ties. These can include community-based organisations, indigenous people’s organisations and nongovernment organisations.

jpmckinney commented 6 months ago

Regarding legal form, here are two codelists. However, the codes used are opaque:

https://www.gleif.org/en/about-lei/code-lists/iso-20275-entity-legal-forms-code-list
https://op.europa.eu/en/web/eu-vocabularies/concept-scheme/-/resource?uri=http://data.europa.eu/ih3/legal-form/ELF

These two are linked from https://semiceu.github.io/Core-Business-Vocabulary/releases/2.2.0/#LegalEntity.legalformtype