open-contracting / ocds-extensions

Collects issues for published extensions in one place

Bids: Make statistics more generic #92

Closed jpmckinney closed 2 years ago

jpmckinney commented 5 years ago

BidStatistic has a generic model, but a specific name.

In Colombia, there is a release-level processStatistics field that is an array of objects, each of which has the same structure as a BidStatistic. The local extension nonetheless defines a new class, since the name BidStatistic implies that it is a statistic about the bid.

We should consider renaming BidStatistic to Statistic and updating field titles/descriptions if needed. In doing so, we should consider the overlap with Observation in the metrics extension, which is like a statistic, but only measured in the context of a metric.

https://github.com/sdd1982/processStatistics/blob/master/release-schema.json

cc @yolile

duncandewhurst commented 2 years ago

How would this work in terms of backward compatibility? Presumably, we don't want to deprecate Bids.statistics, but we would need to change definitions/Bids/properties/statistics/items/$ref to point to the renamed Statistic definition.

jpmckinney commented 2 years ago

Renaming a definition does not cause existing data to become invalid/incorrect, so it is backwards-compatible.

We do not have any compatibility guarantees for extensions (which are the only thing that can break when a definition is renamed).
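To illustrate the point about `$ref`, here is a minimal sketch of the rename, assuming a heavily simplified, hypothetical fragment of the bids extension schema (the real schema has more fields and metadata). The helper `rename_definition` is an illustration, not part of any OCDS tooling. It shows that only the schema's internal pointers change; published data never mentions the definition name.

```python
import copy


def rename_definition(schema, old, new):
    """Return a copy of `schema` with definition `old` renamed to `new`
    and every "$ref" that pointed at `old` rewritten to point at `new`."""
    schema = copy.deepcopy(schema)
    schema["definitions"][new] = schema["definitions"].pop(old)
    old_ref = f"#/definitions/{old}"
    new_ref = f"#/definitions/{new}"

    def rewrite(node):
        # Walk dicts and lists recursively, updating matching references.
        if isinstance(node, dict):
            if node.get("$ref") == old_ref:
                node["$ref"] = new_ref
            for child in node.values():
                rewrite(child)
        elif isinstance(node, list):
            for child in node:
                rewrite(child)

    rewrite(schema)
    return schema


# Simplified, hypothetical schema fragment (assumption, not the real file).
bids_schema = {
    "definitions": {
        "Bids": {
            "type": "object",
            "properties": {
                "statistics": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/BidStatistic"},
                }
            },
        },
        "BidStatistic": {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "measure": {"type": "string"},
                "value": {"type": "number"},
            },
        },
    }
}

renamed = rename_definition(bids_schema, "BidStatistic", "Statistic")

# Illustrative data: it names fields, never definitions, so it is
# unaffected by the rename.
data = {"bids": {"statistics": [{"id": "1", "measure": "bids", "value": 4}]}}
```

An extension that references `#/definitions/BidStatistic` directly would still break, which is the case the comment above says is not covered by compatibility guarantees.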

duncandewhurst commented 2 years ago

> We should consider renaming BidStatistic to Statistic and updating field titles/descriptions if needed. In doing so, we should consider the overlap with Observation in the metrics extension, which is like a statistic, but only measured in the context of a metric.

Some fields are common, with no difference in semantics:

Some fields are distinct, but make sense in both contexts and so could be added to a common object

I can't think of a use for Observation.relatedImplementationMilestone in the context of bid statistics, but I don't think that's a blocker to including it in a common Statistic object. Alternatively, if the Statistic object will be in the core schema, the field could be added in the metrics extension.

Some similar concepts are modelled in different ways:

What is being measured

In the metrics extension, Observation does not have a property for what is being measured. Instead, Metric.title and Metric.description provide a free-text title and description for what is being measured, and Metric.observations is an array of measurements.

In BidsStatistic, .measure codifies what is being measured, using values from the bidStatistics codelist.

For a common Statistic object to work in the context of bids.statistics there would need to be a field to represent what is being measured, in which case there would be two ways of representing the same information in the metrics extension.

I don't think measure is a good name for the field in the bids extension, because it's commonly used to mean a quantitative value on which calculations can be made (in keeping with its usage in Observation).

The value of the measurement

In Observation, .value (a Value object) is used for financial values and .measure (a number or a string) is used for non-financial values. For non-financial values, the unit of measurement can be provided in Observation.unit.

In BidsStatistic, .value (a number) is used for all values and .currency is also populated for financial values. .valueGross (a number) can also be populated for financial values that include taxes.

The approach in the metrics extension seems better aligned with other modelling in OCDS because it uses the Value object to represent values. It would also be compatible with the approach to publishing net and gross amounts proposed in https://github.com/open-contracting/standard/issues/817#issuecomment-542353977. On the other hand, it means that users need to look in different places for financial and non-financial values.
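To make the two modelling approaches concrete, here is a rough sketch that re-expresses a flat BidsStatistic-style object in the Metric/Observation shape. The function name `statistic_to_metric` and the example codes and amounts are assumptions for illustration only. `valueGross` is deliberately left out, since the thread does not settle how gross amounts would be represented.

```python
def statistic_to_metric(statistic):
    """Re-express a flat statistic dict as a Metric-like dict holding
    a single Observation-like dict."""
    observation = {"id": statistic["id"]}
    if "currency" in statistic:
        # Financial value: Observation.value is a Value object.
        observation["value"] = {
            "amount": statistic["value"],
            "currency": statistic["currency"],
        }
    else:
        # Non-financial value: Observation.measure holds the number.
        observation["measure"] = statistic["value"]
    # Observation has no field for what is being measured, so the
    # codified BidsStatistic.measure ends up as the Metric title.
    return {"title": statistic["measure"], "observations": [observation]}


# Illustrative inputs (codes and amounts are made up for the example).
count_stat = {"id": "1", "measure": "bids", "value": 4}
money_stat = {
    "id": "2",
    "measure": "lowestValidBidValue",
    "value": 1000,
    "currency": "USD",
}

count_metric = statistic_to_metric(count_stat)
money_metric = statistic_to_metric(money_stat)
```

Note how `measure` changes meaning between the two shapes: a code naming what is measured in the statistic, and the measured number in the observation. A user of the Observation shape also has to check both `value` and `measure` to find the figure.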

BidsStatistic.value/ProcessStatistic.value are more widely used than Observation.value or Observation.measure:

colombia: processStatistics_measure (8775710 occurrences)
colombia_bulk: processStatistics_measure (2438400 occurrences)
digiwhist_austria: bids_statistics_measure (40887 occurrences)
digiwhist_belgium: bids_statistics_measure (30462 occurrences)
digiwhist_bulgaria: bids_statistics_measure (223434 occurrences)
digiwhist_croatia: bids_statistics_measure (130848 occurrences)
digiwhist_cyprus: bids_statistics_measure (1862 occurrences)
digiwhist_czech_republic: bids_statistics_measure (167078 occurrences)
digiwhist_denmark: bids_statistics_measure (22008 occurrences)
digiwhist_estonia: bids_statistics_measure (121717 occurrences)
digiwhist_finland: bids_statistics_measure (30921 occurrences)
digiwhist_france: bids_statistics_measure (305573 occurrences)
digiwhist_georgia: bids_statistics_measure (3 occurrences)
digiwhist_germany: bids_statistics_measure (280407 occurrences)
digiwhist_greece: bids_statistics_measure (13012 occurrences)
digiwhist_hungary: bids_statistics_measure (369278 occurrences)
digiwhist_iceland: bids_statistics_measure (179 occurrences)
digiwhist_ireland: bids_statistics_measure (10023 occurrences)
digiwhist_italy: bids_statistics_measure (33495 occurrences)
digiwhist_latvia: bids_statistics_measure (69129 occurrences)
digiwhist_lithuania: bids_statistics_measure (31885 occurrences)
digiwhist_luxembourg: bids_statistics_measure (1458 occurrences)
digiwhist_malta: bids_statistics_measure (2005 occurrences)
digiwhist_netherlands: bids_statistics_measure (41716 occurrences)
digiwhist_norway: bids_statistics_measure (23325 occurrences)
digiwhist_poland: bids_statistics_measure (2723564 occurrences)
digiwhist_portugal: bids_statistics_measure (24296 occurrences)
digiwhist_romania: bids_statistics_measure (190638 occurrences)
digiwhist_slovakia: bids_statistics_measure (105901 occurrences)
digiwhist_slovenia: bids_statistics_measure (190857 occurrences)
digiwhist_spain: bids_statistics_measure (87470 occurrences)
digiwhist_sweden: bids_statistics_measure (7639 occurrences)
digiwhist_switzerland: bids_statistics_measure (52 occurrences)
digiwhist_ted: bids_statistics_measure (2535961 occurrences)
digiwhist_united_kingdom: bids_statistics_measure (123412 occurrences)
honduras_iaip: bids_statistics_measure (25699 occurrences)
scotland_public_contracts: bids_statistics_measure (36610 occurrences)
uk_fts: bids_statistics_measure (17315 occurrences)
uk_fts_test: bids_statistics_measure (719 occurrences)
mexico_administracion_publica_federal_api: contracts_implementation_relatedProjects_metrics_observations_measure (14801 occurrences)
mexico_nuevo_leon_records: contracts_implementation_metrics_observations_measure (1812 occurrences)
colombia_ani_records: contracts_implementation_metrics_observations_value_currency (16 occurrences)
colombia_ani_records: contracts_implementation_metrics_observations_value_amount (16 occurrences)
honduras_cost: planning_forecasts_observations_value_currency (224 occurrences)
honduras_cost: contracts_implementation_metrics_observations_value_amount (18 occurrences)
honduras_cost: planning_forecasts_observations_value_amount (224 occurrences)
honduras_cost: contracts_implementation_metrics_observations_value_currency (18 occurrences)
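For anyone wanting to check the comparison, here is a small sketch that totals lines in the format above by object type. The grouping rule (any path containing `_observations_` counts as an observation, everything else as a statistic) is an assumption made for this example, and `totals_by_object` is a hypothetical helper. The sample uses three lines from the list rather than the full output.

```python
import re
from collections import Counter

# Matches "<source>: <field_path> (<count> occurrences)"; the optional
# second "r" tolerates the misspelling "occurences" in raw output.
LINE = re.compile(
    r"^(?P<source>\w+): (?P<path>\w+) \((?P<count>\d+) occurr?ences\)$"
)


def totals_by_object(lines):
    """Sum occurrence counts per object type, skipping unparseable lines."""
    totals = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        if "_observations_" in match["path"]:
            kind = "observation"
        else:
            kind = "statistic"
        totals[kind] += int(match["count"])
    return totals


sample = [
    "digiwhist_georgia: bids_statistics_measure (3 occurrences)",
    "digiwhist_switzerland: bids_statistics_measure (52 occurrences)",
    "colombia_ani_records: contracts_implementation_metrics_observations_value_amount (16 occurrences)",
]

totals = totals_by_object(sample)
```

Run over the full list, a total like this would count each financial observation twice, because `value_amount` and `value_currency` are reported as separate paths.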

Change history

In the metrics extension, each Metric can have an array of Observation objects, each with a different .period so that change over time can be preserved in a compiled release.

In the bids extension, the description of BidsStatistic.date suggests that BidsStatistic.value should be overwritten rather than a new statistic added to Bids.statistics.

I don't think this is a blocker to having a common Statistic object.
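The difference in change history can be sketched with a simplified merge-by-`id` routine. This is a stand-in for how arrays of objects are combined into a compiled release, not the actual OCDS merge implementation, and the ids, dates and numbers are made up for the example.

```python
def merge_by_id(existing, incoming):
    """Merge two lists of dicts: items sharing an "id" are updated in
    place, items with a new "id" are appended."""
    merged = {item["id"]: dict(item) for item in existing}
    for item in incoming:
        merged.setdefault(item["id"], {}).update(item)
    return list(merged.values())


# Bids.statistics: the later release reuses the same id, so the earlier
# value is overwritten and only the latest figure survives.
statistics = merge_by_id(
    [{"id": "1", "measure": "bids", "value": 3, "date": "2020-01-01T00:00:00Z"}],
    [{"id": "1", "measure": "bids", "value": 4, "date": "2020-02-01T00:00:00Z"}],
)

# Metric.observations: each period gets its own id, so both
# measurements are preserved.
observations = merge_by_id(
    [{"id": "1", "period": {"startDate": "2020-01-01T00:00:00Z"}, "measure": 3}],
    [{"id": "2", "period": {"startDate": "2020-02-01T00:00:00Z"}, "measure": 4}],
)
```

Under this reading, a common definition could support both behaviours: whether history is kept depends on whether the publisher reuses an `id` or adds a new one.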

duncandewhurst commented 2 years ago

@jpmckinney based on the above, do you have any thoughts on whether or not to replace Observation and BidsStatistic with a common definition?

jpmckinney commented 2 years ago

Thanks for the research!

In retrospect, observations and statistics are not the same concept, so it makes sense to have different definitions.

However, they ought to be aligned where possible. The differences identified are (if I read correctly):

For 1.2, we can make breaking changes to these extensions (and potentially merge bids into OCDS https://github.com/open-contracting/standard/issues/1179).

That said, I don't think the alignment is of sufficient value to warrant the breaking change, so I would be in favor of leaving the fields as-is.

Even so, we can still rename the definition as in the issue description, and update field titles/descriptions to allow the Statistic definition to be used in non-bid contexts.

duncandewhurst commented 2 years ago

Sounds good to me!

odscjen commented 2 years ago

So BidStatistic is to be renamed Statistic. Does this involve deprecating BidStatistic and creating a new definition for Statistic based on BidStatistic, or is it enough to just alter BidStatistic and explain the changes in the change log?

jpmckinney commented 2 years ago

It's enough to do the latter, as JSON Schema definitions are considered an implementation detail.

odscjen commented 2 years ago

Should bidStatistics.csv also be renamed to statistics.csv? All the codes in it are bid-specific, but as it's an open codelist, publishers can add their own non-bid-specific ones. The rename would just align the codelist with its parent object.

jpmckinney commented 2 years ago

Good catch. We usually go with singular, so statistic.csv.