We can also check that the classification scheme codelist covers units of presentation.
From the NASA QUDT Units Ontology: The QUDT Units Ontology provides a vocabulary for describing units of measure.
UN/CEFACT Recommendation 20 list is for 'Codes for Units of Measure Used in International Trade' and https://tfig.unece.org/contents/recommendation-20.htm says: "this Recommendation establishes a single list of code elements to represent units of measure for length, mass (weight), volume and other quantities (including units of count)." Moreover, this list is mainly based on the International System of Units.
So both schemes cover only units of measure, not presentation.
As such, we should clarify the descriptions to explicitly allow for both types of units.
Should we instead discourage using this field as the unit of presentation and clarify that it is intended for the unit of measure only?
I think we will have to check how the field is being used – if many publishers are using it for unit of presentation, then it is harder to change the semantics now.
I agree that those schemes are about units of measure, but a publisher could use the field with a scheme for units of presentation (or no scheme).
unit_name_use.txt
I checked some of the publishers who use the Unit.name field (at least 36; see the attached list), and then checked the unique names they use with this Colab notebook (and manually). At least the following use some of the units as a unit of presentation:
So the answer is yes: at least 25 publishers sometimes use Unit.name as a unit of presentation. But we know this is a bad practice that makes unit price comparison difficult, so I think that, instead of explicitly allowing both, we should allow (for compatibility) but discourage the use of this field as the unit of presentation.
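The tally described above could be reproduced roughly as follows. This is a minimal sketch, not the notebook that was actually used: the input shape (one publisher and Unit.name pair per distinct name) and the PRESENTATION_NAMES set are hypothetical.

```python
from collections import defaultdict

# Hypothetical set of unit names treated as units of presentation; the real
# analysis classified names with a notebook and by hand.
PRESENTATION_NAMES = {"bottle", "box", "pack", "blister", "pallet", "ampoule"}


def publishers_using_presentation(rows):
    """Return the set of publishers that use at least one presentation name.

    `rows` is an iterable of (publisher, unit_name) pairs, e.g. one pair per
    distinct Unit.name observed in a publisher's data.
    """
    by_publisher = defaultdict(set)
    for publisher, name in rows:
        if name:  # skip empty or missing names
            by_publisher[publisher].add(name.strip().lower())
    return {
        publisher
        for publisher, names in by_publisher.items()
        if names & PRESENTATION_NAMES  # any overlap with presentation names
    }


rows = [
    ("publisher_a", "Kilogram"),
    ("publisher_a", "Box"),
    ("publisher_b", "Litre"),
]
print(sorted(publishers_using_presentation(rows)))  # ['publisher_a']
```

A publisher counts once even if only a few of its names are units of presentation. That matches the observation that most such publishers use the field as a unit of measure most of the time.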
How many publishers use it as a unit of measure?
The ones who disclose the field and use it only as a unit of measure:
Note that the others also use the field as a unit of measure most of the time and as a unit of presentation fewer times.
Some of them use other things that are neither units of measure nor units of presentation, for example honduras_cost, mexico_mexico_city_infocdmx, slovenia, mexico_mexico_state_infoem and mexico_oaxaca_iaip. (I'm not sure whether we should consider "Service" and "Piece" as units of presentation or something else; I guess the correct unit of measure for these cases should be "unit".)
Thanks. So if we were to narrow the semantics (via "discourage" to avoid backwards-incompatibility), then what is the alternative for disclosing the unit of presentation?
I was thinking about incorporating the container object from the medicine extension, but from https://github.com/open-contracting/standard/issues/1110#issuecomment-730064497 I see that we added the weight field to be used along with the unit name, where the unit name can be used as the unit of presentation 🤔
What's the problem? Seems okay to me.
Well, the problem is that we would be using unit as the unit of presentation, and I was trying to avoid that :) We would also introduce some conditional semantics, I guess: e.g. if weight is present, then the unit is a unit of presentation; otherwise, it probably is a unit of measure.
I'm not following. Can you share a full JSON example with each of the unit of measure and unit of presentation?
If you're worried about https://github.com/open-contracting/standard/pull/1125 – I never actually approved the changes, so we can just revert it. (All I did was get tests to pass.)
E.g. to represent "50 liters of Juice in bottles of 10 liters each", with weight that would be:
{
  "items": [
    {
      "id": "1",
      "description": "Juice",
      "quantity": 50,
      "unit": {
        "name": "Bottle",
        "weight": {
          "value": 10,
          "unit": "LTR"
        }
      }
    }
  ]
}
Here, unit.name is a unit of presentation (Bottle), but if we want to represent "50 liters of Juice" no matter the package, then the JSON would be:
{
  "items": [
    {
      "id": "1",
      "description": "Juice",
      "quantity": 50,
      "unit": {
        "name": "Liters",
        "id": "LTR"
      }
    }
  ]
}
Here, unit.name and unit.id are units of measure. So, if we follow that approach, when unit.weight is present, the unit corresponds to a unit of presentation, and if not, the unit is a unit of measure.
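The conditional semantics would force every data user to write logic along these lines. The following is a hypothetical sketch of that interpretation rule, using the two JSON examples above. The function name is illustrative, and the rule is only a presence check on weight.

```python
def unit_kind_with_weight(unit):
    """Classify items.unit under the weight approach from #1125.

    If unit.weight is present, unit.name is read as a unit of presentation;
    otherwise it is assumed (not guaranteed) to be a unit of measure.
    """
    if not unit:
        return None
    return "presentation" if "weight" in unit else "measure"


bottle = {"name": "Bottle", "weight": {"value": 10, "unit": "LTR"}}
liters = {"name": "Liters", "id": "LTR"}
print(unit_kind_with_weight(bottle))  # presentation
print(unit_kind_with_weight(liters))  # measure
```

The weakness shows in the second case. "measure" is only a default, so a unit without weight might still be a unit of presentation.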
But if we use container instead:
{
  "items": [
    {
      "id": "1",
      "description": "Juice",
      "quantity": 50,
      "unit": {
        "name": "Liters",
        "id": "LTR"
      },
      "container": {
        "name": "Bottle",
        "capacity": {
          "unit": {
            "scheme": "UNCEFACT",
            "id": "LTR"
          },
          "value": "10"
        }
      }
    }
  ]
}
Both the unit of measure and the unit of presentation are preserved, but in two separate objects, and items.unit is always a unit of measure.
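With container, a consumer can derive the number of bottles without guessing what unit means. The following is a minimal sketch that follows the field names of the container example above. It assumes the capacity is expressed in the same unit id as items.unit, and container_count is a hypothetical helper. Unit conversion is out of scope.

```python
from fractions import Fraction


def container_count(item):
    """Return how many containers an item's quantity fills, or None.

    Only computes a result when items.unit.id and the capacity's unit id
    match (e.g. both "LTR"); converting between units is out of scope.
    """
    unit_id = (item.get("unit") or {}).get("id")
    capacity = (item.get("container") or {}).get("capacity") or {}
    capacity_unit_id = (capacity.get("unit") or {}).get("id")
    if not unit_id or unit_id != capacity_unit_id:
        return None
    if capacity.get("value") is None or item.get("quantity") is None:
        return None
    # str() lets Fraction accept ints, floats and numeric strings such as "10".
    per_container = Fraction(str(capacity["value"]))
    if per_container <= 0:
        return None
    return Fraction(str(item["quantity"])) / per_container


item = {
    "id": "1",
    "description": "Juice",
    "quantity": 50,
    "unit": {"name": "Liters", "id": "LTR"},
    "container": {
        "name": "Bottle",
        "capacity": {"unit": {"scheme": "UNCEFACT", "id": "LTR"}, "value": "10"},
    },
}
print(container_count(item))  # 5
```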
Hmm, well, "litre" is not a unit of weight (a litre of mercury weighs more than a litre of water). But anyway, we can change the example to be 10 packs of gum, where each pack weighs 50g.
I suggest that we revert #1125 and consider merging container from the medicine extension. And then we update the guidance to encourage using unit for unit of measure, and to use container for unit of presentation.
Users will still have to check the unit's id and/or name to know whether it is a unit of measure or presentation, since older data (as we know) uses it for both meanings.
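For older data, that check might look like the heuristic below. This is a hypothetical sketch. The small samples of UN/CEFACT Recommendation 20 codes and unit names stand in for the full code list, and anything unmatched is reported as "unknown" rather than assumed to be a unit of presentation.

```python
# Hypothetical, deliberately small sample of UN/CEFACT Recommendation 20 codes;
# a real check would load the full Rec 20 code list.
MEASURE_CODES = {"KGM", "GRM", "LTR", "MLT", "MTR"}
# Hypothetical sample of names for units of measure, compared case-insensitively.
MEASURE_NAMES = {"kilogram", "kilograms", "gram", "grams", "litre", "litres", "liter", "liters"}


def classify_legacy_unit(unit):
    """Guess whether a unit object from older data is a unit of measure.

    Returns "measure" if its id or name matches a known unit of measure, and
    "unknown" otherwise: the unit may then be a unit of presentation ("Box")
    or something else entirely ("Service").
    """
    unit_id = str(unit.get("id") or "").strip().upper()
    name = str(unit.get("name") or "").strip().lower()
    if unit_id in MEASURE_CODES or name in MEASURE_NAMES:
        return "measure"
    return "unknown"


for unit in ({"id": "LTR", "name": "Liters"}, {"name": "Kilograms"}, {"name": "Bottle"}):
    print(unit, classify_legacy_unit(unit))
```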
However, something we still need to figure out is the last example in the medicine extension, which follows this guidance: "If a medicine item is packaged in a multi-drug container, use items.quantity for the quantity in the container and items.unit for the unit." It doesn't use unit as a unit of measure.
I opened PRs for the medicine extension and for reverting weight.
There are some items for which a unit of measure doesn't make sense, for example, pencils or the medicine described in the second example of the medicine extension. I think the Item.unit field should be omitted in these cases, as there is actually no information about the unit of measure. In that case, quantity will refer to the item itself and not to a unit of measurement. One example with bananas:
I want {quantity} {unit.name} of {item.name} packaged in {container.name} of {container.capacity.value} {container.capacity.unit.id} each:
quantity: 100
unit.name: KGM
item.name: banana
container.name: box
container.capacity.value: 10
container.capacity.unit.id: KGM
I want 100 KGM of bananas packaged in boxes of 10 KGM each.
I want 100 KGM of bananas packaged in boxes.
I want 100 KGM of bananas.
I want 100 bananas.
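The template above can be rendered with a small function that drops each clause when its field is absent. This is a hypothetical sketch. The field names follow the template, including item.name, and pluralize is a naive helper for illustration only.

```python
def pluralize(word):
    """Naive English plural, for illustration only (box -> boxes)."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    return word + "s"


def describe(item):
    """Render the 'I want ...' sentence, dropping clauses for absent fields."""
    unit_name = (item.get("unit") or {}).get("name")
    container = item.get("container") or {}
    capacity = container.get("capacity") or {}
    capacity_unit = (capacity.get("unit") or {}).get("id")

    text = f"I want {item['quantity']} "
    if unit_name:
        # quantity counts units of measure, e.g. "100 KGM of bananas".
        text += f"{unit_name} of "
    text += pluralize(item["name"])
    if container.get("name"):
        text += f" packaged in {pluralize(container['name'])}"
        value = capacity.get("value")
        if value is not None:
            # Without a capacity unit, the capacity counts items.
            size = f"{value} {capacity_unit}" if capacity_unit else f"{value}"
            text += f" of {size} each"
    return text + "."


bananas = {
    "quantity": 100,
    "name": "banana",
    "unit": {"name": "KGM"},
    "container": {"name": "box", "capacity": {"value": 10, "unit": {"id": "KGM"}}},
}
print(describe(bananas))
# I want 100 KGM of bananas packaged in boxes of 10 KGM each.
print(describe({"quantity": 100, "name": "banana"}))
# I want 100 bananas.
```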
As you can see, quantity always refers to the unit, unless unit.name is not present (in which case it counts the items themselves). Even if unit.name is not present, unit.value could be disclosed, and it would always refer to one unit of item.name. We could consider whether it would be useful to include the price per container as well.
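If unit.value is the price of one unit, a price per container can be derived rather than published. The following is a minimal sketch under two assumptions: capacity shares its unit id with items.unit, and the price in the example is made-up data. price_per_container is a hypothetical derived value, not an OCDS field.

```python
from decimal import Decimal


def prices(item):
    """Return (total_price, price_per_container); either may be None.

    Assumes unit.value.amount is the price of one unit (one KGM, or one
    banana when there is no unit of measure), and that container.capacity is
    expressed in the same unit id as items.unit.
    """
    unit = item.get("unit") or {}
    amount = (unit.get("value") or {}).get("amount")
    if amount is None or item.get("quantity") is None:
        return None, None
    unit_price = Decimal(str(amount))
    total = unit_price * Decimal(str(item["quantity"]))

    capacity = (item.get("container") or {}).get("capacity") or {}
    capacity_unit = (capacity.get("unit") or {}).get("id")
    per_container = None
    # Matching ids, or both absent (capacity counts items, e.g. 10 bananas).
    if capacity.get("value") is not None and capacity_unit == unit.get("id"):
        per_container = unit_price * Decimal(str(capacity["value"]))
    return total, per_container


bananas = {
    "quantity": 100,
    "unit": {"id": "KGM", "value": {"amount": 2.5, "currency": "USD"}},
    "container": {"name": "box", "capacity": {"value": 10, "unit": {"id": "KGM"}}},
}
total, per_box = prices(bananas)
print(total, per_box)  # 250.0 25.0
```

Decimal is used because these are monetary amounts, where binary floating point can introduce rounding errors.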
Sounds good! This can also be used for: I want 100 bananas in boxes of 10 (bananas) each.
We can consider a container.value field, but we need some evidence of the demand.
@duncandewhurst, could you share your opinion on the modelling/semantics as well?
This seems like a good solution that should make the data more usable for unit price comparisons.
'container' doesn't exactly fit some cases, e.g. a reel of cable, but I see that is addressed by the description in the medicine extension's schema, although that will need updating so that it is not medicine-specific.
container.name uses a medicine-specific codelist, but for the general case UNCEFACT Recommendation 21 (Codes for Passengers, Types of Cargo, Packages and Packaging Materials (with Complementary Codes for Package Names)) seems a better fit.
'container' doesn't exactly fit for some cases, e.g. a reel of cable, but I see that is addressed by the description in the medicine extensions schema, although that will need updating so that it is not medicine-specific.
Since the medicine extension is new and not yet implemented, we can rename the field. Would package or packaging work?
If we want to align with UNCEFACT Rec 21, the correct term would be packageType, since package refers to the container and its contained goods, but we only want to describe the container.
Should we also consider an id and scheme for the package type?
After further reading, I'm not sure that unit of presentation and unit of packaging are actually the same thing, e.g. I want 1,000 paracetamol tablets presented in blister packs of 20 tablets packaged in boxes of 50 blister packs.
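The arithmetic in this example separates the unit of presentation (the blister pack) from the outer packaging (the box). A hypothetical helper makes the levels explicit. It counts packages level by level, innermost first, and rounds partially filled packages up.

```python
def packaging_counts(quantity, levels):
    """Count the packages needed at each packaging level, innermost first.

    `levels` lists (package_name, capacity) pairs, where capacity is counted
    in the previous level's units: tablets per blister pack, then blister
    packs per box. Partially filled packages are rounded up.
    """
    counts = []
    remaining = quantity
    for name, capacity in levels:
        remaining = -(-remaining // capacity)  # integer ceiling division
        counts.append((name, remaining))
    return counts


print(packaging_counts(1000, [("blister pack", 20), ("box", 50)]))
# [('blister pack', 50), ('box', 1)]
```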
After further reading, I'm not sure that unit of presentation and unit of packaging are actually the same thing
For medicines, the term "container" is used to identify the immediate container for the medicine, that is, the package that is in direct contact with the medicine.
I want 1,000 paracetamol tablets presented in blister packs of 20 tablets packaged in boxes of 50 blister packs.
I guess we will need to investigate whether this is a real example, e.g. whether the last part (the "boxes of 50 blister packs" part) is normally relevant. I think that need didn't come up during the medicine extension research, but we could investigate further.
Ah I see, the use cases for UNCEFACT Rec 21 are more about the outer packaging, which is of interest to freight companies and customs agencies etc.
Directive 92/27/EEC has:
ISO 11239 (2012) has:
We definitely have evidence of demand for immediate packaging/container, since it is required for unit price comparison, for which there are many known researchers. I think the question here is:
container or packaging, then it might be confusing if we later add other types of container/packaging. So we can maybe consider immediatePackaging or immediateContainer.
I think we have no evidence of use cases for intermediate packaging. While we can easily imagine examples (blister packs in boxes on pallets), I don't know of any procurement agency actually detailing such packaging in a structured format, and I'm fairly doubtful that anyone needs it in a structured format (bidders can just read such descriptions as text).
As for the outer packaging, while we can more easily imagine use cases (e.g. freight companies), I'm not sure that we have evidence of demand. From the supply side, while I think some countries have this data, I'm not sure whether the (imagined) use cases are really strong enough (hard to say which priority user needs are being met). That said:
So we can maybe consider immediatePackaging or immediateContainer.
That seems like a useful clarification.
I figure there might be cases where the publisher has data on packaging, but they don't know whether the packaging is immediate or outer. Should we provide a place for that information?
Perhaps we should wait until we have a real example before adding something to the schema. In the meantime, we can have a worked example page on containers and packaging, which can include some of the content from this discussion and can refer implementers to the helpdesk if they aren't sure what sort of packaging their data describes.
This issue covers a few things, but I think we can do one or more of:
unit is for the unit of measurement, and discourage its use for the unit of presentation.
https://github.com/open-contracting/standard/issues/1343#issuecomment-1028414476
immediatePackaging for the unit of presentation, as that did come up a lot in the medicine extension research.
If we do (2), then I would prefer to have an option for unit of presentation (3).
At any rate, we can start by updating the medicine extension to stage the changes to the schema.
Sounds good!
@jpmckinney checking if this is ready for PR in the medicine extension based on https://github.com/open-contracting/standard/issues/1343#issuecomment-1067316229?
Yes, ready!
To clarify what changes are to be made to the Medicine extension:
Container to be renamed ImmediateContainer
Update the description of SimpleUnit (as referenced within Container) to clarify that it should only be used for units of measurement and not units of presentation.
@jpmckinney is this all? The examples already only cover units of measurement.
Container to be renamed ImmediateContainer
Yes, and the field and codelist as well. The only other candidate was immediatePackaging, but ISO 11239 (2012), at least, seems to prefer "container" for immediate ones and "packaging" for outer ones. Per https://github.com/open-contracting/standard/issues/1343#issuecomment-1065419798 and the following comment, we don't presently have use cases for other types of containers/packaging.
- Update the description of SimpleUnit (as referenced within Container) to clarify that it should only be used for units of measurement and not units of presentation.
I think the description in the medicine extension is already clear. That bullet was more about OCDS.
Moved this back into "To do: Semantics", as the medicine extension has been updated as discussed, but there remains the issue of updating the guidance and the description of unit.
Thinking about what needs to be updated in core 1.2, from the above discussion it seems like the following is to be done to core OCDS:
But now there's nowhere for publishers to put units of presentation, so:
Add ImmediateContainer from the medicine extension to Item. This will also require copying over Quantity and SimpleUnit, to provide a field to declare units of presentation. Quantity is SimpleUnit (scheme and id) plus value, where value isn't monetary. This is what makes it different from Unit, which explicitly says that .value is monetary.
Add a sentence of guidance to Unit discouraging the use of Unit for units of presentation, and add a sub-section to Item explaining the use of ImmediateContainer for units of presentation.
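The difference between Quantity and Unit can be sketched as Python dataclasses. This is a simplified, hypothetical model, not the schema itself. Field names follow the JSON examples in this thread and the ImmediateContainer rename. Other fields (e.g. Unit.uri) are omitted. The nesting of unit inside Quantity follows the earlier capacity example.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class SimpleUnit:
    """Unit of measure only: no name and no price."""
    scheme: Optional[str] = None  # e.g. "UNCEFACT"
    id: Optional[str] = None  # e.g. "LTR"


@dataclass
class Quantity:
    """A non-monetary amount in a SimpleUnit, e.g. a capacity of 10 LTR."""
    value: Optional[Decimal] = None
    unit: Optional[SimpleUnit] = None


@dataclass
class Value:
    """A monetary amount."""
    amount: Optional[Decimal] = None
    currency: Optional[str] = None


@dataclass
class Unit:
    """The item's unit of measure; its value is monetary (price of one unit)."""
    scheme: Optional[str] = None
    id: Optional[str] = None
    name: Optional[str] = None
    value: Optional[Value] = None


@dataclass
class ImmediateContainer:
    """The unit of presentation, e.g. a 10-litre bottle."""
    name: Optional[str] = None
    capacity: Optional[Quantity] = None


@dataclass
class Item:
    id: str
    description: Optional[str] = None
    quantity: Optional[Decimal] = None
    unit: Optional[Unit] = None
    immediateContainer: Optional[ImmediateContainer] = None


juice = Item(
    id="1",
    description="Juice",
    quantity=Decimal(50),
    unit=Unit(scheme="UNCEFACT", id="LTR", name="Liters"),
    immediateContainer=ImmediateContainer(
        name="Bottle",
        capacity=Quantity(value=Decimal(10), unit=SimpleUnit(scheme="UNCEFACT", id="LTR")),
    ),
)
print(juice.immediateContainer.capacity)
```

Keeping Quantity.value and Unit.value as different types makes the monetary and non-monetary meanings impossible to confuse.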
Yes that's correct! More or less the same list as in https://github.com/open-contracting/standard/issues/1343#issuecomment-1067316229 and following comments.
Great, thanks. I just wanted to clarify before I started a PR, as this is a long and slightly twisty thread :)
Looking at the immediateContainer.csv codelist, which will be added to the core standard as part of this: it's very medicine-specific and missing some obvious container types, e.g. "pallet", "drum", "reel". Is it worth doing some research to fill this codelist out to be more general and less medicine-centric?
Good catch. I would not reuse the medicine codelist (better to not have any codelist than to use that one).
For context, for medicine, we noted in readme:
The immediateContainer codelist is a copy of the codes and titles from FHIR's Medication Knowledge Package Type codelist. Given that the terms are undefined in FHIR, the descriptions are copied from corresponding terms from the EDQM Standard Terms database, reproduced with the permission of the European Directorate for the Quality of Medicines & HealthCare, Council of Europe (EDQM). The EDQM Standard Terms database is not a static list and content can change over time; the descriptions were retrieved on July 21, 2021.
If you can find existing standards for containers more generally, we can point to them, but without adding them as a codelist (unless they are really good quality). We can also point to the medicine codelist (if the items are medicines).
Having had a quick look around, the only suitable codelist I could find is from GS1 GDSN, which has a PackageTypeCodelist (see the Code List Document for the current version at https://www.gs1.org/standards/gdsn/3-1-27). GS1 is the organization that creates barcodes for trade, and its lists seem good quality and stable, so we could reference that, or just not reference anything (other than the medical one for medicines).
Yeah, looks decent (extracted below). What is the copyright (or terms and conditions) on the list? I'm thinking whether to include it in OCDS as an external codelist. At any rate, we can link to it.
The only thing I could find about the T&C or copyright of the standard (and therefore its codelists) is the GS1 Intellectual Property Policy (IPP). From the "GS1 IP made simple" PDF, it seems that to make use of the codelist within OCDS, OCP would need to sign up to the IPP, but it's unclear how that would then interact with the open aspect of OCDS.
There are also the terms of use for their website (i.e. where the standard can be accessed), which state:
The content of this web site is copyright protected. You may download, display, print and reproduce this material in unaltered form only (retaining this notice) for your personal, non-commercial use or use within your organisation. Apart from any use as permitted under Belgian Copyright law, all other rights are reserved. Requests and inquiries concerning reproduction and rights should be addressed to: GS1, 326 Avenue Louise, 1050 Brussels, Belgium, E-mail: contactus@gs1.org.
This notice is not to be erased. You are not permitted to re-transmit, distribute or commercialise the information or material without seeking prior written approval from GS1.
I think the sensible options are either to contact GS1 directly to ask or just to reference the list without including it as an external codelist.
Sounds good - let's just reference it for now.
Current:
From the research for the medicine extension, ISO-11240 (Health informatics — Identification of medicinal products — Data elements and structures for the unique identification and exchange of units of measurement) notes:
In the current description, we only give units of measurement as examples. We also use "unit(s) of measure" in the description of the unit's scheme and uri.
However, in practice, governments specify goods using both units of measurement and units of presentation, e.g. "5 pallets of toilet paper", "1 ton of gravel", etc.
As such, we should clarify the descriptions to explicitly allow for both types of units. We can also check that the classification scheme codelist covers units of presentation.