ucum-org / ucum

https://ucum.org

viable cell density #224

Open timbrisc opened 2 years ago

timbrisc commented 2 years ago

Issue migrated from trac ticket # 5826

component: organization | priority: minor | keywords: VCD

2022-06-07 21:09:51: william.hess@fda.hhs.gov created the issue


UCUM Board needs to provide input. UCUM already has mass density, linear mass density, lum. intensity density, magnetic flux density, and spectral density, so adding a UCUM term called "viable cell density" would be somewhat consistent. Viable cell density is the number of living cells per unit volume. Please see http://www.oilgae.com/ref/glos/cell_density.html.

This new concept is needed to support the U.S. Food and Drug Administration Pharmaceutical Quality Chemistry Manufacturing and Control (PQ/CMC) initiative. Please see https://www.regulations.gov/document/FDA-2022-N-0297-0001, which, in part, states:

"PQ/CMC is a term used to describe manufacturing and testing data of pharmaceutical products. PQ/CMC encompasses topics such as drug stability, quality specification, batch formula, and batch analysis, which are important aspects of drug development. PQ/CMC plays an integral part in the regulatory review process and life cycle management of pharmaceutical products. The development of a structured format for PQ/CMC data will enable consistency in the content and format of PQ/CMC data submitted, thus providing a harmonized language for submission content, allowing reviewers to query the data, and, in general, contributing to a more efficient and effective regulatory decision-making process by creating a standardized data dictionary.

The impetus for this standardization effort was the provisions from the 2012 Food and Drug Administration Safety and Innovation Act (Pub. L. 112-144), which authorized the Agency to require certain submissions to be submitted in a specified electronic format. PQ/CMC standardization supports FDA's regulatory needs in receiving structured and standardized data in pharmaceutical quality and includes two objectives: (1) To standardize the pharmaceutical quality data that is currently received by FDA in eCTD Module 3 from the sponsoring organizations, and (2) to use these structured elements and develop a FHIR data exchange solution."

gschadow commented 1 year ago

This is a concentration really.

Same as CFU/mL.

Actually, since a cell needs to be viable to be "colony forming", and the smallest colony forming unit is a single cell, viable cell concentration should be defined in close parallel to, if not as a synonym of, CFU.

One can argue that VCD is not CFU because CFU is for microorganisms whereas VCD is for eukaryotic stem cell cultures or some such. But such is not said in the request. This suggests the request should clarify how it is and isn't CFU.

My instinct tells me that a CFU/mL is a specialization of a VCD....

But on second thought, it sounds ridiculous reading that back to myself.

A CFU is a specialization of a VC.

But how is VCD even a unit? Any term with "density" is a kind of quantity, the only exception being "optical density" (OD), which was discussed recently (I forget how OD is actually a unit; hopefully a link to that discussion is preserved somewhere).

So, in short, something defined as "the number of something per unit volume" cannot be accepted as a unit, because it has left that unit volume as a free variable.

colin-e-hscic commented 1 year ago

Not sure these considerations of the details matter. It appears that "viable cells" are just a countable entity and therefore as it stands the unit of the numerator in the expression is "1" (unity).

I agree that users want to include every type of countable thing in the units table so they can say RBC/uL etc. in native UCUM, but that is not scalable: there are an infinite number of item types to count.

On that basis, IMHO, the small number of "synonyms of 1" that UCUM already includes, such as CFU, are a misstep, because there is literally no end to the number of countable things in the universe, so the standard can never include them all. Farmers need to count sheep and cows, but "number of sheep" and "number of cows" are not candidates for new standard units, so why should "number of red blood cells" or "number of colony forming units" be different?

I do accept that users really really want this, but I think (again IMHO) that would be better resolved as a design change/enhancement to UCUM. Countable entities would ideally be a new feature in the syntax. Unfortunately we are running out of bracket types in plain ASCII, but for example, using double quote pairs, one could say-

This would allow for an infinite range of countable things to be used in expressions without exploding the size of the standard unit list. It would also clearly distinguish items that are semantically significant but sit under the category of countable entities from the items in curly braces {annotation}, which are semantically void comments.

How these things should be treated by the semantics of a UCUM engine (library) is an interesting question.

Annotations are supposed to be non-semantic elements and can be stripped out or ignored in processing.

However if users care enough about the different types of countable items to name them explicitly, then ideally a UCUM library would be able to handle the idea of units that are a subtype of another type.

I.e. some of the time "RBC" and "WBC" are distinct units (not commensurable): they can't be added or subtracted and they don't cancel out as numerators vs denominators. However, at other times they are just subtypes of "cell" (which is a subtype of "1") and they can be added/subtracted/cancelled at will. The engine would need to be told which approach to take.
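To make the two modes concrete, here is a minimal Python sketch of how an engine might treat countable kinds as subtypes. Everything in it is hypothetical: the `PARENT` table, the function names, and the `strict` flag are illustrative assumptions, not part of UCUM or of any existing UCUM library.

```python
# Hypothetical hierarchy of countable kinds; "1" (unity) is the root.
PARENT = {"RBC": "cell", "WBC": "cell", "cell": "1"}

def ancestors(kind):
    """Return the chain from kind up to "1"; unknown kinds sit directly under "1"."""
    chain = [kind]
    while chain[-1] != "1":
        chain.append(PARENT.get(chain[-1], "1"))
    return chain

def common_kind(a, b):
    """Most specific kind that both a and b are subtypes of."""
    b_chain = ancestors(b)
    for kind in ancestors(a):
        if kind in b_chain:
            return kind
    return "1"

def add_counts(a, b, strict=True):
    """Add two (value, kind) counts.

    strict=True  treats different kinds as not commensurable.
    strict=False generalises both to their nearest common supertype.
    """
    (va, ka), (vb, kb) = a, b
    if ka == kb:
        return (va + vb, ka)
    if strict:
        raise ValueError(f"cannot add {ka} and {kb} in strict mode")
    return (va + vb, common_kind(ka, kb))
```

With this sketch, `add_counts((5, "RBC"), (3, "WBC"), strict=False)` gives `(8, "cell")`, while the same call in strict mode raises an error, which is the choice the engine "would need to be told" about.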

dr-shorthair commented 1 year ago

I agree with @colin-e-hscic's overall thrust here. However, I would suggest that if we head down the path of 'countable things' then the list of those should be governed separately from the core UCUM units.

dr-shorthair commented 1 year ago

@colin-e-hscic could you provide some information in your GitHub profile so we can know who you are?

colin-e-hscic commented 1 year ago

> @colin-e-hscic could you provide some information in your GitHub profile so we can know who you are?

I'm heading there now ;-)

colin-e-hscic commented 1 year ago

Profile updated.

Not surprisingly, others more knowledgeable than I have got there first in thinking about this. The paper "Redressing grievances with the treatment of dimensionless quantities in SI", for example, seems to come to similar conclusions, in a much more erudite fashion than I could.

I believe some of the maths-orientated software tools such as Mathematica and MathCAD handle this by aiming to preserve these "dimensionless units" during calculations while allowing conversions and comparisons between them. However, that is gathered solely from the documentation; I don't have experience of those systems.

BTW I am not suggesting this is a small thing to implement, or an expected "new feature" of UCUM any time soon. One for "UCUM2" perhaps ;-) I was primarily flagging some concerns over expanding the set of countable items as unit atoms in UCUM as it is currently constructed.

I'll admit I am not a fan of annotations, but (again IMHO) their use for countable items is one of the less problematic applications because at least it's providing a label for something that is intended to be equal to "1" and it does actually resolve to "1". However the preservation behaviour is dependent on the implementation.
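As a rough illustration of an annotation "resolving to 1", the following Python sketch strips curly-brace annotations from a unit string. The function name and the regular expression are assumptions made for this example; a real UCUM library would work on a parsed expression rather than on raw text.

```python
import re

# A UCUM annotation: curly braces with no nested braces inside.
ANNOTATION = re.compile(r"\{[^{}]*\}")

def strip_annotations(expr: str) -> str:
    """Drop annotations; one standing alone as a term becomes the unity "1"."""
    def repl(match):
        start = match.start()
        before = match.string[start - 1] if start > 0 else ""
        # At the start of the expression, or right after an operator or an
        # opening parenthesis, the annotation is the whole term and means "1".
        standalone = before in ("", ".", "/", "(")
        return "1" if standalone else ""
    return ANNOTATION.sub(repl, expr)
```

Under these assumptions `strip_annotations("{RBC}/uL")` yields `1/uL` and `strip_annotations("s{CPU}/s")` yields `s/s`. An implementation that preserves the label would simply skip this step, which is where the implementation-dependent behaviour comes from.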

ralphm commented 6 days ago

Has the handling of countable things been addressed since?

I am coming from the software engineering field of observability, which revolves around the collection of metrics, logs / events, and traces. One of the recent efforts in this field is the OpenTelemetry project, which has settled on using UCUM to express units for metrics. A large part of the metrics collected in any computing environment concerns things that can be counted: packets, errors, operations, clients, containers, etc., often presented as a rate over time (e.g. packets per second).

The OpenTelemetry project opted to use annotations to differentiate between different types of things to be counted, while having a way to clearly express the thing being counted:

  • All non-units that use curly braces to annotate a quantity need to match the grammatical number of the quantity it represent. For example if measuring the number of individual requests to a process the unit would be {request}, not {requests}.
  • Instruments that measure an integer count of something SHOULD only use annotations with curly braces to give additional meaning without the leading default unit (1). For example, use {packet}, {error}, {fault}, etc.

I work on an observability tool named [Netdata](https://github.com/netdata/netdata), which is following the lead of the OpenTelemetry project to express units using UCUM. However, besides just using UCUM symbols for denoting units, we also try to use the unit (as well as other metadata) to drive how to visualize and alert on metrics. For example, we would no longer provide an aggregation function for summation for metrics that have the unit % or Cel. Similarly, we want to be able to state "25 packets per second" when we encounter a metric with a current value of 25 and a unit like {packet}/s, as expressed in OpenTelemetry.
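A small Python sketch of that last step, turning a value and an annotated rate unit into a readable phrase, could look like this. It is not Netdata's implementation: the `TIME_NAMES` table, the `describe` function, and the naive pluralisation rule are all assumptions made for illustration.

```python
import re

# Hypothetical spoken names for a few UCUM time units.
TIME_NAMES = {"s": "second", "min": "minute", "h": "hour"}

# Matches units of the form {thing}/time, e.g. "{packet}/s".
RATE = re.compile(r"^\{([^{}]+)\}/(\w+)$")

def describe(value, unit):
    """Render e.g. (25, "{packet}/s") as "25 packets per second"."""
    match = RATE.match(unit)
    if not match or match.group(2) not in TIME_NAMES:
        # Not an annotated rate we recognise: fall back to value plus raw unit.
        return f"{value} {unit}"
    noun = match.group(1)
    # Naive English plural, adequate only for regular nouns.
    if value != 1:
        noun += "s"
    return f"{value} {noun} per {TIME_NAMES[match.group(2)]}"
```

This only works because the annotation is singular, as the OpenTelemetry convention quoted above requires, and because the tool reads meaning from an element that UCUM itself treats as a comment.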

Finally there are also related concepts that would typically be dimensionless. In particular the OpenTelemetry project states:

Instruments for utilization metrics (that measure the fraction out of a total) are dimensionless and SHOULD use the default unit 1 (the unity).

An example of this is the measurement of how much time a CPU is used in a particular way (in "user" land or in "system" land, where the latter is inside the Operating System kernel, waiting for I/O, or handling a hardware interrupt). This is generally expressed as a fraction of a CPU core. One would be tempted to use percentages here, but since many systems have multiple CPU cores, and processes can actually run on multiple cores simultaneously, you can expect the "user" CPU time of a given process to exceed the number of seconds in the interval measured. I.e. it is valid to state that a given process used the equivalent of 25 CPU core seconds over a period of 10 seconds, when it was using two full cores and half of a third on average in that time. So unlike percentages, these measures can be summed. Projects like Kubernetes even speak about milli-CPUs to express CPU resource limits.

I am torn between expressing the above as s{CPU}/s and suggesting a new unit [CPU], so that we get [CPU].s/s instead of 1.
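For reference, the arithmetic behind the CPU example can be sketched in a few lines of Python. The function names are made up for this illustration; the only outside fact used is that 1000 milli-CPU equals one CPU.

```python
def average_cores(cpu_seconds, interval_seconds):
    """CPU core-seconds consumed divided by wall-clock seconds elapsed."""
    return cpu_seconds / interval_seconds

def to_millicpu(cores):
    """Express a core count in milli-CPUs (1000 milli-CPU = 1 CPU)."""
    return round(cores * 1000)

# 25 core-seconds over a 10 second interval is 2.5 cores on average.
usage = average_cores(25, 10)

# Unlike percentages, per-core contributions add up to the same figure:
# two full cores plus half of a third.
summed = 1.0 + 1.0 + 0.5
```

Here `usage` and `summed` both come to 2.5, and `to_millicpu(usage)` gives 2500. Dimensionally the result is s/s, i.e. 1, which is why the choice between s{CPU}/s and a dedicated [CPU] unit matters for anything that wants to keep the "CPU" label.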