How to identify Concepts (in the context of the distributed web)?

Content-based addressing (thanks to cryptographic hashes) is a very elegant and powerful mechanism for identifying digital « things » on the web (e.g. text, photo, music, video, software, …). Note: in fact, anything that can be represented by 0s and 1s! More information about this approach here: https://infocentral.org/drafts/PrinciplesDraft.html
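To make the mechanism concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash function (the post does not name one), and `content_id` is a hypothetical helper name, not part of any existing system.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hex digest that identifies `data` by its content alone."""
    # SHA-256 is an assumption; any cryptographic hash illustrates the point.
    return hashlib.sha256(data).hexdigest()


# Placeholder byte strings standing in for real files.
photo_a = b"same bytes"
photo_b = b"same bytes"
photo_c = b"other bytes"

# Identical bytes always map to the same ID, wherever they are stored.
assert content_id(photo_a) == content_id(photo_b)
# Any change to the bytes yields a different ID.
assert content_id(photo_a) != content_id(photo_c)
```

The ID is derived entirely from the bytes, which is exactly why the method has nothing to hash when the thing to identify is a Concept.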

But how to identify a Concept? We cannot use the content-based addressing method, simply because a Concept is not "made of" 0s and 1s!

So, which method could we use instead?

At this stage, I can imagine 3 methods:

A pure random identifier
An ID calculated from its several digital representations
An IEML « word » and/or « sentence »

(All of these methods are alternatives to the current method based on URLs/URIs.)

A pure random identifier

About: Simple & efficient.

Advantage(s):
Provides a unique ID for each Concept.

Drawback(s):
How to choose the identifier "space" (character types, length, ...)?
How to retrieve the concept once identified?
How to be sure that the "same" Concept is not identified twice (i.e. with 2 different IDs)?
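A short Python sketch of the random approach, with the open questions visible in the code. The 128-bit hex "space" and the name `new_concept_id` are assumptions chosen only for illustration.

```python
import secrets


def new_concept_id(n_bytes: int = 16) -> str:
    """Draw a random identifier: 16 bytes gives 128 bits, i.e. 32 hex characters."""
    # The identifier "space" (length, alphabet) is an arbitrary choice here.
    return secrets.token_hex(n_bytes)


id_for_tree = new_concept_id()
id_for_tree_again = new_concept_id()

# Nothing links the two draws: the same Concept can silently receive two IDs,
# and an ID alone says nothing about which Concept it stands for.
assert id_for_tree != id_for_tree_again
assert len(id_for_tree) == 32
```

Uniqueness comes for free, but both retrieval and de-duplication need some external registry or convention that the identifier itself cannot provide.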

An ID calculated from its several digital representations

About: This approach mimics the content-based approach, though it seems to take the problem in reverse. But why not explore it anyway?

Advantage(s):
Automatic calculation of the ID.
Each new digital representation "completes & refines" the Concept itself.

Drawback(s):
Some concepts could have no digital representation.
The ID of the Concept will change every time we add a new digital representation.
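One possible reading of this method, sketched in Python: hash each representation, then hash the sorted list of those hashes. The combination rule, the use of SHA-256, and the function names are all assumptions; the post does not specify how the ID would be calculated.

```python
import hashlib
from typing import Sequence


def representation_id(data: bytes) -> str:
    """Content-based ID of a single digital representation (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def concept_id(representations: Sequence[bytes]) -> str:
    """Hypothetical Concept ID: hash of the sorted representation hashes."""
    # Sorting makes the result independent of the order of the representations.
    digests = sorted(representation_id(r) for r in representations)
    return hashlib.sha256("".join(digests).encode("ascii")).hexdigest()


reps = [b"a drawing of a tree", b"the word 'tree'"]
before = concept_id(reps)
after = concept_id(reps + [b"a photo of a tree"])

# Same set of representations, same ID, regardless of order.
assert concept_id(list(reversed(reps))) == before
# Adding one representation changes the ID of the Concept.
assert before != after
```

The last assertion is the second drawback in code form: every reference to the old ID goes stale as soon as a representation is added.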

An IEML « word » and/or « sentence »

About: IEML (https://twitter.com/IEML_) is a univocal language in which there is a 1-1 relation between the semantic and the syntactic structures (phoneme, morpheme, word, ...): https://www.topincs.com/EntangledBootstrap/2006

Advantage(s): ...
The semantics are "calculable" for the s

Drawback(s): ...
Some concepts are very difficult to express in IEML (e.g. how to say COVID-19?)

This is a very tricky design concern, one that I will be detailing at length in the upcoming but oft-delayed InfoCentral Design Proposal draft. :)

The more generalized need is the ability to anchor "root" nodes in the graph, whether they represent a concrete real-world thing / data or an abstract concept. My new solution is a combination approach of hash-based IDs (HIDs) and unique value IDs (UVIDs).

Concrete roots may contain various identifying information (initial properties, etc.) and are typically cryptographically signed by a trusted author. Such roots are then referenced using standard HIDs. They are more appropriate for anchoring unique real-world objects, creative works, etc.

Abstract roots are unowned, do not contain information, and only need to be unique. This is, of course, where concepts belong. Abstract roots don't need to exist as real data entities. They are self-existent in their uniqueness relative to an unowned namespace. UVID is my term for the various approaches mentioned above: random nonces, unique strings (hashtags, folksonomy), artificial language words/phrases, etc. In order to complete the design, we only need canonical mappings from UVID schemes to HIDs. This allows them to be integrated into content-based networks, used in place* of HIDs for references in data entities, used for networked reference collection, etc. Interestingly, this also effectively serves as a default indexing scheme. Natural language words and phrases found in ordinary text can be treated as implicit UVIDs and deterministically mapped to graph nodes.
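A canonical mapping from a UVID to an HID could look roughly like the Python sketch below. This is not the InfoCentral design: the scheme prefix, the normalization steps (Unicode NFKC, case folding, whitespace collapsing), the choice of SHA-256, and the names `canonical_uvid` and `uvid_to_hid` are all assumptions made for illustration.

```python
import hashlib
import unicodedata


def canonical_uvid(scheme: str, value: str) -> bytes:
    """Build a canonical byte string for a UVID (hypothetical encoding)."""
    # Assumed normalization: Unicode NFKC, case folding, collapsed whitespace.
    text = unicodedata.normalize("NFKC", value).casefold()
    text = " ".join(text.split())
    # The scheme prefix keeps different UVID schemes in separate namespaces.
    return f"{scheme}:{text}".encode("utf-8")


def uvid_to_hid(scheme: str, value: str) -> str:
    """Deterministically map a UVID to a hash-based ID (SHA-256 assumed)."""
    return hashlib.sha256(canonical_uvid(scheme, value)).hexdigest()


# A phrase found in ordinary text maps to the same node however it was typed.
assert uvid_to_hid("text", "Distributed  Web") == uvid_to_hid("text", "distributed web")
# The same string under another scheme is a different root.
assert uvid_to_hid("text", "tree") != uvid_to_hid("hashtag", "tree")
```

Because the mapping is deterministic, anyone can compute the node for a given word or phrase without a registry, which is what lets it double as a default indexing scheme.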

Abstract roots gain meaning by becoming woven into contexts by reference. They are externally described, mapped, and utilized. They can be specialized and generalized by mapping against other roots. Ontologies evolve to make formalized use of them, possibly indirectly through perspectives (which are themselves immutable roots!). And networking drives the most popular meaning(s) and metadata for any given root / concept. Reference collection removes any need/desire for mutable references or owned namespaces, the faulty approach being explored by most other projects.

Sidenotes:

Another benefit of UVIDs is that they don't degrade the way hash functions do under future cryptographic attacks. So if a hash function becomes weakened, UVIDs can simply be mapped to a better hash among networks. The original UVID references remain intact, whereas HID references in existing entities would need to be recreated or externally patched. (I have a whole scheme for this, so don't worry too much!)
I don't see a benefit to multiple digital representations that must first be mapped to the same unique value. This seems like an anti-pattern, in contrast to univocal languages.
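The first sidenote can be illustrated with a small Python sketch. The `scheme:value` string form of the UVID, the function name `map_uvid`, and the two algorithms are assumptions; the actual migration scheme mentioned above is not described in this post.

```python
import hashlib


def map_uvid(uvid: str, algorithm: str) -> str:
    """Map a UVID to an HID under the named hash algorithm."""
    return hashlib.new(algorithm, uvid.encode("utf-8")).hexdigest()


# What an existing entity stores is the UVID itself, not a digest.
stored_reference = "text:tree"

old_hid = map_uvid(stored_reference, "sha256")
new_hid = map_uvid(stored_reference, "sha3_256")

# Switching hash functions changes the derived HID but not the stored reference.
assert old_hid != new_hid
assert stored_reference == "text:tree"
```

An entity that had stored `old_hid` directly would have to be rewritten or patched externally to follow the switch, which is the contrast the sidenote draws.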

iPlumb3r / KeQuarks