mlabs-haskell / cardano-open-oracle-protocol

COOP - Cardano open oracle protocol
Apache License 2.0

On time, state, events and entities #33

Closed bladyjoker closed 1 year ago

bladyjoker commented 2 years ago

The main question here is Time and specifically whether COOP has special treatment of time as it relates to Fact Statements (abbr. FS).

Problem statement

How COOP considers Time will have a fundamental effect on how COOP actors interact within the protocol. When time is considered an essential part of any FS, COOP must define whether FSs refer to state observations or event collections as it directly informs the programming contracts throughout the protocol components (Submitter, Consumer, Publisher and Collector).

FS as state observations

Let's call these State FS...

Examples (single capital letters are variables):

Essential semantics of State Time is that it's effectively continuous. This means that each State FS can be queried for at any point in time. Pertaining to our examples above, time denoted by at T can be set to any point on a continuous time line.

Q: What does this mean for COOP?

The Publisher API has a central publishFactStatement(at, fsType) that Submitters can use to publish a State FS at a desired point in time. The Publisher backend (which might include an oracle pool) would then serve to provide this information, leveraging suitable inference methods (like numeric interpolation or event simulation) for computing the state from available underlying information (like collected weather snapshots or sporting events).

This formulation is all about depicting the 'world' as a variable over a continuous time, and it supports use cases where one needs to pin the world at a specific time and inspect it.
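To make the "inference" step concrete, here is a minimal sketch of how a Publisher backend might compute a State FS for an arbitrary time from collected snapshots, using linear interpolation. The names `Snapshot` and `stateAt`, and the choice of `Double` for both time and value, are assumptions for illustration only and are not part of COOP.

```haskell
import Data.List (sortOn)

-- Hypothetical types: time as seconds since some epoch, and a numeric reading.
type Time = Double
type Snapshot = (Time, Double)

-- | Infer the state at time t from collected snapshots by linear
-- interpolation between the two neighbouring snapshots.
-- Returns Nothing when t lies outside the observed range.
stateAt :: [Snapshot] -> Time -> Maybe Double
stateAt snapshots t = go (sortOn fst snapshots)
  where
    go ((t0, v0) : rest@((t1, v1) : _))
      | t < t0 = Nothing
      | t <= t1 = Just (interpolate t0 v0 t1 v1)
      | otherwise = go rest
    go [(t0, v0)] | t == t0 = Just v0
    go _ = Nothing

    interpolate t0 v0 t1 v1
      | t1 == t0 = v0 -- duplicate timestamps: keep the first reading
      | otherwise = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

A real backend would likely pick the inference method per FS type; linear interpolation is just the simplest non-constant choice.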

FS as event collections

Let's call these Event FS...

Examples (single capital letters are variables) related to State FS examples above:

Essential semantics of events is that of a collection that's ordered by time. And as with any collection, one can traverse* it, which makes it fundamentally different from state (continuous types can't be enumerated by definition). Notice that all but the last State FS example can be computed given the Event FS examples here.

(* "traversed" is not the term a Haskeller would use; they would prefer "folded", as in Foldable.)

Q: What does this mean for COOP?

This brings additional overhead to the protocol. Submitters now depend on a new COOP feature that allows browsing and searching through event collections that can be published via COOP Publishers. Both human and automated Submitters need to be able to find and pick the exact event they want published after which they call publishFactStatement(eventId, fsType) on the Publisher API.

This means that COOP needs to encompass such an Event Search feature, for both humans and automation, to search through events publishable by a given Publisher. Event Identification would also have to be clearly specified, as all COOP components would communicate using such event IDs (see #25).

FS as time-agnostic entities

In this scenario, COOP doesn't consider time specially, so the above dichotomy need not be considered (such considerations are likely delegated elsewhere).

The whole system works with Entities, and an entity is communicated via its ID/Address (UUID or content addressed for example) that was allocated during entity creation.

This is likely the scenario COOP should pursue, as it makes the fewest assumptions about the nature of FSs. However, as with Event FSs, additional architecture pieces need to be added and aligned. Namely, an Entity Search feature needs to exist that enables searching through the entities publishable by a Publisher. After a Submitter has found an entity they'd like published, they approach the Publisher and call publishFactStatement(entityId).
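The three formulations differ mainly in what a Submitter hands to the Publisher. A minimal sketch of the three request shapes side by side follows; every type and constructor name here is hypothetical and only mirrors the publishFactStatement variants mentioned in this issue.

```haskell
-- All names below are hypothetical and only illustrate the three options.
type Time = Integer -- e.g. POSIX seconds

newtype FsType = FsType String deriving (Show, Eq)
newtype EventId = EventId String deriving (Show, Eq)
newtype EntityId = EntityId String deriving (Show, Eq)

data PublishRequest
  = PublishState Time FsType    -- publishFactStatement(at, fsType)
  | PublishEvent EventId FsType -- publishFactStatement(eventId, fsType)
  | PublishEntity EntityId      -- publishFactStatement(entityId)
  deriving (Show, Eq)

-- | Whether the Submitter must first obtain an identifier through a
-- search/browsing feature before it can issue the request.
needsSearch :: PublishRequest -> Bool
needsSearch (PublishState _ _) = False
needsSearch _ = True
```

Only the State formulation lets a Submitter build a request without looking anything up first, which is the trade-off discussed above.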

Building a search/browsing feature

Needless to say, it can be done. However, I'm not confident it should be built, or even that it can be meaningfully built for COOP (given our resource constraints), for several reasons...

  1. Lack of a schema system that can be used to communicate entity structure throughout COOP,
  2. indexing and search technology stack must be adopted such that it can understand and work with COOP FSs,
  3. entity management and specifically creation and allocation of entity IDs.

It's unclear to me how COOP can be decoupled from these considerations, and how to proceed with minimal assumptions about the future.

peterVG commented 2 years ago
bladyjoker commented 2 years ago
  • "After which they call publishFaS(eventId, sofType) on the Publisher API" --> should be publishFaS(eventId, faSType)?

  • "The Publisher API has a central publishFaS(at, sofType)" --> should be publishFaS(at, faSType)?

Done!

GeorgeFlerovsky commented 2 years ago

Alice (a Submitter/Consumer) approaches Olivia (a Publisher) on 5 August 2021 with the following query: Who is the President of France?

Olivia's data systems indicate that her most relevant record for this query was that on 14 May 2017, she observed Emmanuel Macron becoming the President of France.

How should Olivia respond to Alice? She has two main options:

  1. "Emmanuel Macron is the President of France on 5 August 2021. I have inferred this based on my closest relevant event record to your query—he became the president on 14 May 2017."
  2. "Emmanuel Macron became the President of France on 14 May 2017. This is my closest relevant event record to your query."

If I understand correctly, option 1 corresponds to "FS as state observations" and option 2 corresponds to "FS as event collections".

Option 1 (state observations) is simpler for Alice to deal with, as Olivia does the inference/interpolation herself and attests to a fact statement corresponding to the requested time in Alice's query.

Option 2 (event collections) is more difficult for Alice to deal with, because she has to figure out on her own how to infer/interpolate the answer she wants from the answer Olivia gave her.

@bladyjoker argues that Option 2 introduces more complexity to the protocol, as Olivia may need to provide an API/interface for Alice to browse the collection of events that Olivia has. By contrast, for Option 1, Olivia would only need to provide a simple interface for Alice to just make queries for fact statements about any time, without browsing the underlying event collection that Olivia uses to infer/interpolate. Furthermore, he argues that the inference/interpolation mechanism would be simpler to implement than the event browsing capability.

I think we should also consider re-usability of published fact statements. Bob may want to refer to the fact statement from Olivia published by Alice, but he needs it for 4 August 2021. Would Option 1 or Option 2 be more convenient for him to re-use Olivia's published fact statement?

bladyjoker commented 2 years ago

Alice (a Submitter/Consumer) approaches Olivia (a Publisher) at time T with the following query: Who was the President of France on 5 August 2021?

The time T here is 5 August 2021 (that's the time at which we want to observe the fact).

Olivia's data systems indicate that her most relevant record for this query was that on 14 May 2017, she observed Emmanuel Macron becoming the President of France.

How should Olivia respond to Alice? She has two main options:

1. "Emmanuel Macron was the President of France on 5 August 2021. I have inferred this based on my closest observation to your query, where he became the president on 14 May 2017."

Yes, a bit more precisely the Fact Statement is "Emmanuel Macron IS the President of France on 5 August 2021" and a Fact Statement Annotation (if needed or relevant) is "Inferred from the French Inauguration Event that HAPPENED at 14 May 2017"

2. "Emmanuel Macron became the President of France on 14 May 2017. This is my closest observation to your query."

Yes, but I don't see how that would be a realistic use case. It's more likely that the user would find an Event using the Event Search feature with a query like "SELECT eventId, new_president FROM french_inauguration_events WHERE time < DATE 5 August 2021" and then proceed to publish that event.
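For comparison, the same kind of Event Search can be sketched over an in-memory collection. This version goes one step further than the query above and picks only the latest matching event. The record type, the event IDs and the function name `latestBefore` are hypothetical, and dates are modelled as plain tuples to keep the example free of dependencies.

```haskell
import Data.List (sortOn)
import Data.Maybe (listToMaybe)
import Data.Ord (Down (..))

-- (year, month, day); tuples compare lexicographically, so this orders by date.
type Date = (Int, Int, Int)

data InaugurationEvent = InaugurationEvent
  { eventId :: String -- hypothetical ID scheme
  , eventTime :: Date
  , newPresident :: String
  }
  deriving (Show, Eq)

-- Sample collection; the event IDs are made up for illustration.
sampleEvents :: [InaugurationEvent]
sampleEvents =
  [ InaugurationEvent "fr-inauguration-2012" (2012, 5, 15) "François Hollande"
  , InaugurationEvent "fr-inauguration-2017" (2017, 5, 14) "Emmanuel Macron"
  ]

-- | The most recent event strictly before the given date, if any.
latestBefore :: Date -> [InaugurationEvent] -> Maybe InaugurationEvent
latestBefore t =
  listToMaybe . sortOn (Down . eventTime) . filter ((< t) . eventTime)
```

With this sketch, `latestBefore (2021, 8, 5) sampleEvents` selects the 2017 inauguration, whose `eventId` the Submitter would then pass on for publishing.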

If I understand correctly, option 1 corresponds to "FS as state observations" and option 2 corresponds to "FS as event collections".

Afaict you got it.

GeorgeFlerovsky commented 2 years ago

Also, perhaps similar arguments could apply to geographic interpolation...

GeorgeFlerovsky commented 2 years ago

Resolution:

peterVG commented 2 years ago

Fact Statements will be a prov:Entity with a unique identifier (most likely a ULID, see https://github.com/ulid/spec)

https://www.w3.org/TR/prov-o/#prov-o-at-a-glance

bladyjoker commented 2 years ago

Thanks y'all! Let's keep this open until I update the documentation to reflect the design changes and considerations.

bladyjoker commented 2 years ago

Some notes on Activities for posterity

From @peterVG

RE: time conundrum. I think this discussion needs some sample fact statements from me, which I realize I've been late in delivering. I would like to extend W3C PROV-O as a core semantic ontology for these schemas. It includes concepts for documenting time and timespans. See https://www.w3.org/TR/prov-o/#prov-o-at-a-glance

Activities start and end at particular points in time (described using properties prov:startedAtTime and prov:endedAtTime, respectively).

That's cool, Activities are what was called Processes/Sessions in my previous work and are closely tied to 'intervals' semantically. They occupy a place between Events and State, and treating them specially comes with some benefits.

In simple Haskell parlance, in its simplest form an Activity is a time interval associated with some value (of type a)

type Activity a = (Interval Time, a)

When Activity is not enough

However, and this is not a solely philosophical statement, not everything is an Activity. Any (effectively) continuous variable over time that you're trying to observe (like temperature, height, weight, distance and time itself) is not an Activity, because it has no discrete events denoting its start and end. For instance...

All these are statements about 'state' that are 'observed' over continuous time and CAN'T be formulated as Activities. One could argue, though, for taking periodic observations of such quantities and calling each an Activity between T1 and T2, such that the value of the Activity is whatever was observed at its beginning, at time T1. But here we silently imply the simplest of interpolations between the two points T1 and T2: taking the value at T1 as the value of the Activity. Which is totally fair!
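The "periodic observations as Activities" reading can be sketched as follows. Here `Interval` is a simplified stand-in for `Interval Time` (a half-open pair of bounds), and `toActivities` is a hypothetical helper name.

```haskell
type Time = Double

-- Simplified stand-in for `Interval Time`: half-open [start, end).
type Interval = (Time, Time)

type Activity a = (Interval, a)

-- | Turn time-ordered observations into Activities. Each Activity spans
-- from one observation to the next and carries the value observed at its
-- start, i.e. constant (step) interpolation. The last observation has no
-- successor and therefore yields no Activity.
toActivities :: [(Time, a)] -> [Activity a]
toActivities obs = zipWith step obs (drop 1 obs)
  where
    step (t1, v) (t2, _) = ((t1, t2), v)
```

For example, observations at times 0, 5 and 9 yield two Activities, covering [0, 5) and [5, 9).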

But the use of any other non-constant interpolation method makes the point clear that these quantities over continuous time MUST be computed. This is straight from this seminal work on functional reactive programming where what I refer to as State is called Behavior and has the following type:

type Behavior a = Time -> a
type State = Behavior

In this paper the author demonstrates that Behavior/State composes on time, and this opened up a whole new space for simulation programming and modeling interaction. I like to say Conal's FRP did for time and simulation what vector graphics did for space and images.
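One way to read "composes on time", sketched under the assumption that `Time` is a `Double`: because a Behavior is just a function of time, behaviors can be combined pointwise and shifted in time without any sampling. The helper names `sumB` and `delay` are made up for this example and are not taken from the paper.

```haskell
import Control.Applicative (liftA2)

type Time = Double
type Behavior a = Time -> a

-- | Pointwise sum of two behaviors. Functions are already an Applicative,
-- so liftA2 applies (+) to both behaviors sampled at the same time.
sumB :: Num a => Behavior a -> Behavior a -> Behavior a
sumB = liftA2 (+)

-- | Shift a behavior into the future: the delayed behavior shows at time t
-- what the original showed at time t - dt.
delay :: Time -> Behavior a -> Behavior a
delay dt b = \t -> b (t - dt)
```

Both results are again Behaviors, so they can be queried at any point on the time line.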

Activities give rise to State

On the other hand, each Activity is trivially a State as well, by employing the rule: the Statement of State holds during the Activity and doesn't hold otherwise.

Some examples of Activities

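fromActivity :: Activity a -> State (Maybe a)
-- `in` is a reserved word in Haskell, so interval membership is written as `member` here (assumed to be in scope)
fromActivity (interval, value) = \t -> if t `member` interval then Just value else Nothing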

How does this change implementation

All these different time entities assume different programming contracts and interfaces.

Events

If you're operating with Events, you usually get the ability to merge different Events whilst maintaining the Time ordering between them

merge :: Event a -> Event b -> Event (Either a b)

Since Event is generally formulated as a collection of occurrences

type Event a = [(Time, a)]

You can simply query events as you would any collection. Systems like Splunk and Elasticsearch are well-known Event Search systems.
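With the list formulation, `merge` can be written directly. This is a minimal sketch that assumes both inputs are already ordered by time and that `Time` is a `Double`; breaking ties in favour of the left stream is an arbitrary choice.

```haskell
type Time = Double
type Event a = [(Time, a)]

-- | Merge two time-ordered event streams into one, preserving time order.
-- Occurrences are tagged with Left or Right to record their origin.
-- When two occurrences share a timestamp, the left one comes first.
merge :: Event a -> Event b -> Event (Either a b)
merge [] ys = [(t, Right b) | (t, b) <- ys]
merge xs [] = [(t, Left a) | (t, a) <- xs]
merge xs@((ta, a) : xs') ys@((tb, b) : ys')
  | ta <= tb = (ta, Left a) : merge xs' ys
  | otherwise = (tb, Right b) : merge xs ys'
```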

What's commonly done of course is to make Activities and State from Events.

Activities

At my previous work we built specialized systems that deal with Activities, as treating them as first-class citizens enables many indexing optimizations for a class of queries that are natural for intervals. Take a look at PostgreSQL range types https://www.postgresql.org/docs/current/rangetypes.html and Haskell's https://hackage.haskell.org/package/IntervalMap
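As an illustration of a query that is natural for intervals, here is a minimal sketch of "which Activities were active during a given window". It uses a hand-rolled closed interval type and a linear scan rather than either of the libraries linked above. All names are hypothetical, and a real system would use an interval index instead of `filter`.

```haskell
type Time = Double

-- Closed interval [ivStart, ivEnd]; a simplified stand-in for `Interval Time`.
data Interval = Interval {ivStart :: Time, ivEnd :: Time}
  deriving (Show, Eq)

type Activity a = (Interval, a)

-- | Two closed intervals overlap when each starts no later than the other ends.
overlaps :: Interval -> Interval -> Bool
overlaps (Interval s1 e1) (Interval s2 e2) = s1 <= e2 && s2 <= e1

-- | All Activities whose interval overlaps the query window (linear scan).
activeDuring :: Interval -> [Activity a] -> [Activity a]
activeDuring window = filter (overlaps window . fst)
```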

State

Systems dealing with State (over time) are generally computationally intensive as they consume and parse underlying state observations, event occurrences and activities to yield the State value for the requested time T.

Needless to say, State CAN'T by its nature be stored in its entirety in a large-scale DB and then efficiently queried, because there are (effectively) infinitely many State values (aka observations) between any two time points.

For a State system, the user expects an API to specify 'which state' and 'when'

at :: State a -> Time -> a