wmo-im / GTStoWIS2

Conversion of GTS headers to WIS2 topic
GNU General Public License v3.0

What are the criteria for embedded content vs. links #36

Closed petersilva closed 2 years ago

petersilva commented 3 years ago

The original encodings that motivated this work always used links and never embedded content in the messages; that approach has been used for many years in Canada. Embedding was considered an option for future implementation, but had so far been deemed unnecessary. Discussion in the ET-CTS in 2019 led to the "Content" field being added to the message format, to carry the actual data literally inside the MQP message. This was originally intended to help with long-latency, low-bandwidth satellite links, a fairly special case where it does make a lot of sense.

Some feel that embedding the content within the message is a universal accelerator. Generally speaking, we strive for transfer protocols that encourage as much parallelization as possible, as that maximizes the opportunities for speedups. It needs to be appreciated that a queue is a serialization of a kind, and that embedding slows down posting of files, because each one needs to be read from disk to be included in the message, and the content will then be in the message flow. With pub/sub, a subscriber performs client-side filtering to exclude messages they are not interested in. Larger messages make the receipt of each message slower and increase queueing, compared to a message stream with no embedding. It is also well understood that:

If we want to avoid segmentation, then we need to establish a maximum message size for data to be embedded. That size has to be supported by protocols efficiently.

So we have one case: if every message in a publication stream is of interest to the subscriber, and the messages are all "small", then a completely embedded stream will perform better than third-party downloads, assuming the entire stream bandwidth can be supported. Another option to reduce client-side filtering is to have separate channels for warnings. These are guaranteed not to be busy, and we could simply elect to embed everything sent on such channels.

On the other hand, it is well understood that if an advertised stream includes numerical weather prediction outputs, satellite and RADAR imagery, as well as observations, pushing large data through MQP channels will be awful. Such channels will universally experience queueing, and the warnings may be stuck behind the large products.
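To make the "warnings stuck behind them" point concrete, here is a rough sketch of how long a new message waits when everything ahead of it in a single serialized channel has to be sent first. The link speed, backlog length, and sizes are hypothetical numbers chosen only for illustration, and the model ignores protocol overhead and any parallelism in the broker.

```python
def wait_behind_queue(queued_sizes, link_bytes_per_s):
    """Seconds a newly queued message waits while everything ahead of it is sent in order."""
    return sum(queued_sizes) / link_bytes_per_s

# Hypothetical 10 Mbit/s link, expressed in bytes per second.
LINK = 10 * 1000 * 1000 / 8

# Hypothetical backlog: 50 products of 4 MB each embedded in the stream...
embedded_backlog = [4_000_000] * 50
# ...versus the same 50 products announced as 512-byte link-only messages.
link_backlog = [512] * 50

embedded_wait = wait_behind_queue(embedded_backlog, LINK)
link_wait = wait_behind_queue(link_backlog, LINK)

print(f"warning waits {embedded_wait:.1f} s behind embedded products")
print(f"warning waits {link_wait:.3f} s behind link-only messages")
```

Under these assumptions the warning waits minutes in the embedded case and a fraction of a second in the link-only case; the large products themselves still have to be fetched, but that happens out of band and in parallel.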

Using standard protocols such as https and sftp allows us to achieve much higher total transfer bandwidth, without taxing the MQP brokers with something they are not intended for (large-scale data transfers).

Another benefit of out-of-band, non-MQP transfers is that they use protocols which are in very wide use, with many opportunities to use content distribution networks, web acceleration appliances, etc. The analogue for embedded content would be to have third parties implementing brokers, a far more cumbersome prospect.

summary. Embedding:

petersilva commented 3 years ago

thought experiment picking ugly numbers for illustrative purposes:

If we have a maximum message size, then experience with WMO indicates the average message size will converge to about half of the maximum... (when it was 14K, we observed a message size of 7K on our links... but there was segmentation in the story... who knows how it will change in the new methods.) If we pick 8M as a new max (8MB is the max I am hearing in Aviation circles), then imagine 4M becomes the average. Without payload, assume a message is 512 bytes. So at any given message transfer rate, the size difference is about 8192:1... so if we can support 100 messages/second coming in over the message protocol without payload, then we should get roughly 0.012 messages/second with embedded messages. Or you need the message protocol to run about 8000 times faster, or some combination.
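The arithmetic in this thought experiment can be written out as a short calculation. It assumes binary units (8M = 8 × 1024 × 1024 bytes), that throughput in bytes per second stays constant, and that the 512-byte envelope is negligible next to the payload; all the figures are the illustrative ones from the comment, not measurements.

```python
MAX_EMBEDDED = 8 * 1024 * 1024    # assumed maximum embedded payload, in bytes
AVG_EMBEDDED = MAX_EMBEDDED // 2  # assume the average converges to half the maximum
BARE_MESSAGE = 512                # assumed size of a message without payload, in bytes
BARE_RATE = 100                   # messages/second sustained without payload

# How much bigger the average embedded message is than a bare one.
size_ratio = AVG_EMBEDDED / BARE_MESSAGE

# Same bytes/second, bigger messages: the message rate drops by the same ratio.
embedded_rate = BARE_RATE / size_ratio

print(f"size ratio: {size_ratio:.0f}:1")
print(f"embedded rate: {embedded_rate:.4f} messages/second")
```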

It becomes a question of how good are brokers at doing actual data transfer (as opposed to switching.) The traditional answer is: not very good. But I fully get that people intuitively think it should be faster, and in many cases it should be, but in a lot of other common cases, I expect it to turn out to be counter-productive.

golfvert commented 3 years ago

As I understand it, brokers are intended to work best with "small" messages. Everyone will agree that a 1GB file is not small. At the other end of the spectrum, 1kB is small. So there is a factor of 1,000,000 between the two. For brokers with the anticipated workload, can we consider 100kB or 1MB to be "small"? I agree we need to run some experiments.

petersilva commented 3 years ago

So far, in the committee, we had proposed 4KB as a definition of "small enough to be embedded" and the strategy was just to embed everything smaller than that. I wanted 512 bytes... but in the interest of consensus, we have been using 4KB.
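A minimal sketch of that content-agnostic rule, assuming a message is a plain dictionary. The field names `link`, `size`, and `content` and the base64 encoding are placeholders for illustration, not the agreed message format; only the 4KB threshold comes from the discussion.

```python
import base64

EMBED_THRESHOLD = 4096  # bytes; the committee's working value

def build_message(url, data, threshold=EMBED_THRESHOLD):
    """Return a notification dict: always a link, plus inline content when small."""
    # Hypothetical field names; the link is present regardless of size.
    message = {"link": url, "size": len(data)}
    if len(data) < threshold:
        # Embed only payloads strictly smaller than the threshold.
        message["content"] = base64.b64encode(data).decode("ascii")
    return message

# example.org URLs are placeholders.
small = build_message("https://example.org/obs/1", b"x" * 1000)
large = build_message("https://example.org/cap/1", b"x" * 200_000)

print("content" in small)  # 1000 bytes is under the threshold, so it is embedded
print("content" in large)  # 200 KB is over the threshold, so only the link is sent
```

The rule never looks at what the payload is, only at its size, which is what keeps it content agnostic.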

I expect that CAP messages are typically in the 100KB to 500KB range, so about 50x-100x larger... I think the content agnostic approach (embedding all messages smaller than an embedding threshold) is no longer reasonable at that size, as it will slow things down unacceptably for a large number of cases.

We could get more sophisticated, and embed bigger messages as long as they are sufficiently rare and important, but that means we need to understand what we are sending, as opposed to being content agnostic as we have so far succeeded in being.

eliot-christian commented 3 years ago

In my experience, CAP messages are typically quite small. My guess is that they average about 1,000 bytes (1 K).

To my mind, sudden onset emergencies provide the primary Use Case for embedding CAP alerts in a message queue. In this case, seconds can be the difference between a life-saving alert and an alert that arrives too late. Examples of sudden-onset emergencies include earthquake-early-warning (alerts trying to outrace the earthquake wave propagation) and its analogues in tsunami, flash floods, landslides, volcanic eruption, space weather, et al. Also qualifying as sudden-onset emergencies are public safety matters such as 'active shooter' situations.

We should bear in mind that CAP alerts are often handled without human mediation, as in the triggering of sirens, traffic signals, bridge and tunnel gates, etc.

petersilva commented 3 years ago

You can see Canadian CAP here for the last few weeks: https://dd.weather.gc.ca/alerts/cap/

Looking at today, one of our seven storm prediction centres has issued about a dozen warnings. Most of them are in the 200KB to 400KB range, and about 1/3 are in the 18KB range. None are smaller than that. Those are from ECCC (Environment and Climate Change Canada... current name for the met service's parent organization). To look at all Canadian ones, I checked here:

https://alertsarchive.pelmorex.com/en.php

ECCC ones are the vast majority of CAP produced, but I found some others, and none of the alerts from other organizations were smaller than 5KB.

I just looked a bit more at a CAP, and from ECCC, the digital signature alone is 3KB.

petersilva commented 3 years ago

On the other hand, if people are ok with messages < 4KB being embedded, which is the committee's working hypothesis anyway, and if folks agree that this convention includes CAP (i.e. we do not expect CAP messages bigger than 4KB to be embedded), then there is already complete agreement.

eliot-christian commented 3 years ago

You are correct that these are huge CAP alerts.


petersilva commented 2 years ago

further discussion here: https://github.com/wmo-im/wis2-notification-message

amilan17 commented 1 year ago

decision in: wmo-im/wis2-notification-message#6

eliot-christian commented 1 year ago

I agree with those on this thread who assert that the link is a MUST in all cases and inline content is a MAY.
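On the receiving side, "link is a MUST, inline content is a MAY" could be checked along these lines. This is only a sketch: the dictionary layout and the field names `link` and `content` are assumptions for illustration, not the wis2-notification-message schema.

```python
def check_message(message):
    """Return a list of problems; an empty list means the message is acceptable."""
    problems = []
    link = message.get("link")
    # MUST: a non-empty link to the data as originally published.
    if not isinstance(link, str) or not link:
        problems.append("link is required")
    # MAY: inline content can be absent, but if present it has to be a string.
    if "content" in message and not isinstance(message["content"], str):
        problems.append("content, when present, must be a string")
    return problems

# Placeholder URL; a link-only message passes, a content-only message does not.
print(check_message({"link": "https://example.org/cap/1"}))
print(check_message({"content": "PGFsZXJ0Pg=="}))
```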

From a law and policy perspective, the CAP alert as originally published is the legal, public document that must be maintained as the permanent record of what the alerting authority sent to the public for the emergency. Therefore, the link to that CAP alert is of the essence and not merely a convenience.

I draw attention in this regard to the new CEN Workshop Agreement: "Requirements and recommendations for social media early warning messages in crisis and disaster management". It includes this requirement: "To ensure consistency, clarity and completeness of the messages disseminated simultaneously across different channels, the social media early warning messages and related notifications shall refer to the persistent unique URL of the CAP message."

Eliot Christian
