Open jracusin opened 1 year ago
That could be the record offset.
Is the record offset something set explicitly in the Unified Schema, and assigned by the producer? Or is it assigned by GCN? How does one find the ID?
A Kafka record is uniquely identified by its topic, partition, and offset. With those three pieces of information, you can command a client to seek to the given record. I would suggest developing some notation combining those three fields.
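As a rough illustration of what such a notation could look like, here is a minimal sketch. The `RecordRef` name and the `topic/partition/offset` string form are assumptions made up for this example, not an agreed GCN convention.

```python
from typing import NamedTuple


class RecordRef(NamedTuple):
    """Coordinates that identify one Kafka record (hypothetical helper)."""
    topic: str
    partition: int
    offset: int

    def __str__(self) -> str:
        # Hypothetical notation: topic/partition/offset
        return f"{self.topic}/{self.partition}/{self.offset}"

    @classmethod
    def parse(cls, text: str) -> "RecordRef":
        # Split from the right so the dots in the topic name are left alone.
        topic, partition, offset = text.rsplit("/", 2)
        return cls(topic, int(partition), int(offset))


ref = RecordRef("gcn.classic.text.SWIFT_ACTUAL_POINTDIR", 0, 33893)
print(ref)
```

Parsing the printed string with `RecordRef.parse` gives back the same three fields, so the notation round-trips.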
Thanks Leo. But the record topic, partition, and offset are not known before submission by the producer, right? So how does one know what the partition and offset are? Do I have to listen to my own notices and record them?
The topic is certainly known before submission. As for the partition, all of our topics currently use a single partition, although that might not always be the case.
I would think that there is probably a way for a producer to get the offsets of records it has sent shortly after they are flushed.
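One way this might work is the delivery callback that confluent_kafka producers accept through the `on_delivery` argument of `produce()`. The sketch below assumes the gcn-kafka `Producer` passes that argument through unchanged; `DeliveryLog` and `send_notice` are hypothetical names for this example.

```python
class DeliveryLog:
    """Delivery callback that remembers where each record landed."""

    def __init__(self):
        self.delivered = []  # (topic, partition, offset) tuples
        self.errors = []

    def __call__(self, err, msg):
        # The client calls this with (err, msg) once the broker has responded.
        if err is not None:
            self.errors.append(err)
        else:
            self.delivered.append((msg.topic(), msg.partition(), msg.offset()))


def send_notice(topic, data, client_id, client_secret):
    """Produce one record and return the coordinates the broker assigned."""
    # Imported here so that DeliveryLog itself has no dependencies.
    from gcn_kafka import Producer

    log = DeliveryLog()
    producer = Producer(client_id=client_id, client_secret=client_secret)
    producer.produce(topic, data, on_delivery=log)
    producer.flush()  # wait for outstanding deliveries and their callbacks
    if log.errors:
        raise RuntimeError(f"delivery failed: {log.errors[0]}")
    return log.delivered[0]
```

The offset only becomes available inside the callback, that is, after the broker has accepted the record.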
Why does your producer need to know the offsets of records it has sent?
We need to know the notice ID for reference purposes. For example, in our retraction notice type, it would be useful to be able to reference the ID of the notice we are retracting. If you are suggesting that this notice ID be made up partly, or entirely, of the record offset, then we need to know the offset. Happy to avoid this if you think it unnecessary. But it seems generally useful to be able to reference a particular notice directly.
Kafka records can also have keys.
I suggest that you study the Streams Concepts page, particularly the parts on keys, partitions, and timestamps.
@lpsinger @jracusin i don't care what GCN uses as the unique notice identifier, as long as it exists. You suggested using the offset. I'm also open to keys, I don't really care, as long as it is clear how to work with it. Let me know when you have chosen a method.
We do not yet have a design for a unique notice identifier. I am just leaving this as background reading.
@Tohuvavohu a quick update on this: I made a PR for gcn.nasa.gov that updates the sample code to print out the offset number.
If you want to print it in a consumer, you can add print(f'{message.topic()}: #{message.offset()}') to the consuming loop.
You can use the offset and topic to directly reference specific Notices. Here is an example of retrieving a gcn.classic.text.SWIFT_ACTUAL_POINTDIR notice using the specific offset number 33893 (I got this number yesterday using the message.offset() example). The Python gcn-kafka library is a wrapper around confluent_kafka, so you should already have the package installed:
from gcn_kafka import Consumer
from confluent_kafka import cimpl
# Connect as a consumer.
# Warning: don't share the client secret with others.
consumer = Consumer(client_id='your-client-id',
                    client_secret='your-client-secret')
topic = "gcn.classic.text.SWIFT_ACTUAL_POINTDIR"
pt = cimpl.TopicPartition(topic, 0, 33893)
consumer.commit(offsets=[pt])
consumer.subscribe([topic])
for message in consumer.consume(num_messages=1):
    value = message.value()
    print(f'{message.topic()}: {message.offset()}')
    print(value)
Thanks @dakota002 ! :) Definitely should be in the documentation for users (i see that in your PR now)
Do you think it makes sense to append the ID number to the alert packet itself? Then people who save the alert.json on receipt will be able to reference the ID. Don't know if this is feasible given how the offset number is assigned. Thoughts?
I would say it technically is, if you consider the alert packet as the whole message object and keep in mind that the JSON is just the value. I definitely agree though that this info should be more visible in our documentation. Here is some more information on the Message class
Right, but can it be appended to the value object itself? I imagine many will want to save the value of the alert to a json file, and being able to consistently reference and find the ID would be very useful for users, I think.
You don't really know the offset until the Kafka broker has ingested the record. So that's not really possible.
figured that might be the case, thanks.
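Although the producer cannot embed the offset in the value, a consumer could record it at save time. The sketch below is one possible approach, not something GCN provides: `save_with_reference` and the file layout are hypothetical, and it assumes the value is UTF-8 text, as it would be for a JSON notice.

```python
import json
from pathlib import Path


def save_with_reference(message, directory="."):
    """Write a notice to disk together with its Kafka coordinates."""
    record = {
        "topic": message.topic(),
        "partition": message.partition(),
        "offset": message.offset(),
        # Assumption: the value is UTF-8 text (true for JSON notices).
        "value": message.value().decode("utf-8"),
    }
    name = f"{record['topic']}_{record['partition']}_{record['offset']}.json"
    path = Path(directory) / name
    path.write_text(json.dumps(record, indent=2))
    return path
```

Calling `save_with_reference(message)` inside the consuming loop would keep the topic and offset next to every saved notice, without changing the notice itself.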
Just to clarify: this is what's intended to be included when one uses the "reference" field of the gcn-schema/core/FollowUp schema, like:
"reference": { "gcn.notices.LVK.alert": 6666 },
Additionally, confirming there's no ability to access this if you're using older GCN interfaces, correct?
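If the "reference" field is read as a mapping from topic to offset, as the example above suggests, it could be turned back into record coordinates along these lines. This is only a sketch: `reference_to_coordinates` is a hypothetical helper, and partition 0 is an assumption based on the topics currently using a single partition.

```python
def reference_to_coordinates(reference, partition=0):
    """Turn {"topic": offset, ...} into a list of (topic, partition, offset)."""
    return [(topic, partition, int(offset)) for topic, offset in reference.items()]


followup = {"reference": {"gcn.notices.LVK.alert": 6666}}
coords = reference_to_coordinates(followup["reference"])
print(coords)
```

Each tuple has the argument order of `confluent_kafka.TopicPartition(topic, partition, offset)`, so it could feed the retrieval example earlier in the thread.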
In order to be able to cite notice messages individually, we need to add a unique identifier for each message in the producer class. This could include a unique string for each producer, then an alphanumeric string for each message.
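A minimal sketch of such an identifier follows, assuming a random UUID is acceptable for the per-message part. The `make_notice_id` function, the `-` separator, and the producer string are all hypothetical; no format has been decided in this thread.

```python
import uuid


def make_notice_id(producer_name):
    """Combine a per-producer string with a random alphanumeric suffix."""
    # uuid4().hex is 32 lowercase hexadecimal characters.
    return f"{producer_name}-{uuid.uuid4().hex}"


print(make_notice_id("example-producer"))
```

Because the suffix is generated on the producer side, the identifier is known before submission, unlike the record offset.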