openeventdata / phoenix_pipeline

Turning news into events since 2014.
MIT License

Global ID #34

Open · johnb30 opened this issue 10 years ago

johnb30 commented 10 years ago

We need to add a global, unique ID for each event record.

ahalterman commented 10 years ago

Do you want them to be sequential/meaningful or can we just do an MD5 hash?

johnb30 commented 10 years ago

I thought about doing MD5 hashes of something like URL + date, but it might be more useful to have something sequential and meaningful. The easy answer: why not both?
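A minimal sketch of the "why not both" option, assuming each record carries a source URL and a publication date. The helper `make_ids` and the `YYYYMMDD-NNNNNN` layout of the meaningful ID are hypothetical choices for illustration, not anything already in the pipeline. The hash ID is an MD5 digest of URL + date, as proposed above.

```python
import hashlib
from datetime import date
from itertools import count


def make_ids(url, pub_date, counter):
    """Return (meaningful_id, hash_id) for one event record.

    Hypothetical helper: the meaningful ID is the date plus a
    zero-padded sequence number drawn from `counter`; the hash ID
    is the MD5 hex digest of the URL concatenated with the date.
    """
    day = pub_date.strftime("%Y%m%d")
    meaningful_id = f"{day}-{next(counter):06d}"
    hash_id = hashlib.md5((url + day).encode("utf-8")).hexdigest()
    return meaningful_id, hash_id


seq = count(1)  # in-memory sequence; starts at 1 on every run
print(make_ids("http://example.com/story1", date(2014, 6, 12), seq))
print(make_ids("http://example.com/story2", date(2014, 6, 12), seq))
```

Because `count` lives in memory, the sequence restarts on every run. A daily pipeline would need to persist the last value (or reset it per day, relying on the date prefix for uniqueness) to keep the sequential IDs globally unique.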

ahalterman commented 10 years ago

Or just hash the text. It's fast. I'm fine with both (meaningful ID vs. definitely unique ID).
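One caveat when hashing the text: a single story can yield several coded events, so the hash of the story text alone would repeat across those records. A sketch that appends the event's position within the story, assuming the coder emits a story's events in a stable order (`event_ids` is a hypothetical helper):

```python
import hashlib


def event_ids(story_text, n_events):
    """Hypothetical helper: derive one ID per event coded from a story.

    The MD5 of the story text identifies the story; appending the
    event's index within that story keeps IDs distinct when one
    story produces several events.
    """
    story_hash = hashlib.md5(story_text.encode("utf-8")).hexdigest()
    return [f"{story_hash}-{i}" for i in range(n_events)]


ids = event_ids("Police clashed with protesters in the capital on Tuesday.", 2)
print(ids)
```

The same text always produces the same IDs, so a wire story republished verbatim by several outlets would collapse to one ID. Whether that counts as useful deduplication or an unwanted collision depends on how duplicates are meant to be handled.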

On a semi-related note, can we switch the date to YYYYMMDD rather than YYMMDD? It's closer to ISO, I think it's easier to read, and it's much easier to convert into ISO later. I'm afraid that might break some things, but it's something to think about.
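For reference, YYYYMMDD is the ISO 8601 "basic" calendar-date format, so converting it to the extended YYYY-MM-DD form needs only a reformat. YYMMDD, by contrast, forces a guess about the century. A small sketch using the standard library (the function names are hypothetical):

```python
from datetime import datetime


def yymmdd_to_yyyymmdd(s):
    """Convert a 6-digit YYMMDD string to 8-digit YYYYMMDD.

    strptime's %y has to guess the century: 69-99 become 19xx and
    00-68 become 20xx. The 8-digit form avoids that guess entirely.
    """
    return datetime.strptime(s, "%y%m%d").strftime("%Y%m%d")


def yyyymmdd_to_iso(s):
    """Convert YYYYMMDD to ISO 8601 extended form (YYYY-MM-DD)."""
    return datetime.strptime(s, "%Y%m%d").date().isoformat()


print(yymmdd_to_yyyymmdd("140612"))  # 20140612
print(yyyymmdd_to_iso("20140612"))   # 2014-06-12
```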

johnb30 commented 10 years ago

I'm fine with hashing the text and putting in both a sequential ID and a fully unique ID. I'm also fine with the 8-digit date rather than the 6-digit one. I wonder if @philip-schrodt has any input on this? It should just be a matter of changing the format at https://github.com/openeventdata/phoenix_pipeline/blob/master/phox_pipeline.py#L27.