Closed MansMeg closed 10 months ago
I suggest we use firstname_lastname_yyyymmdd (birthdate). It is static given that the primary name of the person and the birthdate don't change, and for the most part they shouldn't. I have also checked that there are no conflicts. On the other hand, only using birthyear leads to a handful of conflicting IDs.
If the birthday isn't available, we would use firstname_lastname_yyyymmXX or firstname_lastname_yyyyXXXX.
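A minimal sketch of the name-plus-birthdate format proposed above, in Python. The function name, the lower-casing of names, and the example person are assumptions made for illustration; the thread only specifies the overall pattern and the XX placeholders for unknown parts of the date.

```python
def name_date_id(first, last, year, month=None, day=None):
    """Build an ID like firstname_lastname_yyyymmdd (hypothetical helper).

    Unknown date parts are replaced by X placeholders, as proposed above.
    If the month is unknown, the day is ignored as well.
    """
    if month is None:
        date_part = f"{year:04d}XXXX"
    elif day is None:
        date_part = f"{year:04d}{month:02d}XX"
    else:
        date_part = f"{year:04d}{month:02d}{day:02d}"
    # Lower-casing is an assumption; the thread does not specify name normalization.
    return f"{first.lower()}_{last.lower()}_{date_part}"


# Made-up example person, not taken from the corpus.
full = name_date_id("Anna", "Svensson", 1901, 5, 17)
month_only = name_date_id("Anna", "Svensson", 1901, 5)
year_only = name_date_id("Anna", "Svensson", 1901)
```

Names with spaces, hyphens, or multiple surnames would need a normalization rule that this sketch does not attempt.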
People change names, so this might be confusing long term. Maybe just use a uuid? That we know will be persistent.
I would say you should have IDs for everything (parties/PM members/departments/electoral districts/subjects/...) and do like Wikidata: just an ID with no meaning (Q is from the name of Denny's wife Qamarniso Q61768970)
Swedish Riksdagen has a solution where just the last part, the GUID, is the identifier - the rest is a slug that just makes the URL more "user-friendly" or complex
ulf-kristersson is just to make it human readable
Another lesson learned is to support redirects ---> when e.g. #88 Riksdagen makes mistakes and adds 2 IDs for the same person (and never fixes it), it's easy to also get "2 people" --> they should be merged on your side, and IF the end user still has the "old id" they should find the merged target --> owl:sameAs
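The redirect idea above can be sketched as a small lookup table in Python. This is only an illustration under assumptions: the function name, the dictionary layout, and the placeholder IDs are hypothetical, and a real implementation might instead publish the mapping as owl:sameAs statements.

```python
def resolve_id(person_id, redirects):
    """Follow merge redirects until reaching an ID that is not redirected.

    `redirects` maps an old (merged-away) ID to the ID it was merged into.
    """
    seen = set()
    while person_id in redirects:
        if person_id in seen:
            # Guard against accidental loops in the redirect table.
            raise ValueError(f"redirect cycle at {person_id!r}")
        seen.add(person_id)
        person_id = redirects[person_id]
    return person_id


# Placeholder IDs: "id-c" was merged into "id-b", which was later merged into "id-a".
redirects = {"id-b": "id-a", "id-c": "id-b"}
target = resolve_id("id-c", redirects)
```

An end user holding an old ID still ends up at the merged target, and IDs that were never merged resolve to themselves.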
That sounds like a good idea. Best of both worlds. =)
Why are the wiki_ids not persistent? It seems like the least expensive solution (for us, since we used the QIDs in protocol documents) would be to convince wikidata to make the QIDs persistent.
@salgo60 knows this better than me. But I think the core problem is that anyone can create a new person (hence a new id). This can then be merged. So it is a "flaw" of the wikidata structure.
In addition, wikidata would like us to have a persistent id that they could refer to. I.e. our corpus will (after 1.0) be a reference for the quality control of wikidata.
I hope this explains why.
@MansMeg @ninpnin maybe it's time to start the process of getting persistent unique Welfare state analytics ids #269
See how Nobelprize.org redesigned its data with an API and then @miroli proposed a Wikidata id P8024 --> we can now access the WD object using the Nobelprize unique id...
I would say that Wikidata is not designed to be the source, and it's better, as I describe above, that you have a unique persistent id, as the update frequency in WD is crazy and it's an open system with its strengths and weaknesses... also supporting > 200 languages makes this equation nearly impossible, and we merge a lot - see real time stream
The design as I understand it is not about the truth more what other sources claim --> Wikidata can also store contradicting facts...
@MansMeg @ninpnin @fredrik1984 @liamtabib
We discussed persistent IDs this morning. There's already an open issue, so I didn't want to start a new one. Regardless of the format we use for the IDs, it seems like we need to obtain/create a property item on wikidata, something like SWERIK_MP_ID. According to this, such a property needs to be proposed and discussed "for some time" before it can be approved --- do we know, @salgo60, if it's already been proposed and/or how long "some time" is? Maybe we should decide on the property name and propose it ASAP if it hasn't been done already.
There has been discussion about whether to use name/birth date or a uuid. I see the sense in using a UUID, but also the sense in having a deterministic ID -- I suggest that we create a UUID deterministically using the primary name/surname and birth date as a seed (we can use pyriksdagen.utils.get_formatted_uuid as a starting point) -- best of both worlds?
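One possible way to sketch this in Python is a name-based (version 5) UUID from the standard library. This is not the behaviour of pyriksdagen.utils.get_formatted_uuid, which is not shown in this thread; the namespace URL, the function name, the seed layout, and the example person are all assumptions for illustration.

```python
import uuid

# Hypothetical namespace: any fixed value works, but it must never change
# once IDs have been published, or every derived UUID changes with it.
PERSON_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/swerik/person")


def deterministic_person_uuid(first, last, birthdate):
    """Derive a UUID from name and birth date (hypothetical helper).

    The same inputs always give the same UUID; the seed layout with "|" as
    separator is an assumption, not something decided in the thread.
    """
    seed = f"{first}|{last}|{birthdate}"
    return uuid.uuid5(PERSON_NAMESPACE, seed)


# Made-up example person, not taken from the corpus.
a = deterministic_person_uuid("Anna", "Svensson", "1901-05-17")
b = deterministic_person_uuid("Anna", "Svensson", "1901-05-17")
c = deterministic_person_uuid("Anna", "Svensson", "1901-05-18")
```

The derivation is only reproducible as long as the seed data stays the same, so a corrected name or birth date would yield a different UUID unless the ID already assigned is kept on record.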
What do you all say?
Good idea!
That works for me. The only important thing is that the IDs are persistent. I.e. we need to commit to the IDs, and they will never change after they are assigned to an individual. How we create them is less important, as long as they are uuids.
I think the discussions on Wikidata will be less of a problem if we set up a persistent id, since these IDs will probably be the only persistent ids for MPs going far back in time.
WD needs a formatter string and some examples
See what a proposal looks like, e.g. one that I created at 11:39, 21 September 2016
https://www.wikidata.org/wiki/Wikidata:Property_proposal/SBL
Anyone can create a proposal and everyone can comment and vote on it.... my experience is that it takes some weeks to get it approved...
I am out kayaking this week and can help you when I am back but it is no rocket science so give it a try...
One thought I had: could we use a Libris URI or the one Riksdagen has, depending on where you will store your data?
Would be nice if you had landing pages --> we could link you from Swedish Wikipedia
objects like
It's easy extracting text and pictures from Swedish Wikipedia see examples I did for people doing an app with Swedish cemeteries
Would be interesting if you shared your experience as researchers of working with Wikidata, see tweet: what is missing and what can be better...
UPDATE: Wikidata modelling days 2023: it looks like the researcher Daniel Mietchen is taking part; he is also involved in designing Scholia, see video
I'll draft a text for the Motivation part of the wikidata proposal in the next couple of days and post it here for commentary before submitting it. I think there's one unsettled issue, though. There's some consensus on using a UUID solution, but do we want to add some kind of human readable segment so it's clear that these are our UUIDs? E.g.: "SWERIK-6a28a4b0-8f46-4134-a88e-2645b704c9fc" or similar? @salgo60 @ljo any thoughts or best-practices around this?
1) Unique is the key, and having a human readable string may add value or just complexity
Extra bonus that can be done when approved:
a) a regular expression Property:P1793 --> we can easily catch wrong edits
^SWERIK-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$
b) URL match pattern Property:P8966: we have tools using the URL to understand what Wikidata property it relates to, e.g. ^http?:\/\/(?:www.)?fossilworks.org\/cgi-bin\/bridge.pl\?a=taxonInfo&taxon_no=(([1-9]\d{0,5})) relates to Property:P842
c) stability of property value Property:P2668
d) formatter URI for RDF resource Property:P1921
e) Property constraints: Wikidata has the possibility to add rules such as unique (see Help:Property_constraints_portal)
f) will this PID also support lexemes? Wikidata has > 41000 Swedish lexemes, see example riksdagen
g) owned by Property:P127
h) issue tracker URL Property:P1401
i) user manual URL Property:P2078
j) always nice to understand how it's used, see used by Property:P1535. I hope those PIDs will be used by Riksarkivet, Riksarkivet SBL, RAÄ, LIBRIS, Europeana, Riksdagens open data.....
k) API endpoint URL Property:P6269
l) SPARQL endpoint Property:P5305 .....
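The format constraint from item a) can be tried out locally before it is proposed on Wikidata. The following Python sketch reuses the regular expression quoted above and the example ID from earlier in the thread; the function name is hypothetical, and the SWERIK- prefix is still only one of the formats under discussion.

```python
import re

# Pattern copied from the proposed format constraint (Property:P1793) above.
SWERIK_ID = re.compile(
    r"^SWERIK-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def is_valid_swerik_id(value):
    """Return True if the whole string matches the proposed ID format."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return SWERIK_ID.fullmatch(value) is not None


prefixed_ok = is_valid_swerik_id("SWERIK-6a28a4b0-8f46-4134-a88e-2645b704c9fc")
bare_ok = is_valid_swerik_id("6a28a4b0-8f46-4134-a88e-2645b704c9fc")
```

If the project settles on a pure UUID instead, the same check applies with the prefix removed from the pattern.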
Would be cool if we could do linked data of your push release tests; we have the Software_quality_assurance property = Property:P2992
that maybe could be used for adding all the tests you do --> we then create Q numbers for a test, like check in Wikidata that Swedish PM people does not
OT: Wikidata has started to release Wikifunctions (video), and on 2023-10-25 "Running on WebAssembly" was released
Good document about persistent identifiers and see also my "The Magnus list" created 2021 "One way to design a system to be a good external identifier in Wikidata" this list was mentioned by David Shorthouse at 27:50 in the Stanford video - slides "Keepin 'N Sync... with wikidata ... and ORCID...and GBIF"
see above PID document
I have also tried to get Riksarkivet to support archived documents and PIDs --> status: work in progress :sad::sob: maybe your project can explain that PID support in archives is very important for research people
Today, I perceive that there is no one else on the line when it comes to discussing persistent identifiers and how they should be supported in archives. DIGG's project does not seem to firmly decide that the National Archives and the Royal Library (KB) should handle this.
but do we want to add some kind of human readable segment so it's clear that these are our UUIDs
@BobBorges doi.org/10.1101/117812 states in "Lesson 3. Opt for simple, durable web resolution":
Trailing characters after the local ID are discouraged as they unnecessarily increase the variability with which the identifier is represented and also complicate straightforward appending of the local ID
I think going with a pure uuid is probably the simplest. I don't see the value of adding swerik as a slug. Ideally the pid will live longer (with the corpus) than the swerik project name.
@MansMeg Isn't SWERIK used for every PID? That, I feel, is not a problem; maybe it makes it easier to understand the context of the PID ... the problem I see is when doing as Riksdagen does: then you get problems not knowing if you have found the same PID...
I hope we in Sweden will move in the direction of creating our own resolving service, something like a Swedish DOI, maybe SWEDOI
Maybe related I read this paper Introducing Innovative Indicators to Track Sweden's Open Research Data Objective: How to Measure Progress? Defining Indicators to Track Open Research Data Across Swedish Universities
I think loosely coupled systems should implement the observer pattern so that you can maybe more easily show citation graphs - see my suggestion to DIGG people "Best practice needed for understanding who is referencing my PID" and "#17 Vem använder en identifierare" (Who uses an identifier)
I see that point. But I doubt the swerik name will live long enough. Whatever slug we use, we will have this or similar problems. Just going with a uuid is probably the easiest minimal viable option and would have the least long term risks, I think.
There's some motivation for a persistent SWERIK person ID here: https://docs.google.com/document/d/10_SEVNI7dF46hhnucTps242ntSr1nm_R3EHC7_9Mkjk/edit?usp=sharing
Modeled on @salgo60's example in scope/length/level of detail. Feel free to add any commentary directly to that google document.
This is excellent @BobBorges !
I will read and comment. I think this is an issue that we can discuss now, and then have a discussion with the TAB next Friday as a last pair of eyes before we go forward and implement.
I think one good motivation is that with your own persistent identifier you can VERY easily start using SKOS and explain a difference with Wikidata, Riksdagens öppna data, Riksarkivet SBL, the book "Tvåkammar Riksdagen".....
There's some motivation for a persistent SWERIK person ID here:
@BobBorges The best motivation, I feel, is FAIRDATA F1: as you produce research data it should be FAIRDATA.
Principle F1 is arguably the most important because it will be hard to achieve other aspects of FAIR without globally unique and persistent identifiers
see also DOI 10.1101/117812
Other good resources
Thanks @salgo60! FAIR is a good thing to mention in the motivation. As someone with a research background, the R in FAIR seems the most problematic in our case now without persistent IDs -- How can we reuse and verify research findings when the primary keys of our database change regularly?
@BobBorges as a Wikidata addict I would also like to see the provenance - PROV - of every single data point, i.e. something like a more advanced version history combined with the role of who did the change.... I.e. what trust does the agent have and what data is that change based on... I feel we see that problem with "party" vilde #139 and chatGPT using PROV
One antipattern I see in Wikidata is that "every" source should confirm the birth of Selma Lagerlöf Q44519#P569, right now 23 references
The Wikidata model lacks a Trust dimension. I asked Denny, the WD designer, for his point of view and wrote a blogpost about it: WikidataCon 2019: We need a better model communicating quality/relevance of sources in Wikidata / Provenance
I did a small test using PROV with chatGPT and also show how good the change tracking of SPA Svensk Porträttarkiv is when you use the API link 139#issuecomment-1806804671
If you have a Wiki account don't hesitate to support it; syntax:
{{s}} - ~~~~
@miroli @monirbounadi
https://www.wikidata.org/wiki/Wikidata:Property_proposal/SWERIK_Person_ID
@BobBorges I heard comments on your statement
Wikidata IDs, however, are dynamic, and with each update, a handful of errors occur due to mismatched IDs in the dynamic database and static quality control files
As said several times before: should I show you WD? What can happen is that 2 ids are merged...
A merge will have a redirect from the old to the new... and if we speak semantics, SKOS exactMatch
the problem with Wikidata is that most people are not domain experts, and as it's an open system we also get anonymous edits and vandalism....
I understand the reason for changes -- our issue is that part of our work involves static files, e.g. manually curated, theoretically correct data with sources, that we want to check against info extracted with new queries to wikidata.
Do I need to do something more with this, or is your edit enough?
@BobBorges wait and see, we now have enough people I guess to get this approved... next step is to get the focus of a wiki admin, which could take 1 minute or more weeks :sad:
FYI: I added P12192 to Template:Sweden_properties / diff and Template:Politician_properties / diff
Feels like it's wrongly set up; I guess you will have persistent identifiers for everything, not just people as P31 indicates
@BobBorges can we close this?
There is a need from the wikidata people to refer to our corpus (from version 1.0) as a reference on the data. Hence we should make our ids persistent.