Closed toban closed 3 years ago
Also, another question that I think fits in here: Currently, the reconciliation property is part of the payload of each request (see `reconciliationPropertyId` in `const reconcile` above). Another option would be to make it part of the wiki configuration. This would make the API less flexible, but probably easier to use. Do you need the flexibility of being able to specify a different reconciliation property with different requests?
Hey @toban thanks for raising these questions :) happy to answer them.
First things first: The crawler takes data from various sources (including those TOML files from GitHub/GitLab) and translates them into RDF (see the graphic here). This RDF is used as the base for submitting the data to Wikibase. It also contains all identifiers (built from the raw data).
There is a TTL version of the TOML example you used that is meant to outline what the data looks like in the end; it may answer some questions and clarify assumptions.
See e.g. the unique identifier in line 1:
@base <https://gitlab.com/OSEGermany/ohloom/1.0.0#> .
which is basically the repo URL + the assigned version of the piece of hardware.
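As a minimal sketch of that naming scheme (not the crawler's actual code), the base IRI can be built by joining the repo URL and the version. The function names `base_iri` and `part_iri` are hypothetical, and the slug rule for parts (lowercase, alphanumerics only) is only an assumption inferred from the clamp ring IRI quoted in this thread:

```python
def base_iri(repo_url: str, version: str) -> str:
    """Join the repo URL and the hardware version into a base IRI ending in '#'."""
    return f"{repo_url.rstrip('/')}/{version}#"


def part_iri(repo_url: str, version: str, part_name: str) -> str:
    """Build an IRI for a single part below the versioned repo URL.

    Assumption: the slug is the part name lowercased with everything
    except letters and digits removed ("Clamp Ring" -> "clampring").
    """
    slug = "".join(ch for ch in part_name.lower() if ch.isalnum())
    return f"{repo_url.rstrip('/')}/{version}/{slug}"


repo = "https://gitlab.com/OSEGermany/ohloom"
print(base_iri(repo, "1.0.0"))
print(part_iri(repo, "1.0.0", "Clamp Ring"))
```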
The TTL is based on the ontology (linked in line 2) which also defines data types etc. I can add the TOML keys there as comments if you like.
To the questions:
<https://gitlab.com/OSEGermany/ohloom/1.0.0/clampring>, referring to the clamp ring of OHLOOM v1.0.0 from OSEGermany on GitLab.
`okh:Part` defines the class and is defined in the ontology which uses the prefix `okh` in the TTL.
@lucaswerkmeister I struggle to imagine the consequences of this decision :) Could you provide a small example on what this would mean in practice?
With the current setup, you could have several different URL-type properties on a wiki – let’s say, P1 and P2. And in one request, you could specify the "urlReconcile" as P1, and the reconciliation API would look for existing items with P1 statements containing the URLs specified in the request, and create new items if no existing items are found; and in another request, you could specify the "urlReconcile" as P2, and the API would instead look for items with P2 statements containing the requested URLs. The same URLs could match different items (or, match an existing item or create a new one), depending on the reconciliation property ID specified in each request.
I imagine this might be useful when importing items from several, partially overlapping sources into the same Wikibase. You could import items from source A using the source A URL property, and items from source B using the source B URL property; and when you notice that an item was present in both sources, you merge the Wikibase items, so that they now have statements for both of those properties, and then you can continue to reconcile against those items in later requests.
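To make the per-request behaviour concrete, here is a toy in-memory model of it. It is only an illustration of the lookup-or-create logic described above, not the WikibaseReconcileEdit API; `ToyWiki`, `Item`, and `reconcile` are hypothetical names, and the URL is a placeholder:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    item_id: str
    # property ID -> set of URL values stated on this item
    statements: Dict[str, Set[str]] = field(default_factory=dict)


class ToyWiki:
    """In-memory stand-in for a Wikibase (illustration only)."""

    def __init__(self) -> None:
        self.items: List[Item] = []

    def reconcile(self, url: str, property_id: str) -> Item:
        """Return the item whose `property_id` statement holds `url`, else create one."""
        for item in self.items:
            if url in item.statements.get(property_id, set()):
                return item
        item = Item(f"Q{len(self.items) + 1}", {property_id: {url}})
        self.items.append(item)
        return item


wiki = ToyWiki()
url = "https://example.org/project"
a = wiki.reconcile(url, "P1")  # nothing has a P1 statement yet: new item
b = wiki.reconcile(url, "P2")  # same URL, but no P2 statement: another new item
c = wiki.reconcile(url, "P1")  # matches the item created by the first call
print(a.item_id, b.item_id, c.item_id)
```

In this toy model the same URL ends up on two items because the two requests named different reconciliation properties, which is the flexibility (and the possible surprise) being discussed.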
Update: on a MediaWiki-based frontend it is also possible to use specific extensions that display images from sources other than Wikimedia Commons → so handling links is sufficient
@lucaswerkmeister thanks for the example! Yes, so in practice that might really happen, e.g. when a project moved from a MediaWiki-based platform to GitHub or from Wikifactory to GitHub, or especially when it is actually developed on GitHub (hence found there) but also found on the OSHWA registration list. I could write down assumptions for assessing the similarity of 2 OSH modules so that the API knows which properties to check. → just tell me your preferred format (I tend to use either TTL or MD for everything in this repo) and I'll start writing :)
but maybe it's worth opening another issue for that
@toban @lucaswerkmeister feel free to close this issue once you feel all questions are clarified :heart:
@moedn Thank you for the answers, marking this as resolved.
Hello!
I wanted to share an update on the WikibaseReconcileEdit API and how it can currently be used. In particular I'm curious to get answers to the questions at the very bottom of this post but also to get feedback on the current state of the API and your thoughts on improvements.
Example usage
Let's look at an example of how this could be done, starting from the top of the TOML file.
First off, we need a unique identifier for the whole thing, and repo looks to be the best candidate.
The payload could look like.
The tricky part here is keeping track of which property corresponds to which field in the TOML file. Currently the reconciliation API does not provide any way of looking these up based on the label of the property, but expects the crawler to know which one to use.
Relying on labels for lookups carries some risk, however, as labels are easy to change and likely to do so.
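One way the crawler could keep such a mapping is a plain lookup table keyed by TOML field, sketched below. This is an assumption about how it might be done, not something the API requires; the TOML keys other than `repo` and all property IDs are placeholders:

```python
# Hypothetical mapping from TOML keys to Wikibase property IDs.
# The IDs are placeholders; real IDs depend on the target wiki.
TOML_KEY_TO_PROPERTY = {
    "repo": "P1",
    "version": "P2",
    "bom": "P3",
}


def property_for(toml_key: str) -> str:
    """Look up the property ID for a TOML key, failing loudly if unmapped."""
    try:
        return TOML_KEY_TO_PROPERTY[toml_key]
    except KeyError:
        raise KeyError(f"no Wikibase property mapped for TOML key {toml_key!r}") from None


print(property_for("repo"))
```

Keying on property IDs rather than labels keeps the mapping stable when a label is edited on the wiki, at the cost of maintaining the table per wiki.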
We want `billOfMaterialsPropertyId` to be of type `wikibase-item` for the API to try to reconcile the URL in the statement against an item.
This request would currently create two items.
Going a bit further down the TOML file, we reach the parts that are to be part of the BOM.
We expect each item to be reconciled against a URL. To insert the Clamp Ring part from the above example, we would need a URL to make this work.
I cannot see one in the examples, but maybe it could be created based on the TOML file the part belongs to? We have the manifest file, which could be used as a base for the URL to uniquely identify each component.
Could this be used to identify the part?
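A rough sketch of that idea, assuming the part URL is the manifest URL plus a fragment derived from the part name. The helper `part_url`, the slug rule, and the manifest URL are all hypothetical:

```python
from urllib.parse import quote, urldefrag


def part_url(manifest_url: str, part_name: str) -> str:
    """Derive a per-part URL by appending a fragment to the manifest URL."""
    # Drop any fragment already present on the manifest URL.
    base, _ = urldefrag(manifest_url)
    # Assumption: lowercase the name and join its words with hyphens.
    slug = "-".join(part_name.lower().split())
    return f"{base}#{quote(slug)}"


# Placeholder manifest location, not the real repository path.
print(part_url("https://example.org/repo/okh.toml", "Clamp Ring"))
```

Such a URL is only as stable as the part name and the manifest location, so renaming either would produce a new item on reconciliation.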
If that is the case then the payload could look like this:
This request would result in the following:
Questions
As for questions, the most pressing ones I could think of are:
The crawler would have to know which property to use in which case. This means keeping some kind of mapping between the TOML specification and the corresponding Wikibase properties. There are ways we could do a lookup on the property label, but since labels are subject to change it wouldn't be that reliable.
How does the crawler plan to identify each component ([[part]] in the TOML file)? In the examples I have found no unique identifier for it, but I suggest maybe using some kind of anchor on the URL that specifies the BOM or the TOML file itself.
What is `okh:Part` supposed to be in the example TTL file? It's mentioned on ClampRing but never defined.
What is the plan for images or links to files within the repo? Are we supposed to support uploading images / files?
Kind regards / Tobias