Example usage of the WikibaseReconcileEdit API and some questions

toban commented 3 years ago

Hello!

I wanted to share an update on the WikibaseReconcileEdit API and how it can currently be used. In particular, I'm curious to get answers to the questions at the very bottom of this post, but I'd also like feedback on the current state of the API and your thoughts on improvements.

Example usage

Let's look at an example of how this could be done, starting from the top of the TOML file.

okhv = "2.0"
name = "OHLOOM"
repo = "https://gitlab.com/OSEGermany/ohloom"
version = "0.10.0"
release = "https://gitlab.com/OSEGermany/ohloom/-/tags/ohloom-0.10.0"
license = "CC-BY-SA-4.0"
licensor = "Jens Meisner"
organisation = "OSE Germany e.V."
readme = "README.md"
image = "/Documentation/User_Guide/User_Guide.jpg"
documentation-language = "en-GB"
technology-readiness-level = "OTLR-5"
documentation-readiness-level = "ODLR-5"
function = "The Open Hardware Loom is a simple, hand-operated weaving loom made of wood, screws and 3D printed plastic pieces for the most part. It is simple to make and operate."
cpc-patent-class = "D03D 35/00"
tsdc = "MEC"
bom = "sBoM.csv"
manufacturing-instructions = "/Documentation/Assembly_Guide/AssemblyGuide.md"
user-manual = "/Documentation/User_Guide/UserGuide.md"
fabric-width-dim = "mm"
fabric-width = 400
outer-dimension-dim = "mm"
outer-dimension = "cube(size = [400,350,150]"

First off, we need a unique identifier for the whole thing, and repo looks to be the best candidate.

repo = "https://gitlab.com/OSEGermany/ohloom"

The payload could look like this:

const entity = {
    "wikibasereconcileedit-version": "0.0.1/minimal",
    "statements": [
        {
            "property": reconciliationPropertyId,
            "value": "https://gitlab.com/OSEGermany/ohloom"
        },
        {
            "property": namePropertyId,
            "value": "OHLOOM"
        },
        {
            "property": functionPropertyId,
            "value": "The Open Hardware Loom is a simple, hand-operated weaving loom made of wood, screws and 3D printed plastic pieces for the most part. It is simple to make and operate."
        },
        {
            "property": documentationLanguagePropertyId,
            "value": "en-GB"
        },
        {
            "property": billOfMaterialsPropertyId,
            "value": "https://gitlab.com/OSEGermany/ohloom/-/raw/834222370f34ad2a07d0e41d09eb54378573b8c3/sBoM.csv"
        }

        // ... more things here

    ]
};

const reconcile = {
    "wikibasereconcileedit-version": "0.0.1",
    "urlReconcile": reconciliationPropertyId
};

const payload = {
    reconcile: reconcile,
    entity: entity
};

The tricky part here is keeping track of which property corresponds to which field in the TOML file. Currently the reconciliation API does not provide any way of looking these up based on the property label, but expects the crawler to know which one to use.
Relying on labels for lookups carries some risk, however, as labels are easy to change and likely to do so.
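
One way the crawler could keep this mapping is a plain lookup table from TOML keys to property IDs. The following is only a sketch: the property IDs (P1 to P4), the propertyMap table and the buildEntity helper are made up for illustration and are not part of the API.

```javascript
// Hypothetical mapping from OKH TOML keys to Wikibase property IDs.
// The IDs are placeholders; a real wiki assigns its own.
const propertyMap = {
    repo: "P1",
    name: "P2",
    function: "P3",
    "documentation-language": "P4",
};

// Build a "0.0.1/minimal" entity from a parsed manifest, keeping only
// the keys that have a known property and skipping everything else.
function buildEntity(manifest, map) {
    const statements = [];
    for (const [key, propertyId] of Object.entries(map)) {
        if (manifest[key] !== undefined) {
            statements.push({ property: propertyId, value: manifest[key] });
        }
    }
    return {
        "wikibasereconcileedit-version": "0.0.1/minimal",
        statements,
    };
}

const manifest = {
    name: "OHLOOM",
    repo: "https://gitlab.com/OSEGermany/ohloom",
    "documentation-language": "en-GB",
    readme: "README.md", // no mapping yet, so it is skipped
};

const mappedEntity = buildEntity(manifest, propertyMap);
console.log(JSON.stringify(mappedEntity, null, 2));
```

Keying the table on the TOML field name rather than on the property label means a label change on the wiki does not affect the crawler.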

We want billOfMaterialsPropertyId to be of type wikibase-item so that the API tries to reconcile the URL in the statement against an item.

This request would currently create two items.

Q1:
    reconciliationPropertyId: "https://gitlab.com/OSEGermany/ohloom"
    namePropertyId: "OHLOOM"
    functionPropertyId: "The Open Hardware Loom is a sim...."
    documentationLanguagePropertyId: "en-GB"
    billOfMaterialsPropertyId: Q2

Q2:
    reconciliationPropertyId: "https://gitlab.com/OSEGermany/ohloom/-/raw/834222370f34ad2a07d0e41d09eb54378573b8c3/sBoM.csv"

Going a bit further down the TOML file, we reach a part that is to be included in the BOM.

[[part]]
name = "Clamp Ring"
image = "/Documentation/Assembly_Guide/Parts_Print_2.jpg"
tsdc = "3DP"
source = "/3DParts/ClampRing/ClampRing.scad"
export = [
  "/3DParts/ClampRing/ClampRing.pdf",
  "/3DParts/ClampRing/ClampRing.stl"
]
material = "PLA"
outer-dimension-dim = "mm"
outer-dimension = "cylinder(h=30, r=28)"

We expect each item to be reconciled against a URL. To insert the Clamp Ring part from the above example, we would need a URL for it.

I cannot see one in the examples, but maybe it could be created based on the TOML file the part belongs to? The manifest file could serve as the base of a URL that uniquely identifies each component.

https://gitlab.com/OSEGermany/ohloom/-/raw/834222370f34ad2a07d0e41d09eb54378573b8c3/okh.toml#Clamp_Ring

Could this be used to identify the part?
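
As a rough sketch of how such a URL could be derived, the part name can be appended to the manifest URL as a fragment. The partUrl helper is hypothetical, and replacing whitespace with "_" is an assumption taken from the single #Clamp_Ring example above.

```javascript
// Derive a per-part URL by appending the part name as a fragment to the
// manifest URL. The whitespace-to-underscore rule is an assumption.
function partUrl(manifestUrl, partName) {
    const url = new URL(manifestUrl);
    url.hash = partName.trim().replace(/\s+/g, "_");
    return url.href;
}

const manifestUrl =
    "https://gitlab.com/OSEGermany/ohloom/-/raw/834222370f34ad2a07d0e41d09eb54378573b8c3/okh.toml";
console.log(partUrl(manifestUrl, "Clamp Ring"));
```

Note that two parts with the same name in one manifest would end up with the same URL under this scheme, so it only works if part names are unique per file.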

If that is the case then the payload could look like this:

const entity = {
    "wikibasereconcileedit-version": "0.0.1/minimal",
    "statements": [
        {
            "property": reconciliationPropertyId,
            "value": "https://gitlab.com/OSEGermany/ohloom/-/raw/834222370f34ad2a07d0e41d09eb54378573b8c3/okh.toml#Clamp_Ring"
        },
        {
            "property": namePropertyId,
            "value": "Clamp Ring"
        },
        {
            "property": billOfMaterialsPropertyId,
            "value": "https://gitlab.com/OSEGermany/ohloom/-/raw/834222370f34ad2a07d0e41d09eb54378573b8c3/sBoM.csv"
        }

        // ... more things here

    ]
};

const reconcile = {
    "wikibasereconcileedit-version": "0.0.1",
    "urlReconcile": reconciliationPropertyId
};

const payload = {
    reconcile: reconcile,
    entity: entity
};

This request would result in the following:

Q3:
    reconciliationPropertyId: "https://gitlab.com/OSEGermany/ohloom/-/raw/834222370f34ad2a07d0e41d09eb54378573b8c3/okh.toml#Clamp_Ring"
    namePropertyId: "Clamp Ring"
    billOfMaterialsPropertyId: Q2

Questions

As for questions, the most pressing ones I could think of are:

1. The crawler would have to know which property to use in which case. This means keeping some kind of mapping between the TOML specification and the corresponding Wikibase properties. There are ways we could do a lookup on the property label, but since labels are subject to change it wouldn't be that reliable.
2. How does the crawler plan to identify each component ([[part]] in the TOML file)? In the examples I have found no unique identifier for it, so I suggest maybe using some kind of anchor on the URL of the BOM or of the TOML file itself.
3. What is okh:Part supposed to be in the example TTL file? It's mentioned on ClampRing but never defined.
4. What is the plan for images or links to files within the repo? Are we supposed to support uploading images / files?

Kind regards / Tobias

lucaswerkmeister commented 3 years ago

Also, another question that I think fits in here: Currently, the reconciliation property is part of the payload of each request (see reconciliationPropertyId in const reconcile above). Another option would be to make it part of the wiki configuration. This would make the API less flexible, but probably easier to use. Do you need the flexibility of being able to specify a different reconciliation property with different requests?

moedn commented 3 years ago

Hey @toban thanks for raising these questions :) happy to answer them.

First things first: The crawler takes data from various sources (including those TOML files from GitHub/GitLab) and translates them into RDF (see the graphic here). This RDF is used as the base for submitting the data to Wikibase. It also contains all identifiers (built from the raw data).

There is a TTL version of the TOML example you used that is meant to outline what the data looks like in the end; it may answer some questions and clarify assumptions.

See e.g. the unique identifier in line 1:

@base <https://gitlab.com/OSEGermany/ohloom/1.0.0#> .

which is basically the repo URL + the assigned version of the piece of hardware.

The TTL is based on the ontology (linked in line 2) which also defines data types etc. I can add the TOML keys there as comments if you like.

To the questions:

1. As you point out, the mapping should be defined somewhere instead of relying on lookup functions. Happy to do that :) just let me know your preferred format.
2. We are using the repo URL to create identifiers for entities. Those URLs do not necessarily need to exist; they just need to be unique. Parts could be referenced like this: <https://gitlab.com/OSEGermany/ohloom/1.0.0/clampring>, referring to the clamp ring of OHLOOM v1.0.0 from OSEGermany on GitLab.
3. okh:Part is the class; it is defined in the ontology, which uses the prefix okh in the TTL.
4. Files will be linked; images can be linked, too. In the end, all the data is already hosted somewhere, so there is no need to push everything to Wikibase. However, on the frontend side at least images should be available. So the image decision may depend on whether we're using a MediaWiki-based frontend or any other platform (and hence Wikibase only as a backend).
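
A minimal sketch of how a part identifier of the form shown above could be built from the repo URL, the version and the part name. The partIri helper and its slug rule (lowercase, everything except a-z and 0-9 removed) are assumptions inferred from the single clampring example, not a rule defined by the ontology.

```javascript
// Build an identifier like <repo>/<version>/<slug> for a part.
// The slug rule is a guess based on "Clamp Ring" -> "clampring".
function partIri(repoUrl, version, partName) {
    const slug = partName.toLowerCase().replace(/[^a-z0-9]+/g, "");
    const base = repoUrl.replace(/\/+$/, ""); // drop trailing slashes
    return `${base}/${version}/${slug}`;
}

console.log(partIri("https://gitlab.com/OSEGermany/ohloom", "1.0.0", "Clamp Ring"));
// prints https://gitlab.com/OSEGermany/ohloom/1.0.0/clampring
```

Since these identifiers only need to be unique and not resolvable, the crawler is free to pick any deterministic rule, as long as it stays stable between crawls.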

moedn commented 3 years ago

@lucaswerkmeister I struggle to imagine the consequences of this decision :) Could you provide a small example on what this would mean in practice?

lucaswerkmeister commented 3 years ago

With the current setup, you could have several different URL-type properties on a wiki – let’s say, P1 and P2. And in one request, you could specify the "urlReconcile" as P1, and the reconciliation API would look for existing items with P1 statements containing the URLs specified in the request, and create new items if no existing items are found; and in another request, you could specify the "urlReconcile" as P2, and the API would instead look for items with P2 statements containing the requested URLs. The same URLs could match different items (or, match an existing item or create a new one), depending on the reconciliation property ID specified in each request.

I imagine this might be useful when importing items from several, partially overlapping sources into the same Wikibase. You could import items from source A using the source A URL property, and items from source B using the source B URL property; and when you notice that an item was present in both sources, you merge the Wikibase items, so that they now have statements for both of those properties, and then you can continue to reconcile against those items in later requests.
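
The behaviour described above can be illustrated with a toy in-memory model. This is not the real API or its implementation, only a sketch of the lookup-or-create logic; ToyWikibase and the property IDs P1 and P2 are made up for the example.

```javascript
// Toy in-memory stand-in for a Wikibase; not the real API.
class ToyWikibase {
    constructor() {
        this.items = new Map(); // item ID -> { propertyId: [urls] }
        this.nextId = 1;
    }

    // Return the ID of the item that has `url` as a value of `propertyId`,
    // creating a new item when none matches.
    reconcile(propertyId, url) {
        for (const [id, statements] of this.items) {
            if ((statements[propertyId] || []).includes(url)) {
                return id;
            }
        }
        const id = `Q${this.nextId++}`;
        this.items.set(id, { [propertyId]: [url] });
        return id;
    }
}

const wiki = new ToyWikibase();
const repoUrl = "https://gitlab.com/OSEGermany/ohloom";
const first = wiki.reconcile("P1", repoUrl); // no match, creates an item
const again = wiki.reconcile("P1", repoUrl); // matches the same item
const other = wiki.reconcile("P2", repoUrl); // different property, new item
console.log(first, again, other); // prints Q1 Q1 Q2
```

The same URL resolves to different items depending on which property is named in the request, which is the flexibility that would be lost if the reconciliation property moved into the wiki configuration.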

moedn commented 3 years ago

Update: on a MediaWiki-based frontend it's also possible to use specific extensions that handle the display of images from sources other than Wikimedia Commons → so handling links is sufficient

moedn commented 3 years ago

@lucaswerkmeister thanks for the example! Yes, in practice that might really happen, e.g. when a project moved from a MediaWiki-based platform to GitHub or from Wikifactory to GitHub, or especially when it is actually developed on GitHub (hence found there) but also found on the OSHWA registration list. I could write down assumptions for assessing the similarity of two OSH modules, so that the API knows which properties to check. → Just tell me your preferred format (I tend to use either TTL or MD for everything in this repo) and I'll start writing :)

but maybe it's worth opening another issue for that

moedn commented 3 years ago

@toban @lucaswerkmeister feel free to close this issue once you feel all questions are clarified :heart:

toban commented 3 years ago

@moedn

Thank you for the answers, marking this as resolved.

iop-alliance / OpenKnowHow

Example usage of the WikibaseReconcileEdit API and some questions #49
