wetneb / openrefine-wikibase

This repository has migrated to:
https://gitlab.com/nfdi4culture/ta1-data-enrichment/openrefine-wikibase

Fetch qualifiers #72

Open wetneb opened 4 years ago

wetneb commented 4 years ago

There is currently no way to fetch qualifiers through the data extension API (or to use them to refine reconciliation). A syntax for such qualifiers should be chosen and implemented.

antoine2711 commented 4 years ago

Yes, that would be great. And we should be able to link the column of fetched data in the WD schema, so that the data can be pushed back.

wetneb commented 4 years ago

I am not sure what you mean by "link the column". Do you mean using column groups? I don't see how column groups can be relied on in the WD schema.

antoine2711 commented 4 years ago

What I meant is that if I could query qualifiers and references, then they could also be pushed back. This makes a round trip: get the data, fill in the blanks, push the data back.

Right now this can't be done, since qualifiers and references can't be imported in the first place.

hughlilly commented 3 years ago

Would this be why I'm having this issue? Sorry if the terminology is off -- perhaps I should have said "qualifier" instead of "flag" in the subject line…

wetneb commented 3 years ago

No, your issue is not related to qualifiers, but it's also an interesting one. I replied there :)

wetneb commented 2 years ago

Use case mentioned here by @mshd:

I would like to reconcile against Wikidata using a certain qualifier. Is that possible? If not, could you implement it?

Example:

[Screenshot from 2021-11-23] Set the qualifier property to North Sumatera III, or: give me all people who ever had a candidacy in this district.

Pluralog commented 2 years ago

I would love this. In my use case I have annual data like "total revenue", and without fetching qualifiers it's really difficult to update only the entries with no data for a certain year.

wetneb commented 2 years ago

Let me expand on the design questions that need to be resolved before this can be implemented. This issue can be understood in multiple ways:

  1. I want to fetch the qualifier values on all statements of a given property. For instance: give me all the years for which the total revenue is available on Wikidata.
  2. I want to fetch the qualifier values on statements of a given property with a given value. For instance, give me the "member of political party" qualifier of the "candidacy in election: 2014 Indonesian People's Representative Council election" statement.
  3. I want to fetch main statement values, but select the ones I care about by specifying qualifier values. Example: give me the total revenue of this company in 2018 (so, filtering all "total revenue" statements to only keep the ones with a "point in time":2018 qualifier).
  4. I want to fetch "candidacy in election" statements, fetching simultaneously the main statement value and the qualifier values, representing them in OpenRefine with a record-like structure. This seems difficult to implement in a natural way with the current protocol.
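To make readings 1 to 3 concrete, here is a minimal Python sketch that answers each of them over a single entity's JSON. It assumes the standard Wikibase entity JSON layout (`claims` maps a property ID to a list of statements, each with a `mainsnak` and an optional `qualifiers` map); the function names and the toy entity are hypothetical and not part of this project's API.

```python
from typing import Any, Dict, List, Optional

Snak = Dict[str, Any]
Entity = Dict[str, Any]


def snak_value(snak: Snak) -> Any:
    """Return the QID for item-valued snaks, the raw datavalue otherwise."""
    if snak.get("snaktype") != "value":
        return None  # "somevalue" / "novalue" snaks carry no datavalue
    value = snak["datavalue"]["value"]
    if isinstance(value, dict) and "id" in value:
        return value["id"]
    return value


def qualifier_values(entity: Entity, pid: str, qpid: str,
                     main_value: Optional[str] = None) -> List[Any]:
    """Readings 1 and 2: qpid qualifiers on pid statements, optionally
    restricted to statements whose main value is main_value."""
    out: List[Any] = []
    for st in entity.get("claims", {}).get(pid, []):
        if main_value is not None and snak_value(st["mainsnak"]) != main_value:
            continue
        out.extend(snak_value(q) for q in st.get("qualifiers", {}).get(qpid, []))
    return out


def main_values_where(entity: Entity, pid: str, qpid: str, qvalue: Any) -> List[Any]:
    """Reading 3: main values of pid statements carrying a qpid=qvalue qualifier."""
    return [
        snak_value(st["mainsnak"])
        for st in entity.get("claims", {}).get(pid, [])
        if any(snak_value(q) == qvalue for q in st.get("qualifiers", {}).get(qpid, []))
    ]


# Toy entity built from hypothetical data, not fetched from Wikidata.
def item(pid: str, qid: str) -> Snak:
    return {"snaktype": "value", "property": pid,
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "id": qid}}}


def votes(amount: str) -> Snak:
    return {"snaktype": "value", "property": "P1111",
            "datavalue": {"type": "quantity",
                          "value": {"amount": amount, "unit": "1"}}}


example = {"claims": {"P3602": [
    {"mainsnak": item("P3602", "Q108816797"),
     "qualifiers": {"P768": [item("P768", "Q96984689")], "P1111": [votes("+100")]}},
    {"mainsnak": item("P3602", "Q_OTHER_ELECTION"),
     "qualifiers": {"P1111": [votes("+200")]}},
]}}

print(qualifier_values(example, "P3602", "P1111"))                # reading 1
print(qualifier_values(example, "P3602", "P1111", "Q108816797"))  # reading 2
print(main_values_where(example, "P3602", "P768", "Q96984689"))   # reading 3
```

Each of these readings returns a flat list, which is why they map naturally onto one column. Reading 4 has no equivalent here, because it needs each main value kept together with its own qualifiers.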

Possible syntaxes we could add to support these use cases (where P3602 is candidacy in election, P1111 is votes received and P768 is electoral district):

  1. P3602#P1111 (all P1111 qualifiers on all P3602 statements)
  2. P3602=Q108816797#P1111 (all P1111 qualifiers on P3602=Q108816797 statements)
  3. P3602[P768=Q96984689] (all main statement values on P3602 statements with P768=Q96984689 qualifier)
  4. I do not see a clean way to implement this given the existing API.
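One way to check that the three proposed syntaxes do not clash is to parse them. The following is a minimal Python sketch under that assumption; `PropertyPath` and `parse_path` are hypothetical names, and the grammar (items only as values, no combination of `#` and `[...]`) is deliberately limited to the examples above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyPath:
    """Hypothetical parsed form of a data extension property request."""
    pid: str                            # statement property, e.g. P3602
    main_value: Optional[str] = None    # syntax 2: only statements with this main value
    qualifier: Optional[str] = None     # syntaxes 1-2: return this qualifier's values
    filter_pid: Optional[str] = None    # syntax 3: keep statements with this qualifier...
    filter_value: Optional[str] = None  # ...set to this value


def parse_path(text: str) -> PropertyPath:
    text = text.strip()
    # Syntaxes 1 and 2: P3602#P1111 and P3602=Q108816797#P1111
    m = re.fullmatch(r"(P\d+)(?:=(Q\d+))?#(P\d+)", text)
    if m:
        return PropertyPath(pid=m[1], main_value=m[2], qualifier=m[3])
    # Syntax 3: P3602[P768=Q96984689]
    m = re.fullmatch(r"(P\d+)\[(P\d+)=(Q\d+)\]", text)
    if m:
        return PropertyPath(pid=m[1], filter_pid=m[2], filter_value=m[3])
    # A plain property, as already supported
    if re.fullmatch(r"P\d+", text):
        return PropertyPath(pid=text)
    raise ValueError(f"unsupported property path: {text!r}")


for path in ["P3602#P1111", "P3602=Q108816797#P1111", "P3602[P768=Q96984689]", "P3602"]:
    print(path, "->", parse_path(path))
```

Because the three forms use distinct delimiters (`#` versus `[...]`), a plain regular expression per form is enough to tell them apart, and an unmatched optional group simply comes back as `None`.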

Do you see other use cases not covered by these points? Which of those use cases would be useful to you?

Pluralog commented 2 years ago

> Do you see other use cases not covered by these points? Which of those use cases would be useful to you?

Looks good to me.

Case 3 could get more complicated when the qualifier values are not items themselves, e.g. "point in time", which is sometimes just a year and sometimes a specific date. In a Wikidata SPARQL query I would use FILTER on the qualifier. As a workaround, we could use case 1 and do the filtering in OpenRefine later.
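Filtering on a non-item qualifier such as "point in time" mostly comes down to comparing dates at the right precision. Below is a minimal Python sketch, assuming Wikibase's JSON time format (e.g. `+2018-00-00T00:00:00Z`, where a year-precision date uses `00` for month and day) and assuming total revenue and point in time are P2139 and P585. It keeps the main values whose qualifier falls in a given year, roughly what `FILTER(YEAR(?date) = 2018)` does in SPARQL; the helper names and toy statements are hypothetical.

```python
import re
from typing import Any, Dict, List


def year_of(time_value: Dict[str, Any]) -> int:
    """Extract the signed year from a Wikibase time value."""
    m = re.match(r"([+-])(\d+)-", time_value["time"])
    if m is None:
        raise ValueError(f"unexpected time string: {time_value['time']!r}")
    year = int(m.group(2))
    return -year if m.group(1) == "-" else year


def values_in_year(statements: List[Dict[str, Any]], qualifier_pid: str, year: int) -> List[Any]:
    """Main values of statements having a qualifier_pid time qualifier in `year`."""
    out = []
    for st in statements:
        for q in st.get("qualifiers", {}).get(qualifier_pid, []):
            if q.get("snaktype") == "value" and year_of(q["datavalue"]["value"]) == year:
                out.append(st["mainsnak"]["datavalue"]["value"])
                break  # one matching qualifier is enough for this statement
    return out


def revenue_statement(amount: str, time: str, precision: int) -> Dict[str, Any]:
    # Hypothetical toy statement; real time values also carry timezone,
    # before, after and calendarmodel fields, omitted here for brevity.
    return {
        "mainsnak": {"snaktype": "value",
                     "datavalue": {"type": "quantity",
                                   "value": {"amount": amount, "unit": "1"}}},
        "qualifiers": {"P585": [{"snaktype": "value", "property": "P585",
                                 "datavalue": {"type": "time",
                                               "value": {"time": time, "precision": precision}}}]},
    }


revenue = [
    revenue_statement("+1000", "+2017-00-00T00:00:00Z", 9),   # year precision
    revenue_statement("+1200", "+2018-06-30T00:00:00Z", 11),  # day precision
]
print(values_in_year(revenue, "P585", 2018))
```

Reading only the year part means a year-precision date and a day-precision date in the same year are treated alike, which matches the "sometimes just the year, sometimes a date" situation described above.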

wetneb commented 2 years ago

> As a workaround we could use case 1 and do the filtering in OpenRefine later.

The problem with 1. is that it would only fetch the qualifier values, not the main statement values, so it is not clear to me how you can use it to reimplement 2 or 3 by adding local filtering afterwards.

antoine2711 commented 2 years ago
> 3. P3602[P768=Q96984689] (all main statement values on P3602 statements with P768=Q96984689 qualifier)
>
> 4. I do not see a clean way to implement this given the existing API.
>
> Do you see other use cases not covered by these points? Which of those use cases would be useful to you?

@wetneb: fine for 1. and 2. But why not P3602#P768=Q96984689 for 3.? And for 4., why not Pxxx#*?

Regards, Antoine

Pluralog commented 2 years ago

> As a workaround we could use case 1 and do the filtering in OpenRefine later.
>
> The problem with 1. is that it would only fetch the qualifier values, not the main statement values, so it is not clear to me how you can use it to reimplement 2 or 3 by adding local filtering afterwards.

I thought it would only work in multiple steps. In my case (total revenue and point in time) I would try:

  1. fetch all point in time values for total revenue
  2. filter in OpenRefine all point in time values between 2017-00-00 and 2018-00-00
  3. fetch all main statements for those

But you are right. It would only work if I could use the values of a column as qualifiers in my query.

wetneb commented 2 years ago

@antoine2711 for 4., the problem is not finding a syntax for it, but rather seeing how it would fit in the protocol. At the moment, when the user requests a property, we can only return one column for it.

wetneb commented 2 years ago

I guess one hacky workaround would be to let the user fetch the full JSON of the statements, and we would let them manipulate that themselves in OpenRefine. After all, there are a ton more fields we are not exposing (ranks, references…) and it is unlikely we can find a satisfactory syntax to fetch all those fields, so it would be good to have this fallback option for power users.

It would still be more convenient than having to query the Wikibase API directly.
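A minimal Python sketch of this fallback, assuming the standard Wikibase statement JSON: all statements for one property are serialized into a single string cell, so ranks, qualifiers and references survive untouched, and the user unpacks them later (in OpenRefine itself, presumably with GREL's `parseJson()`). The function names and toy entity are hypothetical.

```python
import json
from typing import Any, Dict, List


def statements_cell(entity: Dict[str, Any], pid: str) -> str:
    """Pack every pid statement (main snak, qualifiers, references, rank)
    into one JSON string, i.e. one cell of one column."""
    return json.dumps(entity.get("claims", {}).get(pid, []), sort_keys=True)


def ranks(cell: str) -> List[str]:
    """Example of a field a power user could extract from such a cell."""
    return [st.get("rank", "normal") for st in json.loads(cell)]


def reference_count(cell: str) -> int:
    return sum(len(st.get("references", [])) for st in json.loads(cell))


def item(qid: str) -> Dict[str, Any]:
    return {"snaktype": "value",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "id": qid}}}


entity = {"claims": {"P3602": [  # hypothetical toy data
    {"mainsnak": item("Q108816797"), "rank": "preferred",
     "qualifiers": {"P768": [item("Q96984689")]},
     "references": [{"snaks": {"P854": [{"snaktype": "value",
                                         "datavalue": {"type": "string",
                                                       "value": "https://example.org/source"}}]}}]},
    {"mainsnak": item("Q_OTHER_ELECTION"), "rank": "normal"},
]}}

cell = statements_cell(entity, "P3602")
print(ranks(cell), reference_count(cell))
```

The design trade-off is that the extension protocol still returns exactly one column per request; all the structure lives inside the cell, at the cost of the user having to parse it.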

antoine2711 commented 2 years ago

> The problem with 1. is that it would only fetch the qualifier values, not the main statement values

Oh! I see, @wetneb. So the problem is bringing the structure into OR? Why couldn't two columns be fetched at the same time? I understand it requires creating rows at two levels, the outer statements and the inner qualifiers. But still, is that so complicated?

Also, OR has a (not very functional) grouping of columns, like what you get from importing XML or JSON. Could that mechanism be reused?

I write that because, for me, in all four scenarios, I would like the statement value AND the qualifier's property AND the value of the qualifier's property.
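For reference, here is a minimal Python sketch of what "rows at two levels" could look like, producing (item, statement value, qualifier property, qualifier value) rows. It assumes the standard Wikibase statement JSON (including the optional `qualifiers-order` list) and follows OpenRefine's records convention, where rows with a blank first cell belong to the record above; the statement value is likewise only written on the first row of its qualifier block. This is not how the extension currently works, and the names and toy data are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def simple(snak: Dict[str, Any]) -> Optional[str]:
    """Render a snak as a cell string: QID for items, amount/time/text otherwise."""
    if snak.get("snaktype") != "value":
        return None
    value = snak["datavalue"]["value"]
    if isinstance(value, dict):
        for key in ("id", "amount", "time", "text"):
            if key in value:
                return value[key]
    return str(value)


def flatten(item_id: str, statements: List[Dict[str, Any]]) -> List[Row]:
    """Rows of (item, statement value, qualifier property, qualifier value)."""
    rows: List[Row] = []
    for st in statements:
        quals = st.get("qualifiers", {})
        order = st.get("qualifiers-order", sorted(quals))
        pairs = [(qpid, simple(q)) for qpid in order for q in quals.get(qpid, [])]
        # A statement without qualifiers still gets one row for its value.
        for i, (qpid, qval) in enumerate(pairs or [(None, None)]):
            rows.append((
                item_id if not rows else None,            # item only on the first row
                simple(st["mainsnak"]) if i == 0 else None,  # value once per statement
                qpid,
                qval,
            ))
    return rows


def item(qid: str) -> Dict[str, Any]:
    return {"snaktype": "value",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "id": qid}}}


statements = [  # hypothetical P3602 statements of a candidate
    {"mainsnak": item("Q108816797"),
     "qualifiers-order": ["P768", "P1111"],
     "qualifiers": {"P768": [item("Q96984689")],
                    "P1111": [{"snaktype": "value",
                               "datavalue": {"type": "quantity",
                                             "value": {"amount": "+100", "unit": "1"}}}]}},
    {"mainsnak": item("Q_OTHER_ELECTION")},
]

for row in flatten("Q_CANDIDATE", statements):
    print(row)
```

The sketch shows the difficulty raised earlier: a single requested property here yields three columns and several rows per item, which the current one-column-per-property protocol cannot express.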

Regards, Antoine

wetneb commented 2 years ago

All I can say is that I do not know how that should be implemented. Again, proposals and pull requests are welcome.

antoine2711 commented 2 years ago

> I guess one hacky workaround would be to let the user fetch the full JSON of the statements, and we would let them manipulate that themselves in OpenRefine.

That would be great in many ways, because we could extend the syntax to add @ and the source property, following the same logic.

For accessing that data: since all those queries start from a reconciled column, maybe we could add fields to the recon object...

Or, in the new column, store the data as a new recon data object. It would hold either a recon or a value, plus the cell of the initial reconciled column (the item the statement belongs to).

Along the same lines, we might want columns of reconciled properties that could replace properties in the Wikidata schema.

So the recon data could have a type: statement value, statement property, qualifier property, qualifier value, source property, or source value.
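The typed recon data suggested here can be sketched as a small data structure. This is a minimal, hypothetical Python sketch (`Role` and `FetchedCell` are illustrative names, not OpenRefine classes): each fetched cell records the reconciled item it came from and which of the six roles it plays, and property roles are checked to actually hold a property ID.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STATEMENT_VALUE = "statement value"
    STATEMENT_PROPERTY = "statement property"
    QUALIFIER_PROPERTY = "qualifier property"
    QUALIFIER_VALUE = "qualifier value"
    SOURCE_PROPERTY = "source property"
    SOURCE_VALUE = "source value"

    @property
    def is_property(self) -> bool:
        return self.name.endswith("_PROPERTY")


@dataclass(frozen=True)
class FetchedCell:
    """Hypothetical cell payload: a fetched value plus where it came from."""
    subject: str  # QID held by the originating reconciled cell
    role: Role
    value: str    # a PID for property roles; a QID or literal otherwise

    def __post_init__(self) -> None:
        if self.role.is_property and not re.fullmatch(r"P\d+", self.value):
            raise ValueError(f"{self.role.value} must be a property ID, got {self.value!r}")


cell = FetchedCell("Q_CANDIDATE", Role.QUALIFIER_PROPERTY, "P768")
print(cell.role.value, cell.value)
```

Keeping the subject QID in every cell is what would let a schema map such a column back onto the right statement when pushing edits.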

Extending this logic seems quite in line with the Wikibase generalisation (though that is another topic).

Sorry @wetneb and others if I am going off-topic with too much OpenRefine; it's just that here the two are so closely linked and dependent on each other, in my view.

Regards, Antoine

trnstlntk commented 2 years ago

I have just received a request via email from another user who would find this very helpful.

It would be very useful for data extension for Wikimedia Commons' structured data, as P170 is usually described with several qualifiers there.

VojtechDostal commented 2 years ago

> I have just received a request via email from another user who would find this very helpful.
>
> It would be very useful for data extension for Wikimedia Commons' structured data, as P170 is usually described with several qualifiers there.

That user is me :-) . I like @wetneb 's solution to enable loading full statement JSONs. This would solve many possible feature requests in one go :)