usc-isi-i2 / t2wml

Table to Wikidata Mapping Language
MIT License

Support Ethiopian Calendar #193

Closed kyao closed 3 years ago

kyao commented 4 years ago

If in the yaml the calendar is Q215271, the value is in Ethiopian calendar format. Convert the value to Gregorian calendar.
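The conversion asked for here can be sketched without any third-party dependency. The following is a minimal sketch, not the code T2WML ships: the function name `ethiopian_to_gregorian` and the constant names are hypothetical, and it assumes the Amete Mihret era and the usual Julian Day Number arithmetic for the Ethiopian calendar (12 months of 30 days plus a 13th month of 5 or 6 days).

```python
from datetime import date

# Julian Day Number offset of the Ethiopian (Amete Mihret) era.
ETHIOPIAN_ERA_JDN = 1723856
# Difference between a Julian Day Number and Python's proleptic Gregorian ordinal.
JDN_MINUS_ORDINAL = 1721425


def ethiopian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert an Ethiopian calendar date to a Gregorian datetime.date."""
    if year < 1:
        raise ValueError(f"Ethiopian year must be >= 1, got {year}")
    if not 1 <= month <= 13:
        raise ValueError(f"Ethiopian month must be 1..13, got {month}")
    if month <= 12:
        month_length = 30
    else:
        # The 13th month (Pagume) has 6 days in Ethiopian leap years (year % 4 == 3).
        month_length = 6 if year % 4 == 3 else 5
    if not 1 <= day <= month_length:
        raise ValueError(f"Day must be 1..{month_length}, got {day}")

    # Days elapsed since the era start, expressed as a Julian Day Number.
    jdn = (ETHIOPIAN_ERA_JDN + 365
           + 365 * (year - 1) + year // 4
           + 30 * month + day - 31)
    return date.fromordinal(jdn - JDN_MINUS_ORDINAL)


print(ethiopian_to_gregorian(2012, 13, 1))  # 2020-09-06
```

The input is taken as three integers rather than a `datetime` object because Python's `datetime` rejects month 13, which is valid in the Ethiopian calendar.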

devowit commented 4 years ago

You mentioned a specific library that converts Ethiopian dates to Gregorian. Could you link it, please? Thank you.

kyao commented 4 years ago

Here is the python library: https://github.com/dimagi/ethiopian-date-converter

devowit commented 4 years ago

OK, so if the calendar: trait is set to Q215271, then use the converter? Are there any other values that should invoke the converter?

devowit commented 4 years ago

I just tried to do pip install ethiopian_date as described in their README, and got an error. It looks like there's something wrong with the package. Were you able to successfully install and run it?

Never mind, the error is just in their setup.py. Given the GPL license, I'm just going to copy the code directly in rather than add a pip dependency. I checked and it is working. So I just need to know exactly when you want it applied.

kyao commented 4 years ago

Just the calendar trait, if it is set to Q215271 then use the converter.

    qualifier:
    - calendar: Q215271
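
As a rough illustration of that rule, the check and the date parsing might look like the sketch below. It assumes the qualifier has already been loaded from YAML into a plain dict; the helper names `needs_ethiopian_conversion` and `split_date_fields` are hypothetical and not part of T2WML.

```python
ETHIOPIAN_CALENDAR = "Q215271"


def needs_ethiopian_conversion(qualifier: dict) -> bool:
    """True only when the qualifier's calendar trait is the Ethiopian calendar."""
    return qualifier.get("calendar") == ETHIOPIAN_CALENDAR


def split_date_fields(value: str) -> tuple:
    """Split a date string into (year, month, day) integers.

    Parsed by hand because datetime.strptime rejects month 13, which is a
    valid Ethiopian month. A leading '^' (KGTK date marker) is tolerated.
    Negative (BCE) years are not handled in this sketch.
    """
    date_part = value.lstrip("^").split("T", 1)[0]
    year, month, day = (int(part) for part in date_part.split("-"))
    return year, month, day


qualifier = {"calendar": "Q215271"}  # stands in for the parsed YAML entry
if needs_ethiopian_conversion(qualifier):
    print(split_date_fields("2012-13-01T00:00:00"))  # (2012, 13, 1)
```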

szeke commented 4 years ago

Is the idea that the output will contain the data in Gregorian format? Will it have both dates? I think we need a date conversion setting in T2WML with three values: keep original, replace with Gregorian, add Gregorian.

kyao commented 4 years ago

How about always convert to Gregorian, and then also have the option of keeping the original?

Not having a Gregorian calendar date would make querying more difficult. We would have to check the calendar type before querying.

szeke commented 4 years ago

Ok, we can have replace with Gregorian as the default. T2WML is a general purpose tool, so we need to implement features in a generic way.

kyao commented 4 years ago

KGTK does not directly support calendar. How about introducing a new property PCalendar?

Below is a statement e1 with two point-in-time (P585) qualifiers. The first P585 qualifier is itself qualified with the Ethiopian calendar Q215271, and the second is qualified with the Gregorian calendar Q12138.

id      node1   label   node2
e1      Q3513939        PVARIABLE-QEth-census-partial-002       1089.21
e2      e1      P585    ^2012-13-01T00:00:00/9
e3      e2      PCalendar       Q215271
e5      e1      P585    ^2020-09-06T00:00:00/9
e6      e5      PCalendar       Q12138

szeke commented 4 years ago

@kyao There is no need for a calendar property. The exploded file has the following columns, the calendar goes in node2;calendar

(base) D22ML-PSZEKELY:wikidata-20200803-v2 pedroszekely$ gzcat wikidata-20200803-all-edges.tsv.gz | head
id  node1   label   node2   rank    node2;magnitude node2;unit  node2;date  node2;item  node2;lower node2;upper node2;latitude  node2;longitude node2;precision node2;calendar  node2;entity-type   node2;wikidatatype
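
Reading the calendar back out of such a file could look like the sketch below. The sample rows are made up for illustration and use only a subset of the columns above; `calendars_by_edge` is a hypothetical helper, not a KGTK function.

```python
import csv
import io

# Illustrative rows only, with a subset of the exploded-file columns.
SAMPLE_TSV = (
    "id\tnode1\tlabel\tnode2\tnode2;date\tnode2;calendar\n"
    "e1\tQ3513939\tPVARIABLE-QEth-census-partial-002\t1089.21\t\t\n"
    "e2\te1\tP585\t\t2020-09-06T00:00:00\tQ12138\n"
    "e3\te1\tP585\t\t2012-13-01T00:00:00\tQ215271\n"
)


def calendars_by_edge(tsv_text: str) -> dict:
    """Map each edge id to its node2;calendar value, skipping edges without one."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return {row["id"]: row["node2;calendar"]
            for row in reader if row["node2;calendar"]}


print(calendars_by_edge(SAMPLE_TSV))  # {'e2': 'Q12138', 'e3': 'Q215271'}
```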

devowit commented 4 years ago

Ok, I am now fully lost. Given the discussion above, what is the feature we want implemented?

kyao commented 4 years ago

Here is my understanding.

    qualifier:
      - calendar: Q215271
        date conversion: keep original

The T2WML snippet above says that the dates in the data are in Ethiopian calendar format, and that T2WML should generate two date qualifiers for each statement: one in the Gregorian calendar and one in the Ethiopian calendar. The exploded TSV should look something like the one below. Edge e2 is the date qualifier in Gregorian, and edge e3 is the date qualifier in Ethiopian.

id      node1   label   node2   node2;date      node2;calendar
e1      Q3513939        PVARIABLE-QEth-census-partial-002       1089.21
e2      e1      P585            2020-09-06T00:00:00     Q12138
e3      e1      P585            2012-13-01T00:00:00     Q215271

The other date conversion setting is replace with Gregorian. For this setting Edge e3 will not be generated.

If no date conversion is specified, then the default behavior is replace with Gregorian.
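
A minimal sketch of the two settings as read in this comment is shown below. The function name `date_qualifier_edges` and the row layout are assumptions for illustration, not T2WML's implementation, and the Gregorian date is assumed to have been computed already.

```python
GREGORIAN = "Q12138"
ETHIOPIAN = "Q215271"


def date_qualifier_edges(statement_id, original_date, gregorian_date,
                         source_calendar=ETHIOPIAN,
                         date_conversion="replace with Gregorian"):
    """Return (node1, label, node2;date, node2;calendar) rows for one statement."""
    # The Gregorian qualifier is emitted under both settings.
    rows = [(statement_id, "P585", gregorian_date, GREGORIAN)]
    if date_conversion == "keep original":
        # In this reading, the original date is kept as an additional edge.
        rows.append((statement_id, "P585", original_date, source_calendar))
    elif date_conversion != "replace with Gregorian":
        raise ValueError(f"unknown date conversion setting: {date_conversion!r}")
    return rows


for row in date_qualifier_edges("e1", "2012-13-01T00:00:00",
                                "2020-09-06T00:00:00",
                                date_conversion="keep original"):
    print("\t".join(row))
```

Leaving `date_conversion` at its default yields only the Gregorian row, which matches the default behavior described above.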

zmbq commented 4 years ago

The terms are a little confusing to me. "replace with Gregorian" makes sense, but it's not intuitive for me that "keep original" keeps the original AND converts to Gregorian. I would expect "keep original" to just keep the original, without creating an additional edge. Also, there is no option for keeping the original without creating an additional edge.

Do note that as it stands, "keep original" will cause Datamart to malfunction, returning two rows for each data point, as we join on the P585 label. We need to figure out how to handle multiple P585 edges in Datamart.

szeke commented 4 years ago

Keep original means to do nothing, i.e., no changes to the user input.

I was thinking that date conversion is a global setting, not something that the user has to put in the YAML.

I agree with @zmbq that add Gregorian will cause the queries to return two items, so we should not use it in Datamart, but it still makes sense to have the option.

devowit commented 3 years ago

@kyao Can I have an example project for this?

devowit commented 3 years ago

should be supported in 2.3.10, close if working as expected

devowit commented 3 years ago

assuming no response for months means it's been dealt with, closing