lexibank / tppsr

Tableaux Phonétiques des Patois Suisses Romands
Creative Commons Attribution 4.0 International

Add page numbers to the CLDF dataset #9

Closed xrotwang closed 4 years ago

xrotwang commented 4 years ago

The concept list at https://github.com/concepticon/concepticon-data/blob/master/concepticondata/conceptlists/Gauchat-1925-480.tsv has page numbers, which can now be resolved to scans of the pages, e.g. https://archive.org/details/gauchat-et-al-1925-tppsr/page/92/mode/2up. Thus, it would be useful to have the page numbers in the CLDF dataset as well.

LinguList commented 4 years ago

The good thing about the forms is that we know which dialect points they belong to, so we just have to use the dialect points (their numbers). If the page number is 10, all dialects >= 32 are on page 11.
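The rule above could be sketched roughly as follows. This is only an illustration: the function name `page_for_dialect` and the `split` parameter are hypothetical, and the sketch assumes that dialect points numbered 32 and above always sit on the page following the one given in the concept list.

```python
def page_for_dialect(concept_page: int, dialect_number: int, split: int = 32) -> int:
    """Return the printed page assumed to hold a form.

    concept_page   -- page number listed for the concept in the concept list
    dialect_number -- number of the dialect point the form belongs to
    split          -- first dialect number assumed to be on the following page
    """
    if dialect_number >= split:
        return concept_page + 1
    return concept_page


# Concept listed on page 10: one dialect point below the split, one at it.
left = page_for_dialect(10, 31)
right = page_for_dialect(10, 32)
```

Keeping `split` as a parameter makes it easy to adjust if some tables turn out to divide the dialect points differently.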

LinguList commented 4 years ago

Even better, we can link individual images:

https://ia801505.us.archive.org/BookReader/BookReaderImages.php?zip=/28/items/gauchat-et-al-1925-tppsr/gauchat-et-al-1925-tppsr_jp2.zip&file=gauchat-et-al-1925-tppsr_jp2/gauchat-et-al-1925-tppsr_0112.jp2&id=Z2F1Y2hhdC1ldC1hbC0xOTI1LXRwcHNy&scale=5

Here, we'd need an offset: the file gauchat-et-al-1925-tppsr_0112.jp2 is page 94, so the scan number is the page number + 18.
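As a minimal sketch of that arithmetic, the snippet below derives the image file name from a printed page number. The names `SCAN_OFFSET` and `scan_filename` are made up for illustration, and the sketch assumes the offset of 18 holds for every page, which the single example above does not guarantee.

```python
SCAN_OFFSET = 18  # assumption: constant offset, inferred from page 94 -> image 0112


def scan_filename(page: int) -> str:
    """Build the .jp2 file name for a printed page number."""
    scan_number = page + SCAN_OFFSET
    # The image numbers in the archive are zero-padded to four digits.
    return f"gauchat-et-al-1925-tppsr_{scan_number:04d}.jp2"


name = scan_filename(94)
```

If front matter or inserted plates shift the numbering somewhere in the volume, the offset would need to become a lookup rather than a constant.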

LinguList commented 4 years ago

One question: if I add this information (should I call it "URL" or "Scan"?), we have a rather lengthy URL, but only one part of it varies. So it would make most sense to add the URL template to the CSVW metadata, right? Is this possible through pylexibank? For now, I suggest just adding the information as a path of the form 0XXX, which would then resolve against the template https://ia801505.us.archive.org/BookReader/BookReaderImages.php?zip=/28/items/gauchat-et-al-1925-tppsr/gauchat-et-al-1925-tppsr_jp2.zip&file=gauchat-et-al-1925-tppsr_jp2/gauchat-et-al-1925-tppsr_{0XXX}.jp2&id=Z2F1Y2hhdC1ldC1hbC0xOTI1LXRwcHNy&scale=5
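To illustrate what resolving such a path would look like, here is a rough sketch using plain Python string formatting rather than CSVW machinery. The names `SCAN_URL_TEMPLATE` and `scan_url` are hypothetical, and the placeholder is renamed to `{scan}` because `{0XXX}` is not a valid Python format field.

```python
# The image URL from the thread, with the four-digit image number as placeholder.
SCAN_URL_TEMPLATE = (
    "https://ia801505.us.archive.org/BookReader/BookReaderImages.php"
    "?zip=/28/items/gauchat-et-al-1925-tppsr/gauchat-et-al-1925-tppsr_jp2.zip"
    "&file=gauchat-et-al-1925-tppsr_jp2/gauchat-et-al-1925-tppsr_{scan}.jp2"
    "&id=Z2F1Y2hhdC1ldC1hbC0xOTI1LXRwcHNy&scale=5"
)


def scan_url(scan: str) -> str:
    """Fill a four-digit path such as '0112' into the image URL."""
    if not (len(scan) == 4 and scan.isdigit()):
        raise ValueError(f"expected a four-digit path like '0112', got {scan!r}")
    return SCAN_URL_TEMPLATE.format(scan=scan)


url = scan_url("0112")
```

Storing only the short path in the data and keeping the long URL in one place means a change of image host touches a single string.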

xrotwang commented 4 years ago

Hm. I'd say we put the page number in the CLDF data.

LinguList commented 4 years ago

Okay, I will proceed with the data, working towards a PR today (also including syllable templates). I'll need you to show me how to specify the valueUrl, but that is a small problem, I guess.

xrotwang commented 4 years ago

To specify the valueUrl property, you'd do something like

from csvw.metadata import URITemplate

# in cmd_makecldf:

    # The template hard-codes the leading zero, so Scan supplies the remaining digits.
    args.writer.cldf['FormTable', 'Scan'].valueUrl = URITemplate('https://ia801505.us.archive.org/BookReader/BookReaderImages.php?zip=/28/items/gauchat-et-al-1925-tppsr/gauchat-et-al-1925-tppsr_jp2.zip&file=gauchat-et-al-1925-tppsr_jp2/gauchat-et-al-1925-tppsr_0{Scan}.jp2&id=Z2F1Y2hhdC1ldC1hbC0xOTI1LXRwcHNy&scale=5')