Closed (xrotwang closed this issue 4 years ago)
What is good about the forms, though, is that we know which languages are there, so we just have to use the dialect points (their numbers). If the page number is 10, dialects >= 32 all have 11.
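
One possible reading of the remark above, as a minimal Python sketch: the forms for a concept span two consecutive pages, and dialect points from 32 upwards sit on the second one. The function name `page_for_form`, the `split_at` parameter, and this reading itself are assumptions for illustration; the thread does not confirm them.

```python
def page_for_form(concept_page: int, dialect_point: int, split_at: int = 32) -> int:
    """Return the printed page on which a form is assumed to appear.

    Assumption: dialect points below `split_at` are on the concept's first
    page, all others on the page that follows it.
    """
    if dialect_point < split_at:
        return concept_page
    return concept_page + 1


# Example: a concept starting on page 10
first_half = page_for_form(10, 5)
second_half = page_for_form(10, 32)
```

If the split point turns out to differ between concepts, `split_at` would have to come from the data rather than being a constant.
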
Even better, we can link individual images:
Here, we'd need an offset: the file `gauchat-et-al-1925-tppsr_0112.jp2` is page 94, so the scan number is the page number + 18.
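
The page-to-scan arithmetic can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes the offset of 18 holds for every page, which the thread only shows for page 94.

```python
SCAN_OFFSET = 18  # from the thread: scan 0112 shows printed page 94


def scan_number(page: int) -> int:
    """Map a printed page number to the scan number (assumed constant offset)."""
    return page + SCAN_OFFSET


def scan_filename(page: int) -> str:
    """Build the scan file name, zero-padded to four digits as in the example."""
    return f"gauchat-et-al-1925-tppsr_{scan_number(page):04d}.jp2"
```

For page 94 this yields scan number 112 and the file name quoted above.
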
One question: if I add this information (should I call it "URL" or "Scan"?), we have a rather lengthy URL, but we only need to edit one part of it. So it would make most sense to add this URL parameter to the CSVW metadata, right? Is this possible through pylexibank? For now, I suggest I'll just add the information in the form of a path as `0XXX`, which would then resolve to the template `https://ia801505.us.archive.org/BookReader/BookReaderImages.php?zip=/28/items/gauchat-et-al-1925-tppsr/gauchat-et-al-1925-tppsr_jp2.zip&file=gauchat-et-al-1925-tppsr_jp2/gauchat-et-al-1925-tppsr_{0XXX}.jp2&id=Z2F1Y2hhdC1ldC1hbC0xOTI1LXRwcHNy&scale=5`
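
As a rough illustration of how such a path value would resolve, here is a plain-Python sketch that fills the one variable part of the URL. The placeholder is renamed to `{scan}` because `{0XXX}` is not a valid Python format field; `SCAN_URL_TEMPLATE` and `scan_url` are hypothetical names.

```python
# The URL quoted above, with the variable part written as {scan}.
SCAN_URL_TEMPLATE = (
    "https://ia801505.us.archive.org/BookReader/BookReaderImages.php"
    "?zip=/28/items/gauchat-et-al-1925-tppsr/gauchat-et-al-1925-tppsr_jp2.zip"
    "&file=gauchat-et-al-1925-tppsr_jp2/gauchat-et-al-1925-tppsr_{scan}.jp2"
    "&id=Z2F1Y2hhdC1ldC1hbC0xOTI1LXRwcHNy&scale=5"
)


def scan_url(scan: str) -> str:
    """Return the image URL for a zero-padded scan id such as '0112'."""
    return SCAN_URL_TEMPLATE.format(scan=scan)


url = scan_url("0112")
```
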
Hm. I'd say we put the page number in the CLDF data, in the `Source` column of `forms.csv`, e.g. `Gauchat1925[94]`, plus a `Scan` column with value `112`, and specify a `valueUrl` for this column, giving the URL template.

Okay, I will proceed with the data, working towards a PR for today (also including syllable templates), and I'll need you to show me how to specify the `valueUrl`, but that is a small problem, I guess.
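
To make the `Source`/`Scan` proposal above concrete, here is a small Python sketch that splits a reference such as `Gauchat1925[94]` into key and page, and derives the matching `Scan` value. The helper names are hypothetical, the offset of 18 is taken from the earlier comment, and how the values are handed to the pylexibank writer is not shown.

```python
import re

# A source key, optionally followed by a page spec in square brackets.
SOURCE_REF = re.compile(r"^(?P<key>[^\[\]]+)(?:\[(?P<pages>[^\]]*)\])?$")


def split_source(ref: str):
    """Split 'Gauchat1925[94]' into ('Gauchat1925', '94'); pages may be None."""
    match = SOURCE_REF.match(ref.strip())
    if match is None:
        raise ValueError(f"not a source reference: {ref!r}")
    return match.group("key"), match.group("pages")


def scan_for_source(ref: str, offset: int = 18) -> str:
    """Derive the Scan value from the page in a source reference (assumed offset)."""
    _, pages = split_source(ref)
    if pages is None or not pages.isdigit():
        raise ValueError(f"no single page number in {ref!r}")
    return str(int(pages) + offset)


key, page = split_source("Gauchat1925[94]")
scan = scan_for_source("Gauchat1925[94]")
```
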
To specify the `valueUrl` property, you'd do something like

    from csvw.metadata import URITemplate

    # in cmd_makecldf:
    args.writer.cldf['FormTable', 'Scan'].valueUrl = URITemplate('https://ia801505.us.archive.org/BookReader/BookReaderImages.php?zip=/28/items/gauchat-et-al-1925-tppsr/gauchat-et-al-1925-tppsr_jp2.zip&file=gauchat-et-al-1925-tppsr_jp2/gauchat-et-al-1925-tppsr_0{Scan}.jp2&id=Z2F1Y2hhdC1ldC1hbC0xOTI1LXRwcHNy&scale=5')
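
For orientation, the column description that ends up in the CSVW metadata would look roughly like the dictionary below. This is a sketch, not the output of pylexibank: the `datatype` is an assumption, and the `str.replace` call only mimics how a consumer would expand the simple `{Scan}` variable.

```python
import json

# Approximate shape of the "Scan" column description in the metadata JSON.
scan_column = {
    "name": "Scan",
    "datatype": "string",  # assumption; the thread does not specify a datatype
    "valueUrl": (
        "https://ia801505.us.archive.org/BookReader/BookReaderImages.php"
        "?zip=/28/items/gauchat-et-al-1925-tppsr/gauchat-et-al-1925-tppsr_jp2.zip"
        "&file=gauchat-et-al-1925-tppsr_jp2/gauchat-et-al-1925-tppsr_0{Scan}.jp2"
        "&id=Z2F1Y2hhdC1ldC1hbC0xOTI1LXRwcHNy&scale=5"
    ),
}

metadata_fragment = json.dumps(scan_column, indent=2)

# Mimic the expansion of the template for a row whose Scan value is "112".
expanded = scan_column["valueUrl"].replace("{Scan}", "112")
```

Note that the literal `0` before `{Scan}` only produces a four-digit file name for three-digit scan numbers.
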
The concept list at https://github.com/concepticon/concepticon-data/blob/master/concepticondata/conceptlists/Gauchat-1925-480.tsv has page numbers which can now be resolved to scans of the pages (e.g. https://archive.org/details/gauchat-et-al-1925-tppsr/page/92/mode/2up). Thus, it would be useful to have the page numbers in the CLDF as well (`parameters.csv`, `Source` column of `forms.csv`) - although this will require a bit of tinkering, since the forms for each concept are spread across two pages ...