ontolex / frequency-attestation-corpus-information

OntoLex module for Frequency, Attestations and Corpus Information (draft)
https://www.w3.org/community/ontolex/wiki/Frequency,_Attestation_and_Corpus_Information

frac:corpus? #6

Open chiarcos opened 2 years ago

chiarcos commented 2 years ago

(a) It has recently been suggested to merge frac:corpus and frac:locus into a single property. How should the merged property be named? Originally, that was dc:source. (b) At the moment, frac:corpus is not obligatory for a frac:Observation. Can we assert that exactly one corpus is required?
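To make question (b) concrete, here is a minimal sketch of what an "exactly one frac:corpus per frac:Observation" check could look like. It is not part of the FrAC draft: triples are modelled as plain Python tuples with prefixed names, and all `ex:` resources as well as the helper `corpus_violations` are hypothetical.

```python
from collections import defaultdict

# Triples as (subject, predicate, object) tuples using prefixed names.
# All ex: resources are made up for illustration.
TRIPLES = [
    ("ex:obs1", "rdf:type", "frac:Observation"),
    ("ex:obs1", "frac:corpus", "ex:corpusA"),
    ("ex:obs2", "rdf:type", "frac:Observation"),  # no corpus at all
    ("ex:obs3", "rdf:type", "frac:Observation"),
    ("ex:obs3", "frac:corpus", "ex:corpusA"),
    ("ex:obs3", "frac:corpus", "ex:corpusB"),     # two corpora
]


def corpus_violations(triples):
    """Map each observation that does not have exactly one frac:corpus
    to the number of distinct corpora it actually has."""
    observations = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "frac:Observation"
    }
    corpora = defaultdict(set)
    for s, p, o in triples:
        if p == "frac:corpus":
            corpora[s].add(o)
    return {
        obs: len(corpora[obs])
        for obs in sorted(observations)
        if len(corpora[obs]) != 1
    }


violations = corpus_violations(TRIPLES)
print(violations)
```

Under such a constraint, `ex:obs2` (no corpus) and `ex:obs3` (two corpora) would both be reported, while `ex:obs1` would pass.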

chiarcos commented 2 years ago

Consensus now is to provide both frac:corpus and frac:locus for attestations, the former pointing to the source data as a whole, the latter pointing to the specific location within it.
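As a rough illustration of this consensus, the following sketch builds the two triples for a single attestation. It only assumes what is stated above (frac:corpus for the source as a whole, frac:locus for the specific location); the `ex:` identifiers, the fragment-style locus, and the helper `attestation_triples` are hypothetical, and the draft may model the locus differently.

```python
def attestation_triples(attestation, corpus, locus):
    """Return the triples that link an attestation to its source as a whole
    (frac:corpus) and to the specific location in that source (frac:locus)."""
    return [
        (attestation, "frac:corpus", corpus),
        (attestation, "frac:locus", locus),
    ]


# Hypothetical identifiers: a book as the source, a page within it as the locus.
triples = attestation_triples(
    "ex:att1",
    "ex:someBook",
    "ex:someBook#page=12",
)

for s, p, o in triples:
    print(s, p, o, ".")
```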

chiarcos commented 2 years ago

An open (recurring) discussion point is the naming of frac:corpus and frac:Corpus, because it has led to misunderstandings in the past. The current draft states explicitly and repeatedly that our understanding of corpus is not limited to NLP corpora[1], but this seems to be hard to communicate. An alternative solution is to abandon the notion of frac:Corpus and instead operate with members of the dct:DCMIType class (see https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#section-7). Then, frac:corpus can be safely superseded by dct:source, and the type of source is made clear by the DCMIType member (dcmit:Collection, dcmit:Dataset, dcmit:Text, dcmit:Image, dcmit:MovingImage, ...).

[1] Definition of frac:Corpus: "represents any type of linguistic data or collection thereof, in structured or unstructured format." Definition of frac:corpus: "the data in which that Observation has been made. This can be, for example, a corpus or a text represented by its access URL, a book represented by its bibliographical metadata, etc." Notes: "non-empty collection of texts, in electronic or other form. (Note that a single text can constitute a corpus.)"
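The dct:source alternative described in this comment could be sketched as follows. This is only an illustration of the proposal, not something the draft defines: the `dcmit:` prefix follows the usage above, the type set lists only the members named in this comment (the DCMI Type Vocabulary contains more), and `source_triples` and the `ex:` resources are hypothetical.

```python
# Subset of DCMI Type Vocabulary members named in the comment above;
# the full vocabulary has additional members.
DCMI_TYPES = {
    "dcmit:Collection",
    "dcmit:Dataset",
    "dcmit:Text",
    "dcmit:Image",
    "dcmit:MovingImage",
}


def source_triples(observation, source, dcmi_type):
    """Link an observation to its source via dct:source and state the
    kind of source with a DCMI type instead of frac:Corpus."""
    if dcmi_type not in DCMI_TYPES:
        raise ValueError(f"not a known DCMI type in this sketch: {dcmi_type}")
    return [
        (observation, "dct:source", source),
        (source, "rdf:type", dcmi_type),
    ]


# A single text as the source of an observation (hypothetical identifiers).
result = source_triples("ex:obs1", "ex:someText", "dcmit:Text")
for s, p, o in result:
    print(s, p, o, ".")
```

With this pattern the word "corpus" no longer appears in the data at all; whether the source is a collection, a dataset, or a single text is read off its type.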