Open stuartpb opened 8 years ago
I feel like "related words" is just a fuzzy hedge between a shared tag and a direct word relation. If it isn't one of those, would it really be worth noting?
I think there is a chasm here, though, and making "compare" and "contrast" lists could be worth it. However, I want to stick with just tags and directly-defined relationships for now (as I really do see all useful relationships coming through via shared tags, possibly using combinations of shared tags).
Isn't my "rel" example broken by the Link header in HTTP?
`division` right now:

    - use: A division of an organization.
      tags: [metatag]
      compare: [department]
    - use: Mathematical division.
      tags: [math]
I think there should be a `sectioning` metatag that would apply to both of these - while it wouldn't directly draw the division/department correlation, it would draw other similar correlations that would be similarly applicable to whatever would come across `division`.
On the other hand, why do I even have this? I can't think of any format offhand that uses "division" as a keyword like this. I think I added it for CMU's Computer Emergency Response Team, which is included as a sense of "cert" for some reason. Even if I don't cut that, it should be tagged as "organization", not "division".
In conclusion, I'm dropping that metatag altogether.
I also have it as a two-way link between "nop" and "noop"... I think them having the same wikidata identifier should be enough to do something with that.
I'm keeping the sense of division, just because it's worth noting synonyms like that... but given that "div" is way more common, eeh.
I just started writing this in the schema documentation, but now I think I want to cut it:
also
An array of synonyms that this sense embodies (see "Synonym fields" below).
This field should only be used when the sense has its own prominent usage, separate from the word it is a synonym of. Otherwise, the synonym field should be applied to the entire sense, and other fields like "use", "tags", and "see" should not be present.
I've decided a "plural" or "abbreviation" sense is distinct from a sense being used on its own (so a sense of "Collection of children" is separate from "plural of child"). Yeah, that's kind of weird, especially considering how broadly a lot of other senses get lumped together in this, but really, it's the only sane way to proceed: the point of a dictionary of reserved words is that a non-identical series of letters, even if it "means" the exact same thing, is a different sense. Unless you're Perl or Ruby, close synonyms are going to have no bearing on a close sense of another word. For entries in this dictionary, the spelling of the word itself is half the definition.
So, in short: synonyms are other senses. Period.
Another way of putting it: some senses are more lexically-derived than others. "todos" only makes sense as a plural of a word that itself only makes sense in light of the words "to" and "do". A plural sense asserts "this only exists as a pluralization of another word that probably has a definition in the system".
Also, to elaborate on this being "the only sane way": there's a lot of practical application in the plural senses being separate on a word, and more practical loss than gain from having pluralization be somehow entangled with other senses on a word.
Cut text from README draft:
- PLAN: This should be changed to be another field that can be on cardinal senses, since these can have their own examples
The current working premise is that, if these have their own examples, they should be a standalone sense.
And aren't we keeping tags for `plural` and `abbreviation` (which if nothing else will make it easier to go back and convert to the other approach if there's a need for it later)?
So, to come back at the other side: situations like Rails and other generators and stuff actually can benefit from `plural` relations. So maybe the `plural` senses should be included alongside the standalone senses (if the examples don't overlap)?
So, for instance, if a framework implements both "user" and "users" as a plural of "user", we don't include it as an example of the standalone sense of "users" - we just take it as implied from the `plural` synonym sense? Or... should it be both?
I think the simplest answer for now is to just do both. The fact that the two senses technically overlap isn't a big deal.
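A rough sketch of what "doing both" might look like for "users", assuming a synonym sense is written as a bare `plural` field and reusing the `use`/`tags` layout from the division entry above. The definition text and the `collection` tag are placeholders, not entries from the actual dictionary:

```yaml
# Hypothetical entry for "users"; field values are illustrative only.
- use: A collection of user accounts.   # standalone sense, with its own examples
  tags: [collection]                    # placeholder tag
- plural: user                          # synonym sense: no "use" or "tags"
```

A framework that defines both "user" and "users" would then count as an example of both senses.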
I'm really thinking about splitting word def files into two fields (with array values), `def` and `derived`. It makes word extensibility easier later, at the cost of making basically every word definition somewhat more verbose (one extra line at the beginning of every file that is almost always the same).
(Bikeshed: `derived` should probably mirror `def` as `deriv`.)
More cut text from the draft README:
- PLAN: These are only mutually exclusive to "tags" and probably to each other. (A plural can be the plural of an abbreviated other word, or an abbreviation can be the abbreviation of a plural word.) They also kind of preempt "use" since the type of synonym basically is the "use" value. So more like they can both have "see" and "commonly" of their own
- ISSUE: So, like, if JS defines Object but not Objects, can I decide if I want only the ones that are defined? Should there maybe be compound-tags like jskeyword?
I'm going to implement the `def` and `deriv` thing since it makes comprehending the schema a lot easier, and because of the extensibility argument.
Actually, to bikeshed a little more, these are inflections, not derivations. Since they fall under the general umbrella of morphology / combinations of morphemes, I'm going to call the array "morph", or maybe "lex", as it's 3 letters like `def`, and is (roughly) about grouping a lexeme by its lemma.
While I'm somewhat breaking my own best practices by doing this, I think it's all so fuzzy that it's worth going by a few fuzzier rules here: what's something that's recognizable, memorable, and distinct from the alternative? In the end, putting "lexically similar" data in an array under `lex` makes sense.
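Roughly, a word file under this split might look like the following. This is only a sketch: the `def` and `lex` keys come from the discussion above, while the `abbreviation` field name, the definition text, and the tag are assumptions for illustration:

```yaml
# Hypothetical file for "div"; values are illustrative only.
def:                          # the extra first line nearly every file would carry
  - use: A generic container for a section of a document.
    tags: [html]              # placeholder tag
lex:                          # lexically-derived senses grouped under one key
  - abbreviation: division
```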
As for non-lexical relations (including "slightly different spelling" like `nop` and `noop`), I think the solution for this is going to be allowing Wiki links in def text.
I don't know about any of that stuff above, but I think "compare" and "contrast" lists make a lot of sense - to keep this from getting out of hand, the "compare" should only be applied when it conveys more information than a common tag between the words being compared (i.e. it's sort of like a little ad hoc tag relationship between a small group of words that may not even go both ways).
(note the current plan on tags is #15)
Yeah, like, in the old sloppy-ad-hoc-tags model I was writing, "active" had the tag "focus", and, like... honestly, to highlight the concept of "focus" like that, it'd be easier to browse by thesaurus relations than to try to identify each entangled concept (which itself would have its own word, making the tags superfluous due to imperfect overlaps) - this way, "focus" doesn't have to be king on any hill that could also apply to "active".
From the current README (following the list in #4):