jkomoros / card-web

The web app behind thecompendium.cards
Apache License 2.0

A `synonym` reference type #401

Open jkomoros opened 3 years ago

jkomoros commented 3 years ago

A reference type that points from one concept to another, and denotes that the two concepts are synonyms for one another.

The from and the to must both be concept types.

The relation should be symmetrical. How to enforce that?

Blocks on #399

When such a relation is set, the index terms for cards that point to a concept should "smear" to include the other concepts in the synonym group, once for each time the primary concept shows up. (The smeared concepts should be ranked a bit lower.)
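A minimal sketch of the smearing idea, under stated assumptions: the `WordCounts` type, the `smearCounts` function, and the `SMEAR_WEIGHT` value of 0.5 are all hypothetical and not taken from the card-web codebase. The issue only says the smeared concepts should rank "a bit lower", so the exact discount is a guess.

```typescript
// Hypothetical type: term => number of occurrences on a card.
type WordCounts = Map<string, number>;

// Assumed discount for smeared terms; the issue only says "a bit lower".
const SMEAR_WEIGHT = 0.5;

// Returns a copy of counts where every term in a synonym group also
// contributes a discounted count to the other members of its group.
function smearCounts(
  counts: WordCounts,
  synonymGroups: string[][],
  smearWeight: number = SMEAR_WEIGHT
): WordCounts {
  const result: WordCounts = new Map(counts);
  for (const group of synonymGroups) {
    for (const primary of group) {
      // Read from the ORIGINAL counts so smeared values never smear again.
      const primaryCount = counts.get(primary) ?? 0;
      if (primaryCount === 0) continue;
      for (const other of group) {
        if (other === primary) continue;
        // Once per occurrence of the primary term, at reduced weight.
        result.set(other, (result.get(other) ?? 0) + primaryCount * smearWeight);
      }
    }
  }
  return result;
}

const smeared = smearCounts(
  new Map<string, number>([["emergence", 2], ["graph", 1]]),
  [["emergence", "emergent behavior"]]
);
```

Here `emergence` appears twice, so `emergent behavior` picks up two discounted occurrences, while `graph` is untouched because it belongs to no synonym group.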

jkomoros commented 3 years ago

A few things to think about with synonyms.

In some cases, it's a bidirectional synonym. In other cases, it's a one-directional (e.g. an example-of) relation. Some relations are weaker than others.

Should the ngrams be copied onto every transitively related card, or should each ngram in the synonym class be treated as expanding to the same ngram that represents the class?

There are some cases where you want to mark a synonym of a card but don't want to have the synonym be its own card. Maybe have a card.title_alternates that is backported?

jkomoros commented 3 years ago

Sometimes you want there to be a key card for the synonym group that they all are based off of.

jkomoros commented 3 years ago

We could make it so every synonym group reduces down to one particular word, and any time any of the alternates are encountered they all reduce down to that word. But sometimes you do want the original wording. Maybe an extra normalized value in card.nlp that has the synonyms removed?

jkomoros commented 3 years ago

The synonyms should start out just affecting wordCountsForCard, pretending that the synonym expansion words are there.

Each card can get a title_alternates string. For now, just have it be represented in the UI as a textarea where each line is an alternate title. (Joined and then split to check whether they've changed.)
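One way the textarea round trip could look, as a rough sketch. It assumes title_alternates is stored as a single newline-delimited string; the function names `parseTitleAlternates`, `serializeTitleAlternates`, and `titleAlternatesChanged` are made up for illustration.

```typescript
// Splits the raw textarea value into one alternate title per line,
// dropping blank lines, surrounding whitespace, and exact duplicates.
function parseTitleAlternates(raw: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || seen.has(trimmed)) continue;
    seen.add(trimmed);
    result.push(trimmed);
  }
  return result;
}

// Joins alternates back into the newline-delimited form shown in the textarea.
function serializeTitleAlternates(alternates: string[]): string {
  return alternates.join("\n");
}

// Compares the normalized forms, so that stray blank lines or trailing
// whitespace in the textarea do not register as an edit.
function titleAlternatesChanged(stored: string, textareaValue: string): boolean {
  return (
    serializeTitleAlternates(parseTitleAlternates(stored)) !==
    serializeTitleAlternates(parseTitleAlternates(textareaValue))
  );
}
```

Comparing the normalized strings rather than the raw ones means a user adding an empty line at the end of the textarea would not mark the card as modified.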

Then we calculate a synonymMap from concept cards that is a map of word => array of synonym expansions. By default that map would be just a join of title_alternates and any backported card titles it pointed to via synonym references. Then, it repeats that expansion a few times until it settles, so you get synonym expansions from transitive cards. The synonym map would be passed around throughout the pipeline in the same way that importantNgrams is.
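The expand-until-it-settles step could be sketched roughly as below. This is not the project's implementation: the `ConceptCard` shape, the `buildSynonymMap` name, and the `maxPasses` cap are hypothetical, synonym references are represented here by the target card's title, every link is treated as symmetric, and titles are assumed to be already normalized.

```typescript
// Hypothetical, simplified view of a concept card.
interface ConceptCard {
  title: string;
  titleAlternates: string[];
  synonymRefs: string[]; // titles of concept cards this card points to
}

function buildSynonymMap(
  cards: ConceptCard[],
  maxPasses: number = 10
): Map<string, string[]> {
  const expansions = new Map<string, Set<string>>();

  // Records from => to; returns true only if the link is new.
  const link = (from: string, to: string): boolean => {
    if (from === to) return false;
    let set = expansions.get(from);
    if (!set) {
      set = new Set<string>();
      expansions.set(from, set);
    }
    if (set.has(to)) return false;
    set.add(to);
    return true;
  };

  const members = (word: string): string[] => {
    const set = expansions.get(word);
    return set ? Array.from(set) : [];
  };

  // Seed: title alternates plus titles reached via synonym references.
  for (const card of cards) {
    for (const other of card.titleAlternates.concat(card.synonymRefs)) {
      link(card.title, other);
      link(other, card.title); // assumption: synonymy is symmetric
    }
  }

  // Pull in synonyms-of-synonyms until a pass adds nothing new.
  for (let pass = 0; pass < maxPasses; pass++) {
    let changed = false;
    for (const word of Array.from(expansions.keys())) {
      for (const synonym of members(word)) {
        for (const transitive of members(synonym)) {
          if (link(word, transitive)) changed = true;
        }
      }
    }
    if (!changed) break;
  }

  const result = new Map<string, string[]>();
  for (const word of Array.from(expansions.keys())) {
    result.set(word, members(word).sort());
  }
  return result;
}

const synonymMap = buildSynonymMap([
  { title: "emergence", titleAlternates: ["emergent behavior"], synonymRefs: ["self-organization"] },
  { title: "self-organization", titleAlternates: ["spontaneous order"], synonymRefs: [] },
]);
```

In this example `spontaneous order` is never mentioned on the `emergence` card, but it ends up in the expansions for `emergence` through the transitive link via `self-organization`.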

Later, there's a text field for indexing that takes any words in other fields and expands their synonyms using the synonym map, so at least they'll match a little bit if you search for that word, although they won't match the

And then maybe later we figure out how to reverse a given word to its normalized version, and then have an extra nlp property on card of deSynonmed. That map would have to make sure that each word reduces to precisely one normalized word. Queries would then be normalized the same way and would look over that text property for matches. (Although ideally with a boost for cards that actually match the original wording.)
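A rough sketch of what that reverse map might look like, with the "precisely one" constraint enforced when the map is built. The names `buildCanonicalMap` and `deSynonym` are hypothetical, the first entry of each group is arbitrarily assumed to be the normalized form, and only single-token terms are handled (multi-word titles would need ngram matching, which is not shown).

```typescript
// Builds term => normalized term, throwing if any term would reduce
// to two different normalized forms.
function buildCanonicalMap(synonymGroups: string[][]): Map<string, string> {
  const canonical = new Map<string, string>();
  for (const group of synonymGroups) {
    // Assumption: the first entry in each group is the normalized form.
    const target = group[0];
    if (target === undefined) continue;
    for (const term of group) {
      const existing = canonical.get(term);
      if (existing !== undefined && existing !== target) {
        throw new Error(`"${term}" reduces to both "${existing}" and "${target}"`);
      }
      canonical.set(term, target);
    }
  }
  return canonical;
}

// Replaces each word with its normalized form, leaving unknown words as-is.
// The same function would be applied to both card text and queries.
function deSynonym(words: string[], canonical: Map<string, string>): string[] {
  return words.map((word) => canonical.get(word) ?? word);
}

const canonicalMap = buildCanonicalMap([
  ["emergence", "emergent"],
  ["graph", "network"],
]);
const normalizedQuery = deSynonym(["emergent", "network", "theory"], canonicalMap);
```

Because card text and queries go through the same reduction, a query for `network` would match a card that only says `graph`; boosting cards that match the original wording would be a separate step.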