① Determine data structure itself

tatianamac commented 4 years ago

Determine the structure of the data itself. I get the impression that there are a few dictionary features that are still being specced, so this sub-task is probably best left to @tatianamac for the moment.

Currently, each word has:

Flag: Use/Avoid
Label: Ableist/Slur
Definition
Part of speech
Benefits/issues (depending on use/avoid)
Impact
Alt words
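
As a rough sketch of the list above, the fields could be written down as a TypeScript type. The names here (`WordEntry`, `benefitsOrIssues`, `altWords`, `summarize`) are assumptions for illustration, not a decided schema, and the sample values are placeholders.

```typescript
// Hypothetical shape for one dictionary entry, mirroring the field list above.
type Flag = "use" | "avoid";

interface WordEntry {
  word: string;
  flag: Flag;
  label: string; // e.g. "ableist" or "slur"
  definition: string;
  partOfSpeech: string;
  // Read as "benefits" when flag is "use" and "issues" when flag is "avoid".
  benefitsOrIssues: string[];
  impact: string;
  altWords: string[];
}

// Placeholder entry; every text value is illustrative only.
const example: WordEntry = {
  word: "example-term",
  flag: "avoid",
  label: "ableist",
  definition: "Placeholder definition.",
  partOfSpeech: "adjective",
  benefitsOrIssues: ["Placeholder issue."],
  impact: "Placeholder impact.",
  altWords: ["alternative-one", "alternative-two"],
};

// Builds a short human-readable summary of an entry.
function summarize(entry: WordEntry): string {
  const verb = entry.flag === "avoid" ? "Avoid" : "Use";
  return `${verb} "${entry.word}" (${entry.partOfSpeech}); alternatives: ${entry.altWords.join(", ")}`;
}
```

A flat shape like this keeps one flag per word, which is the simplest thing a bot or API consumer could read.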

Future features will include:

Literature
Attribution/authorship
Nuance or alternate opinions (aside: One of the challenges of this project will be that identity is nuanced; two nonbinary people will feel differently about the same definition, so we'll need to find tactful and considerate ways to approach this through the structure of the definitions.)
Link string (see URL wishes here).

Possible features could include:

Upvoting/downvoting? I am not sure how helpful this really is, but something I've thought about.

tatianamac commented 4 years ago

@good-idea Here is the structure I'm thinking. Per your question, I think that infrastructurally, it's probably not much different than a standard online dictionary. However, as I have future hopes for this to integrate into Twitter/Slack bots and the API, I'm not sure how that impacts how we conceive each word's individual structure accordingly.

good-idea commented 4 years ago

This all looks good and makes a lot of sense. Some more Q's to discuss:

Should words support multiple definitions/contexts? (i.e. my 'crazy' example in the other thread - each definition would imply a different set of alternatives)
If so, maybe the definitions should include the flag, part of speech, benefits, issues, and impacts?

There might be some terms that should be avoided in some circumstances but can be used in a non-harmful way in others. Perhaps:

"That person is homeless" - harmful/avoid
"Los Angeles' homeless population is rising" - not harmful*
- *I'm not sure how I actually feel about saying this is "not harmful", but, just bringing it up as an example

I think breaking the word down into its various definitions could give us more flexibility when it comes to dealing with nuances.

good-idea commented 4 years ago

@lynncyrin following up from the other issue:

would recommend against picking technologies at this point

Totally agree. I mention GraphQL because it pretty much takes defining a data structure as its starting point, and makes all of the data types and relationships explicit. Even if we don't end up using GraphQL, putting all of our decisions into a schema definition will make sure we're all on the same page and that the structure is clear and coherent.

Based on Tatiana's list above + breaking words down into multiple definitions, a starting point could be:

type Word {
  word: String!
  definitions: [Definition!]!
  linkString: String!
}

enum PartOfSpeech {
  NOUN
  ADJECTIVE
  # ...etc.
}

type Definition {
  benefits: [String]
  issues: [String]
  impact: [String]
  partOfSpeech: PartOfSpeech!
  # Flag to 'avoid'. Or, make this an array of flags that are more like tags
  flag: Boolean
  labels: [Label]!
  alternatives: [Word]!
}

# Labels for definitions
# e.g. "ableist" / "slur"; could include 'positive' labels too, e.g. "inclusive", "gender-neutral"
type Label {
  name: String
  description: String
}
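
For anyone not using GraphQL, the same schema sketch can be mirrored as TypeScript types. This is only an illustration: field names follow the schema above, while the helper names (`flaggedDefinitions`, `isAlwaysAvoided`), the sample data, and the choice to store alternatives as plain strings are assumptions.

```typescript
// TypeScript mirror of the schema sketch above (hypothetical helpers and data).
type PartOfSpeech = "NOUN" | "ADJECTIVE"; // extend as needed

interface Label {
  name: string;
  description?: string;
}

interface Definition {
  benefits?: string[];
  issues?: string[];
  impact?: string[];
  partOfSpeech: PartOfSpeech;
  flag?: boolean; // true means "avoid"
  labels: Label[];
  // The schema uses [Word]; plain strings are used here so the data stays
  // serializable as JSON without circular references.
  alternatives: string[];
}

interface Word {
  word: string;
  definitions: Definition[];
  linkString: string;
}

// Placeholder word with one flagged and one unflagged definition.
const sample: Word = {
  word: "example-term",
  linkString: "example-term",
  definitions: [
    {
      partOfSpeech: "ADJECTIVE",
      flag: true,
      labels: [{ name: "ableist" }],
      issues: ["Placeholder issue."],
      alternatives: ["alternative-one"],
    },
    { partOfSpeech: "NOUN", flag: false, labels: [], alternatives: [] },
  ],
};

// Definitions of a word that are flagged "avoid".
function flaggedDefinitions(w: Word): Definition[] {
  return w.definitions.filter((d) => d.flag === true);
}

// True only when the word has definitions and every one of them is flagged.
function isAlwaysAvoided(w: Word): boolean {
  return w.definitions.length > 0 && w.definitions.every((d) => d.flag === true);
}
```

With per-definition flags, a consumer can tell "avoid in every sense" apart from "avoid in some senses", which a single word-level flag cannot express.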

good-idea commented 4 years ago

Future features will include:

Literature
Attribution/authorship

These could be pretty easily added now, I think.

Upvoting/downvoting? I am not sure how helpful this really is, but something I've thought about.

I think this is a crucial issue, but might make sense to put in a milestone further down the road. We won't be able to handle this if we're just storing the library as JSON. A JSON library makes a lot of sense for getting off the ground, but it means that only developers can contribute - eventually, anyone should be able to have their say by clicking a button (or something)
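
To make the JSON starting point concrete, here is a minimal sketch of loading entries from a JSON string and checking their shape at runtime. The entry shape (`JsonEntry`) and the function names are assumptions, not an agreed format.

```typescript
// Hypothetical minimal entry shape for a JSON-backed dictionary.
interface JsonEntry {
  word: string;
  flag: "use" | "avoid";
  altWords: string[];
}

// Runtime check, since JSON.parse returns untyped data.
function isJsonEntry(value: unknown): value is JsonEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const word = v["word"];
  const flag = v["flag"];
  const altWords = v["altWords"];
  return (
    typeof word === "string" &&
    (flag === "use" || flag === "avoid") &&
    Array.isArray(altWords) &&
    altWords.every((a) => typeof a === "string")
  );
}

// Parses a JSON array of entries, keeping only the well-formed ones.
function loadEntries(json: string): JsonEntry[] {
  const parsed: unknown = JSON.parse(json);
  return Array.isArray(parsed) ? parsed.filter(isJsonEntry) : [];
}

// Placeholder data: one valid entry and one malformed entry.
const raw =
  '[{"word":"example-term","flag":"avoid","altWords":["alternative-one"]},{"word":42}]';
const entries = loadEntries(raw);
```

Note that nothing here can record votes: a static JSON file is read-only at runtime, which is why voting would need a real data store later.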

good-idea commented 4 years ago

Here's a quick example/exploration of how to structure it with JSON files:

https://repl.it/@good_idea/WhimsicalGreenyellowDemos

The challenge is going to be finding the right spot between complexity and simplicity. Language is so fluid that it's hard to really define it in any rigid structure - but since we're doing this in code, there needs to be one. A simpler structure will make it a simpler tool to use, but not account for edge cases. For example:

Simple option: mark Words as "avoid"
- Pro: covers most use cases; it's easy to parse a sentence and find any words that are flagged as "avoid"
- Pro: makes definitions easier to add to the dictionary
- Con: There will be instances where a word is OK to use in one context, but not another. The word savage is a definite "avoid" when referring to a person, but would be acceptable when talking about feral animals or brutality in general, e.g. "a savage criticism"
Complex option: mark Definitions (or contexts) as "avoid"
- Pro: able to determine more precisely when to avoid a word, based on the context.
- Con: Now it's more complicated when using the API or dictionary module. How does the code look at a tweet or Slack message and pick out words to avoid? All of a sudden it sounds like it needs some kind of language processing/context recognition.
- Con: adding definitions to the dictionary is now more complex
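
A minimal sketch of the simple option, assuming a flat set of flagged words and naive tokenization (both are illustrative choices, as is the name `findFlagged`), shows why it is easy to build and where it falls short:

```typescript
// Simple option: a flat set of words flagged "avoid" (sample data only).
const avoidWords = new Set<string>(["savage"]);

// Returns the flagged words found in a message, in order of appearance.
// Tokenization is deliberately naive: lowercase runs of letters/apostrophes.
function findFlagged(message: string): string[] {
  const tokens: string[] = message.toLowerCase().match(/[a-z']+/g) ?? [];
  return tokens.filter((t) => avoidWords.has(t));
}

// Both calls report "savage": a word-level flag has no notion of context,
// so the acceptable usage is flagged just like the harmful one.
const aboutPerson = findFlagged("That person is a savage");
const aboutCriticism = findFlagged("a savage criticism");
```

Telling those two messages apart is exactly the part that would need per-definition flags plus some form of context recognition.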

🤔

In terms of the overall project, it might be better to take a simpler approach at first. If the "flagship" part of the project is the /me/my+terms pages, then this reduces the complexity needed. The flag would indicate "avoid using this term to define a person or thing" instead of just "avoid using this term (but maybe not in different circumstances)".

But this would mean that a Slack bot (or other scripts/plugins/bots) would be more limited.

mjoynes-wombat-web commented 3 years ago

I've started mapping out what the data structure would look like if we used MySQL as the database. This is part of my example of using MySQL, Elasticsearch, and a serverless function for the API.

https://github.com/ssmith-wombatweb/api/blob/local-test/elasticsearch-mysql/test-api-environment/elasticsearch-mysql/MySQL%20Data%20Structure.md

selfdefined / api

① Determine data structure itself #5