Self-Defined API

🍑 ③ Set up API #2

Open tatianamac opened 4 years ago

good-idea commented 4 years ago

👋 Let me know when you're getting started here!

tatianamac commented 4 years ago

As I'm not as familiar with APIs, I'm not sure when it's best to start thinking about the structure of this, but the general idea is that I would like the entire dictionary to have an API, so that different companies and products can tap into the database of words.

Part of the build functionality will include a bot function that can help to auto-correct exclusive terminology.

What other context can I provide that would be helpful?

connor-baer commented 4 years ago

I think the features and functionality that you have in mind are quite clear but I'm unsure about the technical implementation.

The questions below will help us figure out how to store the data and how to query it.

tatianamac commented 4 years ago

What kind of data do you have in mind? Is it just word definitions or will there be other types of data?

Only definitions for now, but there will be layers of information (eventually things like parts of speech, connecting like-terms, etc).

How is the data structured? Is it a simple one dimensional collection or is there a hierarchy? On the website it looks like some words are grouped into categories. Could there be multiple layers of groups?

It's definitely layered in that some definitions will require alternate definitions, or sub-definitions (as you noticed).

It's probably also important to note the future URL feature I'd like to be able to integrate, which could affect how we structure the data.

coilysiren commented 4 years ago

As hinted at by the above comments - I think this may be easier to parse if you split this task into 3 parts:

  1. setup the data structure
  2. setup the build system for compiling the data structure into html
  3. setup the api for exposing the data structure programmatically

This is primarily relevant here because the api is actually the last step.

good-idea commented 4 years ago

Hi all! My thoughts are pretty in line with what @lynncyrin mentioned above.

  1. setup the data structure

Definitely. In my experience, it has been really helpful to figure out how all of the content fits together as a system, and explore it as much as possible to figure out the edge cases. For instance, I imagine that the structure could be described as (for a start):

  1. Having Words. This could include both inclusive and non-inclusive words, each having an inclusive boolean.
  2. Words have one or more definitions or contexts. For instance, the word crazy on thesaurus.com has three different definitions:
  3. Each of these definitions could have any number of suggested alternative Words (synonyms, basically); a rough sketch follows below.
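To make that concrete, here's a rough sketch of that shape (all field names are illustrative, not a committed schema):

// Rough sketch of the structure described above; field names are illustrative only.
const words = [
  {
    word: "crazy",
    inclusive: false, // non-inclusive entries get flagged via the boolean
    definitions: [
      {
        context: "mentally deranged",
        // suggested alternative (inclusive) Words for this definition
        alternatives: ["wild", "surprising", "unexpected"]
      }
      // ...one entry per definition/context
    ]
  }
]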
  2. setup the build system for compiling the data structure into html

An alternative here would be limiting the API to providing the data only, and allowing the frontend (website, app, slackbot) to take care of rendering the HTML on its own. This could be kind of nice because the API wouldn't have the responsibility of supporting different platforms or environments.

  3. setup the api for exposing the data structure programmatically

Have either of you worked with or have thoughts about GraphQL? I've been using it in my projects for a year or so and have grown to really like it. It comes with some of its own overhead, but a bonus here is that the API (again) doesn't need to grow with more endpoints to satisfy the needs of particular use-cases. I've also found that defining a schema (whether you are going to use GraphQL or not) can be really helpful in the planning phase, just to figure out how all of the data fits together.
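For illustration, a first pass at a schema sketch could look something like this (a hypothetical SDL fragment in a JS template string, as you might hand to graphql-tools; every type and field name here is an assumption, not a spec):

// Hypothetical schema sketch; names are assumptions, not a spec.
const typeDefs = `
  type Word {
    title: String!
    inclusive: Boolean!
    definitions: [Definition!]!
  }

  type Definition {
    context: String
    alternatives: [Word!]! # suggested inclusive replacements
  }

  type Query {
    word(slug: String!): Word
    search(term: String!): [Word!]!
  }
`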

Question: other than marking suggested alternative words as inclusive/non-inclusive, any thoughts on how the overall structure might be different from a thesaurus?

Other question: @lynncyrin --- you spoke at !!Con West earlier this year, didn't you?

coilysiren commented 4 years ago
  1. setup the data structure

You're headed down the right path @good-idea! I would encourage splitting the data structure task in half, though:

a. figure out how to encode the data structure into this repo, eg. yaml / json / etc file? database with a CMS? etc.
b. determine the structure of the data itself. I get the impression that there's a few dictionary features that are still being specced, so this sub-task is probably best left to @tatianamac for the moment.

  3. setup the api for exposing the data structure programmatically

I would recommend against picking technologies at this point; the encoding of the data structure (eg. task 1a) is a bit more important - and that choice can potentially narrow your api options significantly.

spoke at !!Con West

yep => https://www.youtube.com/watch?v=-fpnf2nOigQ

tatianamac commented 4 years ago

Thank you, everyone! I broke this task up into several issues so we can have more targeted discussions under those relevant issues.

Also, this maybe obvious to everyone but if you haven't already, what I've built so far is here: https://www.selfdefined.app/

good-idea commented 4 years ago

spoke at !!Con West yep => https://www.youtube.com/watch?v=-fpnf2nOigQ

@lynncyrin yay, I thought I recognized you! We didn't meet but I was in the audience.

@tatianamac https://www.selfdefined.app/ is looking great!

denistsoi commented 4 years ago

Just throwing this out there:

Noticed that since the site is being hosted on Netlify, the project could leverage Netlify Functions to query the JSON/markdown files for basic lookups.

I know that the aim is to leverage algolia in the future as well, so this could be a step forward towards that.

An alternative is that with now.sh you could use serverless functions to query the API (which would return the same result as above).

Wondering if @good-idea / @ovlb / @tatianamac have thoughts before I start a PR.

ovlb commented 4 years ago

@denistsoi I have some thoughts, but currently no time to write them out. I will try to do so over the weekend. Also, my thoughts don’t matter too much :)

denistsoi commented 4 years ago

Thanks @ovlb. One thing that would help make this work is including the raw markdown files within the final build. That way, if we need to query the raw definition, it's already in the deployed site, rather than querying the codebase.

I think this would be helpful if, say, the project is moved off of GitHub.

ovlb commented 4 years ago

@denistsoi

Sorry for the delay in answering!

I think we have to answer additional questions before we hop into implementation mode.

Let me expand my reasoning a bit:

1) Are Netlify functions the right tool for the job? I am not sure but also have limited experience. Can you say something about working with them? From my understanding, they are a tool that works with APIs, not to be an API itself. Is this correct? Or am I on the wrong path here?

2) We don’t have a database. Using the file system as a DB works in theory but is expensive and slow. Using e.g. Algolia would allow us to use our existing data and hand it over to them. However, I do agree, it might be overkill for a quick proof of concept. Proof of which concept, though?

3) Not only is it the tools, but also the underlying design: What’s the purpose of the API? What are the use cases and possible implementations? What queries should return which data? How do we protect it against – given the nature of the project quite likely – fraudulent use? Before we start working on a solution, we have to answer these (and probably more) questions. Otherwise, we risk building something too complicated or ill-suited.

I guess the first task where an API is needed is the Twitter bot. As soon as we have this task specced out (which questions does the bot answer, what happens if there is no definition/no alternative words, …) we can build something that solves these problems.

As we all have limited resources, I think it is vital to use our energy wisely and try to focus on implementations that – hopefully – stand the test of time. An API that uses the file system will most likely not do that.

Re: Including raw definitions in the build: Having the definitions in the deployed markup and/or the client-side JS bundle (not entirely sure what you would like to target) seems very detrimental to the performance of the site. I would advise against this.

denistsoi commented 4 years ago

Sorry for not sending this sooner @ovlb; I had drafted an earlier response on my phone and forgot to send it.

  1. Are Netlify functions the right tool for the job? I am not sure but also have limited experience. Can you say something about working with them? From my understanding, they are a tool that works with APIs, not to be an API itself. Is this correct? Or am I on the wrong path here?

Netlify Functions, or any serverless function, can act as an endpoint. (Given an API structure, you can pass in your query param to retrieve the data, e.g. /api/definition?)

According to Netlify, you can add it to the working directory as /functions/<whatever-name>.js:

exports.handler = async (event, context) => {
    // event.queryStringParameters holds the query params, e.g. ?name=...
    // add lookup logic here
    return {
        statusCode: 200,
        body: "ok"
    }
}
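For example, a lookup function might look something like this (a sketch only; it assumes the build step emits the compiled definitions to functions/data.json, the filename used later in this thread, and the function name is illustrative):

// functions/api.js (hypothetical): look up a definition by slug
const definitions = require("./data.json")

exports.handler = async (event) => {
  const name = event.queryStringParameters && event.queryStringParameters.name
  const entry = name && definitions[name]
  if (!entry) {
    return { statusCode: 404, body: JSON.stringify({ error: "definition not found" }) }
  }
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(entry)
  }
}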

The only issue is billing (the free tier is 125k API calls per month, or 100 hours).

  2. We don’t have a database. Using the file system as a DB works in theory but is expensive and slow. Using e.g. Algolia would allow us to use our existing data and hand it over to them. However, I do agree, it might be overkill for a quick proof of concept. Proof of which concept, though?

The existing app is a statically generated app. (All definitions are defined in markdown and compiled into HTML.)

By hooking into the build pipeline, we could define the serverless functions.

(Re Algolia, I believe we need to provide an index of all the items we want to be searchable - we could feed it with all definitions as JSON.)

I don’t think it’s necessary to require a database unless there’s some data processing later down the line, or we want to host the API via Heroku or something. (You could theoretically store all definitions on a headless CMS and run the build on update.)

  3. Not only is it the tools, but also the underlying design: What’s the purpose of the API? What are the use cases and possible implementations? What queries should return which data? How do we protect it against – given the nature of the project quite likely – fraudulent use? Before we start working on a solution, we have to answer these (and probably more) questions. Otherwise, we risk building something too complicated or ill-suited.

The API from previous posts was for users to call the service and get the definition served as a response (e.g. /define “word”) rather than have the user copy/paste.

The consumers of the API would be users or bots; ultimately, by sharing a link, the caller would get back the definitions of words from the API.

e.g. user flow:

I mentioned including the markup because, technically, you could serve the raw text files as a route, and that'd be the most rudimentary proof of concept for an API.

If the app were to be self-hosted, considerations about performance would be a higher priority, whereas using a statically generated site built with 11ty removes that consideration until the app reaches the free-tier limit.

Re: API design/structure and security, I haven't considered them yet, but just wanted to start a discussion.

As we all have limited resources I think it is vital to use our energy wisely and try to focus on implementations that – hopefully – stand the test of time. An API that uses the file system will most likely not do that.

I’m not working at the moment, so I thought I’d offer my time to the project rather than doing coding interviews (and facing the emotional rejection that comes with that). Also, since I'm in Hong Kong right now, recruitment is slow due to nCoV-2019.

tatianamac commented 4 years ago

(I wanted to jump in and say thank you for this discussion and your time!!! Finding this particular sort of help has been difficult, so your expertise is really valued. I want these discussions to happen concurrently with the build of the webapp, so we can be mindful of the dictionary's infrastructure for its future plans.)

ovlb commented 4 years ago

Hey @denistsoi,

thanks so much for your answer.

I think I misunderstood one point with including the raw markdown content in the first place, which led me down the wrong path. Would you aim to compile the definitions into JSON objects during build time and store them somewhere in dist/whatever and have some functions that basically require() these definitions to search them?

It feels much more feasible to start building it with this background knowledge. I would say: go for it!

denistsoi commented 4 years ago

Hey @ovlb

Would you aim to compile the definitions into JSON objects during build time and store them somewhere in dist/whatever and have some functions that basically require() these definitions to search them?

I think requiring the definitions as JSON would be the easiest - however, I think there'd be some duplication, as we currently have the workflow of:

  1. create md file in 11ty/definitions/
  2. eleventy generates collection and converts markdown file to html
  3. html is saved into dist/definitions/<defined-word>/index.html

Having JSON gives us the benefit of searching via key (or regex) and returning the result.

Actually, I'll add this, since I like the added benefit of generating the file as well.
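A rough sketch of what that build step could look like (assuming the gray-matter front-matter parser; the paths and the data.json output name are illustrative):

// build-definitions-json.js (hypothetical): compile the front matter of every
// markdown file in 11ty/definitions/ into one JSON map keyed by slug.
const fs = require("fs")
const path = require("path")
const matter = require("gray-matter")

const srcDir = path.join(__dirname, "11ty", "definitions")
const out = {}

for (const file of fs.readdirSync(srcDir)) {
  if (!file.endsWith(".md")) continue
  const raw = fs.readFileSync(path.join(srcDir, file), "utf8")
  const { data, content } = matter(raw) // data = front matter, content = markdown body
  out[data.slug || path.basename(file, ".md")] = { ...data, body: content.trim() }
}

fs.writeFileSync(path.join(__dirname, "functions", "data.json"), JSON.stringify(out, null, 2))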

denistsoi commented 4 years ago

Update -

I created this structured JSON below

{
  ...
  "women-and-people-of-colour": {
    "metadata": {
      "title": "Self-Defined",
      "url": "https://www.selfdefined.app/",
      "description": "A modern dictionary about us. We define our words, but they don't define us.",
      "author": {
        "name": "Tatiana & the Crew",
        "email": "info@selfdefined.app"
      }
    },
    "title": "women and people of colour",
    "slug": "women-and-people-of-colour",
    "flag": {
      "level": "avoid"
    },
    "defined": true,
    "speech": "noun",
    "alt_words": [
      "people of colour and white women",
      "people of colour",
      "white non-binary people, and white women",
      "find ways to reframe why this dynamic exists",
      "or omit"
    ],
    "page": {
      "date": "2020-02-13T09:56:58.228Z",
      "inputPath": "./11ty/definitions/women-and-people-of-colour.md",
      "fileSlug": "women-and-people-of-colour",
      "filePathStem": "/definitions/women-and-people-of-colour",
      "url": "/definitions/women-and-people-of-colour/",
      "outputPath": "dist/definitions/women-and-people-of-colour/index.html"
    },
    "html": "<hr>\n<p>title: women and people of colour\nslug: women-and-people-of-colour\nflag:\nlevel: avoid\ndefined: true\nspeech: noun\nalt_words:</p>\n<ul>\n<li>people of colour and white women</li>\n<li>people of colour</li>\n<li>white non-binary people, and white women</li>\n<li>find ways to reframe why this dynamic exists</li>\n<li>or omit</li>\n</ul>\n<hr>\n<p>often used as a phrase to encompass “non-white, non-men,” seeking to provide solidarity for these two groups</p>\n<h4>Issues</h4>\n<p>What happens to women of colour? As a woman of colour, I am split between both women and people of colour.</p>\n<h4>Impact</h4>\n<p>As such, it elicits feelings of erasure for women of colour. It also neglects <a href=\"/#non-binary\">non-binary</a> individuals.</p>\n"
  }
}

I got this info via the collection Template object in Eleventy and markdown-it. Just configuring the Netlify function now; gonna grab some lunch (brb).

hibaymj commented 4 years ago

In selfdefined/web-app#72 I created an OpenAPI spec for the app, based on the high-level stuff I saw on the app and in the linked issues from this thread. I don't know much of anything about Netlify, so if there are requirements for the data elements to match that format for some reason, the models could be changed in the spec to accommodate.

I think this also addresses selfdefined/web-app#6 and should handle how you can maintain the dictionary when the word list gets large, and manage linking of new words, synonyms, and alternatives. I also leaned towards adding internationalization support via language of origin and translations. Not really sure if that would be helpful at all, but I was just thinking about the context of words.

If you haven't worked with OpenAPI before, you can take that text and put it in http://editor.swagger.io/ to get a good visual UI to see how the things work out. Once code is running and the spec matches, you can actually use that interface to make some calls to the service or have it output some cURL commands for you.
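For readers new to the format, a minimal OpenAPI 3 fragment might look like the sketch below (written as a JS object for illustration; the path and schema names here are made up and are not the actual spec from selfdefined/web-app#72):

// Hypothetical OpenAPI 3 fragment; names are illustrative only.
const openApiSketch = {
  openapi: "3.0.0",
  info: { title: "Self-Defined API (sketch)", version: "0.1.0" },
  paths: {
    "/definitions/{slug}": {
      get: {
        summary: "Fetch a definition by its slug",
        parameters: [
          { name: "slug", in: "path", required: true, schema: { type: "string" } }
        ],
        responses: {
          "200": { description: "The definition entry" },
          "404": { description: "No definition found for that slug" }
        }
      }
    }
  }
}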

denistsoi commented 4 years ago

@hibaymj - actually that's a good start; it didn't occur to me to go down that route - 👍

I suppose we wouldn't need to use Netlify Functions if we go down this route, since the spec could just be used for codegen. I'm a bit rusty on how that works, so I'll have to look at that part again.

The other thing I forgot was the case where someone could pass in multiple terms and render a page:

https://github.com/tatianamac/selfdefined/issues/6#issue-509601267

hmmm... need to think about this some more.

Thinking out loud: the project right now is a statically generated site hosted on Netlify. My thoughts are: I don't know whether to use swagger-codegen to generate the server stubs into the codebase, or abstract it away into a separate thing.

denistsoi commented 4 years ago

Another thing to mention:

@good-idea mentions in https://github.com/tatianamac/selfdefined/issues/13 type definitions for GraphQL. Using that, you could also codegen the OpenAPI spec via Sofa; this publication has a good example: https://medium.com/the-guild/sofa-the-best-way-to-rest-is-graphql-d9da6e8e7693

Alternatively, we could reverse this process if we wanted a GQL endpoint: https://github.com/IBM/openapi-to-graphql.

I suppose my bias is to use something like Hasura hosted on Heroku and store definitions in a headless CMS. (I need to think about this more before going down this path.)

denistsoi commented 4 years ago

@tatianamac / @ovlb

I forked the project and I'm hosting it on Netlify with Netlify Functions:

https://elated-lovelace-edac00.netlify.com/.netlify/functions/api?name=women-and-people-of-colour

The definitions are in JSON, located at https://github.com/denistsoi/selfdefined/blob/master/functions/data.json

I wanted to get some thoughts before I submit a PR. (wanna get some rest before I do any more improvements)

Thinking about how this might look if, say, someone were to query from Slack or Twitter (e.g. getting raw text instead of HTML).
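One way that could look (a sketch only, not what the fork currently does; the format query parameter is hypothetical):

// Hypothetical content negotiation for bot consumers via a ?format=text param.
const definitions = require("./data.json")

exports.handler = async (event) => {
  const { name, format } = event.queryStringParameters || {}
  const entry = name && definitions[name]
  if (!entry) {
    return { statusCode: 404, body: "definition not found" }
  }
  if (format === "text") {
    // plain text is easier for Slack/Twitter bots to post verbatim
    const alts = (entry.alt_words || []).join(", ")
    return {
      statusCode: 200,
      headers: { "Content-Type": "text/plain" },
      body: entry.title + " (" + entry.speech + ") - alternatives: " + alts
    }
  }
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(entry)
  }
}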

hibaymj commented 4 years ago

The Open API 3 tools are really robust and getting more capable over time.

Regarding GQL, it's a lot more trouble than you're thinking if you have low API capabilities and your goal is to be cross-linked and work with other systems.

More importantly however, if you look closely at the GET operations, you'll see I added support for multiple words to come back. This would mean just tailoring the query parameters would be necessary to support returning more than 1 word in a request.

Writing the API contract first is extremely valuable for the project, but generating it isn't really all that beneficial. The contract also has you define schemas as well. Hope this helps!

denistsoi commented 4 years ago

That’s true; I suppose I just wanted to couple the API with Netlify and the current build process.

I think it’s a good approach if the API is to be abstracted out of the build process.


ovlb commented 4 years ago

FYI: I haven’t forgotten this discussion and providing feedback is on my to-do list. Sorry for the delay.

BrentonPoke commented 4 years ago

It seems as though the api functionality is being worked into the app itself instead of using a database to store definitions, so is that the prevailing design? By decoupling the data from the app, APIs are much easier to build. They're also easier to scale should you decide to offer it as a service to companies down the road. I would like to help in the area of APIs since I have experience there.

leovolving commented 4 years ago

Hi everyone! My name is Leo and I'd love to help with this process if I can.

Having read the entire thread, I feel compelled to echo the concerns of @BrentonPoke. It sounds like the web app is only a small portion of the overall vision for this project. Maintaining a separate API that can be consumed by both our web app as well as 3rd parties would be much better in the long run.

I also have concerns about storing HTML in the data. I think that's allowing the API to be too opinionated. The frontend should be making decisions on how the data appears visually. If there is a major design change on the frontend, we would have to touch every single entry in the database, as opposed to fixing the structure in a single JS file. That's expensive over time.

It seems like a relational database would be the best way to go, given that everything seems to link back to a single source of truth: the root word. AFAIK that should still give us the freedom to use GraphQL on the frontend to query the data.

As others have mentioned, once the data structure has been defined, we'll have a better idea of what options we have regarding tech stack. @tatianamac have you finalized, from the user's perspective, what you'd like the API to be able to do? Once that's finalized, we can setup a call to discuss the data structure.

I realize that I'm arriving late to a months-long conversation, so I apologize if I'm stepping on any toes. Is there a point person with whom I should touch base?

tatianamac commented 4 years ago

Thanks for your thoughts @Ljyockey! Not stepping on anyone's toes—we welcome opinions all around. I'm not sure if you've gotten a chance to see the PR that @hibaymj put together on selfdefined/web-app#72 (it might provide some context). No one has "taken lead on" this API convo/process, so if that's something you'd be interested in doing, I'd welcome it. We've had a few folks dip in and out.

Generally speaking for the dictionary as a whole @ovlb has been my main partner in crime as he built a lot of the Eleventy customisation which would feed into this API work.

As for API hopes, I'd like for the API to be able to:

I'm new to API stuff so if there's more information that I can articulate I'm happy to do so with some guidance. Thanks again, Leo!

BrentonPoke commented 4 years ago

That second one is definitely the most interesting, but will require new code for each service. The good thing is that each bot would use the same API.

One more thing: I did read some of the schemas you guys have been trying to make, and they look great. The one in the pull request seems to have everything necessary in the return types; what the query parameters will be for can be hashed out later.

leovolving commented 4 years ago

I agree that the schema from @hibaymj looks great. I'm becoming more adamant that we should use a relational database now that I have a visual for the data. It looks like the scope of the API will grow over time so it would be nice to stay ahead and avoid the risk of large data objects in the future.

I think Brenton's first bullet point would make for a great MVP. Since the API is going to have so many uses outside of the web app, I think it makes sense to develop the MVP independently. The web app can start consuming the API once it's a little more stable.

Once the API is setup and it's time to start working on these services (the second bullet point), I think we may end up with a lot of devs who want to help out - devs who want to make a service for a platform they're currently using. This is the exciting part, where people will be able to proactively make their online spaces more inclusive! It also underscores the importance of getting this API right, but perhaps I'm getting ahead of myself now.

@BrentonPoke What are your thoughts on a mono-repo vs. moving the API to a separate repo? I know that a mono-repo poses its challenges, but I think it would be great for visibility if we could keep everything together for now.

BrentonPoke commented 4 years ago

Any API should definitely be its own repo. In the future, deploying definition updates shouldn't be coupled with having front-end code running around just sitting there when back-end people just want to work.

As for the type of database, relational is fine if we know what the schema is going to be from the beginning and it won't change. The problem with relational databases is that changing the schema later is time-consuming, and language is very fluid. So a fluid representation of the data would be better insulated from future complications. We can always enforce a data model through a standard for human interpretation, and there are a number of NoSQL databases that are very well suited for data that's likely to change a lot over time. For simplicity of data representation and relatively trivial querying, there's MongoDB, and rich relationships can come from Neo4J or OrientDB (Neo4J is the current King of Graphs). Non-relational is a property that is kind of like the Rook in chess; it doesn't seem useful in the beginning, but becomes one of the most powerful factors on the board as the game goes on. A sketch of that flexibility follows below.
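To illustrate, a document-store entry (hypothetical shape, loosely based on the JSON denistsoi posted above) can gain new fields without a migration:

// Hypothetical document-store entry; new fields can be added per-document
// later (translations, pronoun metadata, etc.) with no schema migration.
const entry = {
  _id: "women-and-people-of-colour",
  title: "women and people of colour",
  flag: { level: "avoid" },
  speech: "noun",
  alt_words: ["people of colour and white women", "people of colour"],
  translations: { /* added later, only where it applies */ }
}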

Another thing to keep in mind is future possibilities with non-English words and loan-words. The same goes for handling things like pronouns: some languages don't even recognize gender in pronouns, which wouldn't fit with western language conventions. Gender fluidity is an area where fluid data representation really shines. 😜

leovolving commented 4 years ago

Fair point about the repos. It'll make deployment a lot simpler too. We can always link to the API in this repo's README and vice versa. @tatianamac are you good with a separate repo for the API?

Those are all really good points about the database. I think you've won me over. Non-relational is probably the best, especially as this project is likely to be fluid. I'm a little embarrassed to admit that I hadn't even considered other languages, but that in itself is reason enough to want flexibility IMO.

BrentonPoke commented 4 years ago

Yeah, cultural convention is something that doesn't become apparent in databases until you start dealing with people from outside your own culture. In the west, things like surname tradition and gender are taken for granted, which leads designers to ignore things like patronymic names, which have a tradition in parts of India.

I think what scares many away from graph databases is the alien query languages that sometimes come with them.

But! Before we create a repo, if we can, we should probably decide what language it will be in, if there's going to be just one API implementation. I personally can do it best in Java, but some like to use nodejs. However, once a data model is agreed upon, we could write multiple API implementations and just use a multi-project structure in one repo (to keep issues for anything dealing with API definitions and implementation attached to one repo).

hibaymj commented 4 years ago

My suggestion is to just edit this API document directly in text and then use it to generate one-to-many server stubs before you pick one that works. In this way you’ll lose the least work, and have the most potential benefits from contract-first development.


leovolving commented 4 years ago

My comfort zone is JS, but I've always wanted to work with Java more. I've worked with Java a little bit professionally, mostly just maintaining systems that were built by others. There is significantly more beginner-friendly JS OSS than there is Java from what I can see. It would be great if we could provide some inclusivity in the Java space.

I would not be comfortable taking on any technical leadership if we go with Java, but you can sign me up for ~10 hours per week as an IC. And I'd still be more than happy to help with finalizing the schema as well as high-level technical discussions.

The advantage with JS is that we would likely have more people able to contribute immediately. A brief glance at everyone's GitHub profiles indicates that perhaps @BrentonPoke and @hibaymj are the only two in this thread with significant Java experience. If one or both of you are unsure how long you will be able to contribute to the project, that's also something to consider.

denistsoi commented 4 years ago

Hi all -

I agree with @hibaymj that adding the OpenAPI spec is a good step forward, as it allows internationalisation/locale support by editing the OpenAPI doc.

API language spec

There isn't any requirement that the API be written in any one language @Ljyockey, as the OpenAPI spec supports codegen of server stubs in any language.

migrating API to another repository

I also don't necessarily agree that the API needs to be out of the repository unless there is a lot of community development on it. However, having the API as a subdirectory would allow for easy migration in future if there are sufficient developer resources.

database choice/cost

The one thing that I'd like to specify is that, there hasn't been any lead on where to host the db server (or which provider). If say, the choice of provider goes against the communities code-of-conduct, (i don't think we have one right now, but lets say we go for AWS and they do something that we all don't agree with, migrating the db off creates some issues that we would have to resolve quickly).

I'm not sure if anyone has any preferences on hosting providers (or budget/cost).

Having decided this, the choice of moving the API to a standalone repo might be reinforced, so as not to create noise within the frontend app/repo.

Hope this helps. Denis

tatianamac commented 4 years ago

Thanks everyone for this rich convo! I've been following along but am relying on the community's expertise here as APIs are out of my wheelhouse. I also want to provide two points of clarification:

Code of Conduct

The one thing that I'd like to specify is that, there hasn't been any lead on where to host the db server (or which provider). If say, the choice of provider goes against the communities code-of-conduct, (i don't think we have one right now, but lets say we go for AWS and they do something that we all don't agree with, migrating the db off creates some issues that we would have to resolve quickly).

I do want to provide a clarifying point that there is a code of conduct. I couldn't tell when you said "i don't think we have one right now" if "one" was a code of conduct or a provider.

With that said, I'd also add that if a provider did go against our code of conduct that we'd absolutely want to be able to migrate it off of that platform. I'm not sure of the gravity of that need, but it's definitely something important given the sensitivity and ethos behind the project.

Budget

I have started an Open Collective for the project, which I'm hoping to add associated costs to. Once we get more contributors, we can bill costs against this (such as that for the API). And, with the API, my hope is that it's always free for not-for-profits but that enterprise entities using it will pay for a subscription. At that point, the API should self-fund. Ofc all speculative, but these are my initial thoughts.

Separate repo

I'm open to doing whatever is best maintainable in the long run. It seems to me that keeping it separate is best based on what folks have said. If that is the case, it may make sense to look into making this an organisation so that all the repos no longer live under my account and are easily viewed under one main account.

Carry on. ✨

BrentonPoke commented 4 years ago

The OAPI spec is the one thing that should stay in this repo, so people know what the contract is when working on the front-end. The code itself should, at some point, deviate heavily from Swagger codegen stubs for production, since the code Swagger produces has a lot of boilerplate that can be cut out with smarter use of more robust libraries (as an example, the java code Swagger generates is hideous).

Choice of infrastructure providers is definitely something to consider, though we have to be careful to remember that some PaaS vendors actually use the vendors you may be trying to avoid to begin with. For instance, Heroku is backed entirely by AWS, but they have a very attractive ops interface that makes things easy to launch. So this is something we’ll have to think about in direct relation to what database, because some companies make administrating certain databases easier than others.



hibaymj commented 4 years ago

I’d submit Heroku as a good first place to go; you can “add on” Postgres and other DBs effortlessly, and the free tier is surprisingly capable, especially for development where you just want to focus on building the minimum capabilities.

Then you have the usual cloud providers like AWS/GCP/Azure, but also Linode and other smaller services which may be interested in supporting the effort.


BrentonPoke commented 4 years ago

I forgot about Linode, but they might be interested in supporting a public-good effort based on the use of their services. Depending on the database chosen, perks could be important. The more capable databases like OrientDB and Neo4J are usually written in Java and require more resources than the leaner C++ databases like MongoDB and FoundationDB.

denistsoi commented 4 years ago

Thanks @tatianamac for clarifying the code of conduct;

I think for starters we should recommend Heroku due to its ease of use (and familiarity within the community with their devtools).

*re migration of providers going against code of conduct

I gave an edge case for migration, but I suppose that another consideration might be cost; the predicted costs would be fairly manageable, even if it were solely funded by a single developer (est. $10-15 per month in the worst-case scenario).

A relational DB might be good for a base case, but I’d wonder what the DB schema might look like. (Open to feedback and input from others on this.)

D


leovolving commented 4 years ago

Is everyone good with the current OAPI spec in this repo? Or do we need to revisit it before moving forward?

tatianamac commented 4 years ago

@Ljyockey I'm good with the general structure (from what I understand, as I'm new to API authorship). I am happy to have my contact information listed (currently commented out), which I can provide to you.

I believe that all the types of requests handled through the API are reflected currently.

Do you need anything else from me to proceed?

simonw commented 4 years ago

My project Datasette could be a really good fit for this.

Datasette gives you a read-only JSON API against a SQLite database. You can host it easily on Heroku, Google Cloud Run or other providers and because the database is a read-only file you don't need to shell out for a hosted MySQL or PostgreSQL database - you can bundle the data with the rest of the code you are hosting as a static asset.

A pattern I have been using a lot recently is to have data that lives in a GitHub repository which is automatically built and deployed as a Datasette API by a CI service. I've used Circle CI for this in the past but more recently I've started experimenting with GitHub Actions, which work really well for this.

I describe my approach in some detail in this article: https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/

A more recent example: a few weeks ago I started using Datasette and a GitHub Action to publish data relevant to COVID-19: https://github.com/simonw/covid-19-datasette - here's an example API: https://covid-19.datasettes.com/covid/ny_times_us_counties.json?_shape=array

A key feature of Datasette APIs is that users can construct SQL queries and use them to get back custom data. Here's an example of a custom query: https://covid-19.datasettes.com/covid?sql=select+date%2C+cases%2C+deaths+from+ny_times_us_counties%0D%0Awhere+county+%3D+%3Acounty+and+state%3D+%3Astate%0D%0Aorder+by+date+desc&county=San+Francisco&state=California

And here's the output of that query as a JSON array: https://covid-19.datasettes.com/covid.json?sql=select+date%2C+cases%2C+deaths+from+ny_times_us_counties%0D%0Awhere+county+%3D+%3Acounty+and+state%3D+%3Astate%0D%0Aorder+by+date+desc&county=San+Francisco&state=California&_shape=array

(It's pretty much using SQL as a 1970s-era alternative to GraphQL, and it works really well).

Another relevant demo: my https://github.com/simonw/museums project takes the YAML data in https://github.com/simonw/museums/blob/master/museums.yaml and turns it into a Datasette API which powers https://www.niche-museums.com/

For selfdefined, the dictionary data is already stored as YAML files in this GitHub repo. To integrate with Datasette, you would need to do two things:

  1. Write a Python script that converts the data into a SQLite database. I have a library called sqlite-utils which helps with this - here's an example of a build script I wrote using it: https://github.com/simonw/covid-19-datasette/blob/master/build_database.py
  2. Add a GitHub Action which runs on every commit and builds and deploys the database. Here's the one I'm using for my COVID-19 project: https://github.com/simonw/covid-19-datasette/blob/master/.github/workflows/scheduled.yml (it publishes to Google Cloud Run, which ends up costing just a few cents a month).

If this sounds like an interesting approach I would be happy to help prototype it.

I should note that this is a very different approach to using something like OpenAPI. I personally find this Datasette approach to be an incredibly productive way of working - APIs that used to take me weeks to develop now take me hours - but I completely understand if you want to go with the more widely followed path as opposed to something radically different.

simonw commented 4 years ago

I built a quick prototype here: https://selfdefined-j7hipcg4aq-uc.a.run.app

Here's a demo showing a couple of facets (speech and defined): https://selfdefined-j7hipcg4aq-uc.a.run.app/selfdefined/definitions?_facet=speech&_facet=defined

And here's one of the definitions as JSON: https://selfdefined-j7hipcg4aq-uc.a.run.app/selfdefined/definitions/0de41420cde662b151625da7099f84d74fc43f8a.json?_shape=array&_json=flag&_json=alt_words

I used my (very alpha) markdown-to-sqlite tool to convert the markdown files into a SQLite database:

cd selfdefined/11ty/definitions
markdown-to-sqlite -- *.md selfdefined.db definitions

(That weird -- in there was necessary because some of the filenames begin with a - which confuses the bash *.md otherwise)

If you have Datasette installed, you can then explore the resulting database locally like this:

datasette selfdefined.db

I then published the prototype to Cloud Run by running this command:

datasette publish cloudrun selfdefined.db \
    --service=selfdefined \
    --source=tatianamac/selfdefined \
    --source_url=https://github.com/tatianamac/selfdefined

There's plenty that could be done to improve this - better design for the schema, implementing full-text search, pre-defining useful SQL queries and views etc - but this prototype should give an idea of what's possible.

leovolving commented 4 years ago

Thanks for sharing this prototype, but we need read/write capabilities for our API, and we'd already decided on a non-relational database, as the needs of our schema are likely to evolve over time.

I should be able to get the API repo started this weekend.

tatianamac commented 4 years ago

@simonw Thank you for putting this together! We'll definitely keep this in mind for the future approaches; for now I think we'll likely proceed as we have set out.

@Ljyockey Thank you! Please let me know if you need anything from me.

leovolving commented 4 years ago

My apologies if anyone was waiting on me to proceed. I'm struggling to keep up with my commitments in the wake of the pandemic. Hoping to get started in the next week or two!

tatianamac commented 4 years ago

hi @Ljyockey ! I recognise things are very weird right now—are you still interested in setting this up? No worries either way, just going through the tickets. ✨🙏🏽

leovolving commented 4 years ago

Hey Tatiana! It’s looking less and less like I’ll be able to dedicate much time to this in the upcoming month after all. I’ll let you know as soon as that changes! 🙏🏽
