The goal of this project is to provide a robust yet easy way to search GitHub for OpenAPI and Swagger definitions. There is a lot of noise in public repositories, we only care about OpenAPI definitions that validate, and the GitHub API has rate limits that require the crawl to be automated over time. The project therefore provides a robust open-source solution that crawls public GitHub repositories for machine-readable API definitions. It consists of an open-source API that accepts search parameters and uses the GitHub API to perform the search, which simplifies the search interface. Searches run asynchronously: the user makes one call to initiate a search, then separate calls to retrieve results as they come in.
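The initiate-then-poll flow described above can be sketched as follows. This is a minimal in-memory illustration, not the project's actual implementation: the `SearchJobs` class, its method names, and the `crawl` callable are all hypothetical, and a real service would persist results (for example in Elasticsearch) rather than keep them in a dict.

```python
import threading
import uuid


class SearchJobs:
    """Hypothetical in-memory registry of asynchronous search jobs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def start(self, query, crawl):
        """Kick off a crawl in the background and return a job id immediately."""
        job_id = uuid.uuid4().hex
        job = {"results": [], "finished": threading.Event()}
        with self._lock:
            self._jobs[job_id] = job
        worker = threading.Thread(
            target=self._run, args=(job, query, crawl), daemon=True
        )
        worker.start()
        return job_id

    def _run(self, job, query, crawl):
        # `crawl` is any callable that yields hits for a query (assumed here).
        try:
            for hit in crawl(query):
                with self._lock:
                    job["results"].append(hit)
        finally:
            job["finished"].set()

    def poll(self, job_id, offset=0):
        """Return the hits collected since `offset` and whether the crawl is done."""
        with self._lock:
            job = self._jobs[job_id]
            return {
                "done": job["finished"].is_set(),
                "results": job["results"][offset:],
            }

    def wait(self, job_id, timeout=None):
        """Block until the crawl finishes; returns False if the timeout expires."""
        with self._lock:
            finished = self._jobs[job_id]["finished"]
        return finished.wait(timeout)
```

A client would call `start` once, then call `poll` repeatedly with the number of results it has already seen as `offset`, so each call returns only new hits.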
Dependencies: NodeJS 18, npm, GitHub API key
How to get a GitHub API key: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
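As a rough illustration of how the API key and the rate limits interact, the sketch below builds the request headers for a personal access token and works out how long to pause once the quota is used up. The function names are hypothetical and this is not the project's own code; it assumes the response headers are available as a mapping with lowercase keys, and it relies on GitHub's `x-ratelimit-remaining` and `x-ratelimit-reset` response headers (the latter is a UTC epoch timestamp in seconds).

```python
import time


def github_headers(token):
    """Build request headers for the GitHub REST API from a personal access token."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }


def seconds_until_reset(response_headers, now=None):
    """Return how long to wait before the next request (0 if quota remains).

    Assumes `response_headers` uses lowercase header names.
    """
    remaining = int(response_headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(response_headers.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)
```

A crawler can call `seconds_until_reset` after each response and sleep for the returned duration, which spreads a long crawl over time without exceeding the limit.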
To run with Docker Compose:
Create a .env file in the directory and add the variables:
PORT= (port number you want to host the API on)
GITHUB_API_KEY= (GitHub API key)
ES_HOST= (location of the Elasticsearch database)
Run docker compose up
Run python scripts/seed_script.py from the root of the folder. (Takes around 2-3 hrs.) See the loading details below to learn more about the loading script.
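Since a missing variable is an easy mistake to make, a quick check of the .env file before starting anything can save time. The sketch below is a hypothetical helper, not part of the project: it parses simple KEY=VALUE lines and reports which of the three variables named above are absent or empty. It does not handle multi-line values or variable expansion.

```python
REQUIRED = ("PORT", "GITHUB_API_KEY", "ES_HOST")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required variables that are absent or empty."""
    return [key for key in REQUIRED if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when all three variables are set.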
To run locally with npm:
Run npm i
Create a .env file in the directory and add the variables:
PORT= (port number you want to host the API on)
GITHUB_API_KEY= (GitHub API key)
ES_HOST= (location of the Elasticsearch database)
Run npm run build:watch in one terminal.
Run npm run start in another terminal to start the server on the port specified in the .env file.
Open localhost:{{PORT}} to see the admin panel, through which you can interact with some of the APIs.
Run python scripts/seed_script.py from the root of the folder. (Takes around 2-3 hrs.)
To run Elasticsearch with Docker:
1. docker pull docker.elastic.co/elasticsearch/elasticsearch:8.8.2
2. docker network create elastic
3. docker run \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.8.2
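Once the container from step 3 is running, it can be worth confirming that Elasticsearch answers on port 9200 before starting the long seed run. The sketch below is an illustrative helper, not part of the project: it reads the version from the node's root endpoint. It assumes the node is reachable at http://localhost:9200 over plain HTTP without credentials, which matches the xpack.security.enabled=false setting above; the `opener` parameter exists only so the function can be exercised without a live node.

```python
import json
from urllib.request import urlopen


def elasticsearch_version(base_url="http://localhost:9200", opener=urlopen):
    """Return the version number reported by the node's root endpoint."""
    with opener(base_url, timeout=5) as response:
        info = json.loads(response.read())
    return info["version"]["number"]


# Usage (requires the container to be running):
#   print(elasticsearch_version())
```

If the call raises a connection error, the container is probably still starting or the port mapping is missing.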
Currently, we only index OpenAPI files from the top 2000 most popular organisations on GitHub (based on stars). More organisations can be indexed by adding them to the scripts/assets/organisations.txt file.
The organisations.txt file has the top 1000 organisations by stars.
The organisations2.txt file has the next 1000 organisations by stars.
You can change the org list in seed_script.py to load more data.
More organisations can be added using the gitstar_ranking script.
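When extending the organisation lists, duplicates across files are easy to introduce. The sketch below shows one way to merge several lists while keeping the first-seen order. It is not taken from seed_script.py: the function names are hypothetical, and it assumes each file holds one organisation name per line, which the source does not confirm. Names are compared case-insensitively because GitHub logins are not case-sensitive.

```python
from pathlib import Path


def merge_org_lists(*lists):
    """Merge organisation names, dropping blanks and case-insensitive duplicates."""
    seen = set()
    merged = []
    for names in lists:
        for raw in names:
            name = raw.strip()
            key = name.lower()
            if name and key not in seen:
                seen.add(key)
                merged.append(name)
    return merged


def read_lines(path):
    """Read a text file as a list of lines (assumed: one organisation per line)."""
    return Path(path).read_text(encoding="utf-8").splitlines()


# Usage, run from the root of the folder:
#   orgs = merge_org_lists(
#       read_lines("scripts/assets/organisations.txt"),
#       read_lines("scripts/assets/organisations2.txt"),
#   )
```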
🚧Under Construction