postman-open-technologies / gsoc-2023

Postman Open Technologies' repo for Open Source contributions during Google Summer of Code 2023

OpenAPI Web Search #7

Closed jansche closed 6 months ago

jansche commented 1 year ago

Summary: Develop an open-source approach to finding Swagger and OpenAPI definitions on the open web: crawling web pages looking for API definitions, validating them, and then consuming and indexing them as part of an ongoing search. This provides a simple way for developers to find existing APIs by locating documentation, repositories, and other common artifacts of running an API.
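To make the "validating them" step concrete, here is a minimal sketch of a first-pass check, assuming the crawled document is JSON. The function name `detect_api_definition` is hypothetical, and this is only a cheap heuristic on top-level keys, not full validation against the OpenAPI or Swagger schemas. YAML documents would need a YAML parser, which is left out here.

```python
import json
from typing import Any, Optional


def detect_api_definition(text: str) -> Optional[str]:
    """Return "openapi-3", "swagger-2", or None for a JSON document.

    Heuristic only: it inspects top-level keys and does not validate
    the document against the full specification schema.
    """
    try:
        doc: Any = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Both Swagger 2.0 and OpenAPI 3.x require a top-level `info` object.
    if not isinstance(doc, dict) or not isinstance(doc.get("info"), dict):
        return None
    version = doc.get("openapi")
    if isinstance(version, str) and version.startswith("3."):
        # OpenAPI 3.1 allows documents without `paths` (e.g. webhooks only),
        # so accept any of the three top-level containers.
        if any(key in doc for key in ("paths", "components", "webhooks")):
            return "openapi-3"
        return None
    if doc.get("swagger") == "2.0" and isinstance(doc.get("paths"), dict):
        return "swagger-2"
    return None
```

A crawler could run this on every fetched candidate and hand only the matches to a stricter schema validator before indexing.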

Skills: Knowledge of the web, and how to crawl web pages, follow URLs, or utilize an existing solution like Common Crawl.

Expected Outcomes: Provide a simple open-source API that abstracts away the complexity of searching the web for specific terms, helping identify APIs in a sea of web pages. The API should offer a simple interface that sets in motion an asynchronous search of the web, or of a corpus of web content, looking for APIs. Users can initiate a search and then return regularly to see its results, which build up over time and are aggregated for easy retrieval via a simple API.
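One way to picture the "initiate a search, then return for results" flow is a small job store that a web API could sit on top of. This is a sketch under assumptions: the names `SearchJob`, `SearchJobStore`, `start`, `add_result`, `finish`, and `poll` are all hypothetical, and an in-memory dictionary stands in for whatever database a real service would use.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchJob:
    query: str
    status: str = "running"  # becomes "done" once the crawl finishes
    results: List[str] = field(default_factory=list)


class SearchJobStore:
    """In-memory stand-in for the storage behind a search API."""

    def __init__(self) -> None:
        self._jobs: Dict[str, SearchJob] = {}

    def start(self, query: str) -> str:
        """Register a new search and return its id immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = SearchJob(query=query)
        return job_id

    def add_result(self, job_id: str, url: str) -> None:
        """Called by the crawler whenever it finds a valid definition."""
        job = self._jobs[job_id]
        if url not in job.results:  # keep results free of duplicates
            job.results.append(url)

    def finish(self, job_id: str) -> None:
        self._jobs[job_id].status = "done"

    def poll(self, job_id: str) -> dict:
        """Return a snapshot of everything found so far."""
        job = self._jobs[job_id]
        return {"query": job.query, "status": job.status, "results": list(job.results)}
```

The caller gets a job id back right away and polls with it later, so a slow crawl never blocks the request that started it.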

Possible mentors: @vinitshahdeo + 1-2 additional mentors

Project Repo: https://github.com/postman-open-technologies/openapi-web-search

Size of Project: 175h

Rating: Medium skills level

jansche commented 1 year ago

Could this be merged with #6 and become a 175h project? @kinlane

BabyElias commented 1 year ago

Hey! So I have been using Postman for quite some time now (always my go-to for visualising API outputs in the best way possible) and I find this idea really exciting. Quick question: this requires knowledge of web scraping, right? How can I go about discussing this task with potential mentors and seeking guidance on it?

Prajwalprakash3722 commented 1 year ago

Hello @jansche, this project looks really interesting. Please correct me if I have understood the statement incorrectly.

I am comparing this with API marketplaces. The proposed solution aims to help developers find APIs that may not be available on existing API marketplaces like RapidAPI by crawling the web for Swagger and OpenAPI definitions, indexing them, and providing access through a simple API interface. This can make it easier for developers to discover and use APIs that are not part of any marketplace but could be relevant to their specific use case.

The idea of crawling the web to find all the Swagger and OpenAPI definitions out there sounds like a Herculean task. Can you tell me more about how we plan on making it happen? Are we talking about building an army of web crawlers independently, or do you have something else in mind?

Nevertheless, exciting stuff!

jansche commented 1 year ago

Hi folks, we're currently coordinating mentors and will provide more details and answer questions at the beginning of next week (week of February 27). Please bear with us. Best regards, Jan

Prajwalprakash3722 commented 1 year ago

Cool :)

ankit-pn commented 1 year ago

Greetings everyone! I'm Ankit Kumar, a 3rd year CS student at NIT Bhopal. I am excited about this project and find it particularly intriguing. Based on the project summary, it seems like the goal is to create a search engine for OpenAPI and Swagger definitions, which will surface reliable and functional APIs. It is important to validate every OpenAPI and Swagger definition to ensure its reliability and accuracy.

I have experience using the Common Crawl index and have previously worked with OpenAPI on my own side projects, and I would love to contribute to this project as a GSoC mentee this summer.

ankit-pn commented 1 year ago

I think merging it with #8 will be a good idea for a 175h project!!

Kd-Here commented 1 year ago

I know how to crawl the web and use Common Crawl. Let us know when mentors are assigned to the task; I'm waiting for it.

destrex271 commented 1 year ago

This idea seems great! Can't wait to work on it.

ankit-pn commented 1 year ago

Has a mentor been assigned to this project yet, @jansche?

vinitshahdeo commented 1 year ago

Hey @ankit-pn, Glad to see you here. I will be mentoring this project.

Prajwalprakash3722 commented 1 year ago

Hey @vinitshahdeo was my assumption correct?

https://github.com/postman-open-technologies/gsoc-2023/issues/7#issuecomment-1442040170

ankit-pn commented 1 year ago

I am glad to see you as mentor, @vinitshahdeo.

I do have some doubts regarding this project.

There are 2 ways to get OpenAPI definitions from the open web:

  1. Crawling the web using self-made crawlers (spiders)
  2. Using the Common Crawl dataset (Common Crawl updates its dataset every month)

For both approaches we need to define a list of sites (e.g. apis.guru, github.com, and other sites where OpenAPI definitions are likely to be found).

We could use the whole Common Crawl dataset to look for OpenAPI definitions (without defining a list of sites), but this dataset is huge (around 300 TB), and scraping OpenAPI definitions from it and storing them for a search engine would be very computationally expensive, imo.

Is there any workaround for this ?
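One possible workaround, sketched here under assumptions, is to query the Common Crawl URL index per site instead of scanning the full dataset, and then filter the returned URLs locally by file name. The helper names and the file-name list are hypothetical, the crawl id must be replaced with one that Common Crawl actually lists, and the code only builds the query URL (it makes no network request). The index accepts prefix patterns such as `example.com/*`, which is why the file-name match happens on the client side.

```python
from urllib.parse import urlencode, urlparse

# Assumed set of common file names for API definitions; extend as needed.
CANDIDATE_FILENAMES = {
    "openapi.json", "openapi.yaml", "openapi.yml",
    "swagger.json", "swagger.yaml", "swagger.yml",
}


def build_index_query(crawl_id: str, domain: str) -> str:
    """Build a Common Crawl index URL listing captures under one domain.

    `crawl_id` is a placeholder such as "CC-MAIN-2023-06"; check the
    Common Crawl site for the ids that are currently available.
    """
    params = urlencode({"url": f"{domain}/*", "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"


def looks_like_definition_url(url: str) -> bool:
    """Cheap filter: does the last path segment match a known file name?"""
    filename = urlparse(url).path.rsplit("/", 1)[-1].lower()
    return filename in CANDIDATE_FILENAMES
```

Only the URLs that pass the filter would then be fetched and validated, which keeps the amount of downloaded data far below the size of a full crawl.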

priyanshu-kun commented 1 year ago

Hey @vinitshahdeo was my assumption correct?

#7 (comment)

I have the same question; please clarify.

priyanshu-kun commented 1 year ago

Hey devs, my name is Priyanshu Sharma and I've done my bachelor's in computer science. I'm really excited about this project and find it a perfect match for my current skills. If I got it right, the assignment asks us to find Swagger and OpenAPI definitions and list them in a frontend web application. Does that application work like a search engine for Swagger and OpenAPI definitions?

Overall, the project is really interesting; the mentors can count on me.

vinitshahdeo commented 1 year ago

Hello everyone,

Glad to see the engagement here. In a nutshell, the idea is to build a search engine for valid API Definitions. Happy to hear thoughts from you all before we share our roadmap. The concrete roadmap will be shared once we create a dedicated repository for the same.

PS: We love your ideas—let's brainstorm! Keep sharing your approaches along with the pros and cons. Heads up! Please think about API First and consider an end-to-end solution from the backend to the user interface.

priyanshu-kun commented 1 year ago

@vinitshahdeo could you please help me? I feel conflicted here. There are two ways to fetch OpenAPI definitions: one is web scraping and the other is Common Crawl, and both options have their pros and cons. Web scraping might be a good option if you only need to extract data from a few websites and have the technical know-how to set up and manage a scraping solution; it gives more control over the data. Common Crawl might be the better option if you need to extract data from a lot of websites while avoiding legal pitfalls, but it doesn't give as much control over the data.

vishvjeet-thakur commented 1 year ago

Hey @vinitshahdeo, I'm Vishvjeet. I think using Common Crawl to get data from most websites would be more efficient, since we have to find as many OpenAPI definitions as we can, and using the already available dataset will also save us time.

ankit-pn commented 1 year ago

I think using Common Crawl or self-made crawl bots doesn't make much difference to the complexity of the problem we have to deal with. Common Crawl itself contains either raw HTML data or plain text extracted from those HTML pages (using the plain text only makes sense if we have to do anything NLP-related), and extracting openapi.yaml/openapi.json files will be easier (at least for me) from raw HTML files than from plain text.

For me, getting raw HTML data from the open web using self-made crawl bots or using Common Crawl will be of the same complexity, but scraping those HTML pages to get OpenAPI definitions is the really tough part.
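As a rough illustration of that scraping step, here is a minimal sketch that pulls candidate definition links out of raw HTML using only the standard library. The class name `DefinitionLinkParser` and the suffix list are assumptions made for the example. It only looks at `<a href>` targets, so definitions embedded in scripts or loaded by a Swagger UI page would need extra handling.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse

# Assumed file-name suffixes that suggest an API definition.
SUFFIXES = (
    "openapi.json", "openapi.yaml", "openapi.yml",
    "swagger.json", "swagger.yaml", "swagger.yml",
)


class DefinitionLinkParser(HTMLParser):
    """Collect absolute URLs of links that look like API definitions."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                target = urljoin(self.base_url, value)
                if urlparse(target).path.lower().endswith(SUFFIXES):
                    self.links.append(target)


page = '<a href="/docs/openapi.json">spec</a> <a href="about.html">about</a>'
parser = DefinitionLinkParser("https://example.com/api/")
parser.feed(page)
print(parser.links)
```

The same parser works on HTML from either source, whether it comes from a self-made crawler or from a Common Crawl record.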

What do you say, @vinitshahdeo? Also, a Slack or Discord channel for further communication on this project would be extremely beneficial.

priyanshu-kun commented 1 year ago

@ankit-pn I think this should be clarified soon, as we need to design the web app system and write a proposal for it.

destrex271 commented 1 year ago

@jansche are we supposed to use any specific language for this or is it open for us to choose?

MikeRalphson commented 1 year ago

@destrex271 the choice of technology stack will be up to the candidates.

simrann20 commented 1 year ago

Hey!

I am Simrann, a postgraduate student in CS and AI at IIITD. I have been using Postman for a long time and am very keen on contributing to it as a GSoC 2023 student. I am quite intrigued by this idea and have a clear picture of how to plan this project.

Would love to contribute to this as a GSoC 2023 student under the guidance of @vinitshahdeo

ph1ne4s commented 1 year ago

Hey everyone! I am Aviral Jain, currently pursuing a B.Tech (2nd year) at IIT Roorkee and working on projects involving the MERN stack, Python, and C++. I am also interested in cybersecurity and robotics. I have been using Postman and would like to contribute to this project under GSoC 2023.

monstajoe2002 commented 1 year ago

Hello everyone, my name is Youssef Amr. I'm currently pursuing a major in Software Engineering and I love building new applications and working on projects, which I hope to do this year. My experience in programming includes Java, JavaScript, Python, Rust and C++. I also have a YouTube channel where I showcase some programming content as well. My interests include tech and web development related things like frameworks and technologies. I used Postman before and I want to know how to get involved in this GSoC organization possibly with @vinitshahdeo.

monstajoe2002 commented 1 year ago

Can you assign me this issue?

Rishabh42 commented 1 year ago

@KAWALMEET-SINGH I would suggest you be original; plagiarising won't land you anywhere. I know open source can be a bit tempting, but please refrain from malpractices like copying others' code or comments.

Hope you understand

monstajoe2002 commented 1 year ago

I didn't plagiarize anyone's code

ankit-pn commented 1 year ago

@vinitshahdeo, since this is going to be a completely new project to implement, it would be great if you could specify the mandatory qualification task for it.

vinitshahdeo commented 1 year ago

👋 Hello everyone!

Glad to see all of you engaging here. We created the repo for this project - postman-open-technologies/openapi-web-search and tried our best to answer all of your questions in the README.md.

Please use this thread for any further doubts.

cc/ @ankit-pn @monstajoe2002 @simrann20 @Rishabh42 @ph1ne4s @destrex271 @priyanshu-kun @Prajwalprakash3722 @money8203 @BabyElias

khsh13 commented 1 year ago

Hello everyone, I am Khushi Sharma, a B.Tech 2nd year student at IGDTUW. I am a MERN stack web developer currently working with the Postman API, and I have done web3 projects as well. I started my journey into APIs with Postman, and this project really excites me. I will definitely learn a lot by getting into this project and working under the guidance of @vinitshahdeo for GSoC 2023.

MPrashanthR commented 1 year ago

Dear @vinitshahdeo

I am excited to announce that I have completed my training as a Full Stack Developer and I am eager to contribute my skills to open projects. I am particularly interested in the project proposal to develop an open-source approach to finding Swagger and OpenAPI definitions on the open web.

As a Full Stack Developer, I have experience with web development, crawling web pages, and following URLs. I am confident that I have the skills required to create a simple open-source API that will help developers find APIs in a sea of web pages.

I am looking for a mentor @vinitshahdeo @jansche who can guide me through this project and help me learn and grow as a developer. I am committed to putting in the time and effort required to complete this project successfully and contribute to the open source community.

Thank you for considering my application, and I look forward to hearing back from you soon.

vinitshahdeo commented 1 year ago

👋 Hello everyone,

I wrote a public blog post about this project idea - how OAWS can help unleash the power of OpenAPI!

Hope it helps—vinitshahdeo.dev/open-api-web-search

hemanth9398 commented 1 year ago

Dear @vinitshahdeo, I am excited to contribute to open source at Postman. I have worked with CI/CD pipelines and have used Postman to make HTTP requests for applications, so I am familiar with the tool. I am looking for a mentor (@vinitshahdeo @jansche) who can guide me through this project and help me develop a thorough understanding of building applications with Postman. I am committed to putting in the time and effort required to complete this project successfully and contribute to the open source community.

ankit-pn commented 1 year ago

Greetings everyone! I'm Ankit Kumar, a 3rd year CS student at NIT Bhopal. I am excited about this project and find it particularly intriguing. Based on the project summary, it seems like the goal is to create a search engine for OpenAPI and Swagger definitions, which will surface reliable and functional APIs. It is important to validate every OpenAPI and Swagger definition to ensure its reliability and accuracy.

I have experience using the Common Crawl index and have previously worked with OpenAPI on my own side projects, and I would love to contribute to this project as a GSoC mentee this summer.

@thelifeofshubh you just copied my whole introduction text. Plagiarism (at least in introducing yourself) will lead you nowhere, so just try to be authentic.

thelifeofshubh commented 1 year ago

Hey there, @ankit-pn! I was just curious to know if the mentor has been active lately. Have you noticed any recent activity from them?

Prajwalprakash3722 commented 1 year ago

Hey there, @ankit-pn! I was just curious to know if the mentor has been active lately. Have you noticed any recent activity from them?

Yes, @MikeRalphson and @vinitshahdeo are pretty active.

DevMukhtarr commented 1 year ago

Hey @jansche, as a backend developer who works with APIs a lot, I feel this project will really help developers who deal with APIs, whether they are starting a new project or fixing issues in an ongoing one. I have experience in web crawling, which is one of the main skills required to make this successful, and I'll be glad to work on this project.

LordRona commented 9 months ago

Hello! My name is Fon Ronard Sauh, a third-year computer science major at the University of Buea. I have worked on projects that required API calls, and I used Postman throughout. I am really enthusiastic about contributing to Postman's open source projects, and to this one in particular given my past exposure to OpenAPI Web Search. As a potential GSoC 2024 contributor, I am confident I can add value to this project and the team.

LordRona commented 8 months ago

Hello @vinitshahdeo, could you please recommend a first issue for me while I prepare for GSoC 2024? I read the contribution terms, and they say I should get in touch with the main mentor.

benjagm commented 6 months ago

Closed as completed as part of 2023 edition.