API request improvements

yashsehgal / use-github-react

React hook for fetching activities and information via GitHub API

https://use-github-react.vercel.app

MIT License

1 stars 4 forks source link

API request improvements #5

Open yashsehgal opened 1 month ago

yashsehgal commented 1 month ago

As of the now, the number of API fetches happening inside the hook for all the methods i.e. userInfo, getRepositories and their respective sub-methods - are really high.

I have some ideas-as-solution to this issue, which will help reduce the API calls and improve overall dependency for all methods.

One function to rule them all: Let's create a single function for all our API calls.

const githubRequest = async (endpoint, options = {}) => {
  // Code for all the API calls
};

We're calling the API a lot. Let's cache some responses - Caching can be a good option for solve this issue.

const cachedData = {};

const getCachedOrFetch = async (key, fetchFn) => {
  if (cachedData[key]) return cachedData[key];
  const data = await fetchFn();
  cachedData[key] = data;
  return data;
};

Sometimes GitHub might be slow - We can have timeouts to avoid delayed responses and UI renders.

const timeoutPromise = (ms) => new Promise((_, reject) => {
  setTimeout(() => reject(new Error('Request timed out')), ms);
});

const githubRequest = async (endpoint, options = {}) => {
  return Promise.race([
    actualRequest(endpoint, options),
    timeoutPromise(5000)  // 5 second timeout
  ]);
};

louremipsum commented 1 month ago

Hey Yash, this looks interesting and I wanted to contribute

I had some questions/thoughts about it.

Caching with a single source of truth would be a good strategy to reduce API calls(although we would have to decide on when to revalidate data, or maybe adjust the scope of caching like things which are more likely to change would require short time caching and requests which don't change so often like user profile might be cached for a longer time)
I looked through the repository and was not able to find anything regarding Rate Limiting, which I think would also be an important aspect in making this and also handling rate-limiting errors separately.
Custom pagination support/batch fetching for repo fetching or when fetching large datasets from GH API
Potential use of GraphQL also?

What are your thoughts regarding this?

yashsehgal commented 1 month ago

Hi @louremipsum - Thanks for showing interest in this one. This is one of the really important fixes as too many API calls are happening as of now inside the hook 😅

The questions/thoughts you mentioned are really good. Sharing my responses for the same.

Caching with a single source of truth would be a good strategy to reduce API calls(although we would have to decide on when to revalidate data, or maybe adjust the scope of caching like things which are more likely to change would require short time caching and requests which don't change so often like user profile might be cached for a longer time)

This is a valid point. We should make a list of categorised data items which needs/does not need caching. As caching unnecessary will increase the complexity as well.

I have personally never worked with Caching, so I would you to lead this thread (will guide me as well) and we can continue with the fixes after that.

I looked through the repository and was not able to find anything regarding Rate Limiting, which I think would also be an important aspect in making this and also handling rate-limiting errors separately.

You're right. As of now, there's no rate-limiting applied to any method or API call. However, really important and good to have.

Custom pagination support/batch fetching for repo fetching or when fetching large datasets from GH API

Again, a good point! As of now, there are no such pagination applied to the API calls. Only one method for fetching repos has it i.e. 100 rows per page. Which can be improved and can be made custom, something like,

const {} = useGitHub({ 
  username: "...", 
  personalAccessToken: "..."
});

const allRepos = getRepositories({
  rowsPerPage: N // can be any number
}).all();

Potential use of GraphQL also?

There are some applications of the GraphQL API in the code. For eg. fetching pinned repos. Would you like to describe on this one more?

Great to see the questions coming, really helpful for the coding pattern for the next versions. TYSM! ✨

louremipsum commented 1 month ago

haha thanks for your thoughts My response for each of your response is:

I have personally never worked with Caching, so I would you to lead this thread (will guide me as well) and we can continue with the fixes after that.

I have also worked on very simple caching(caching of Auth0 M2M token to reduce API calls to Auth0 directly) but it was a fairly simple usecase so maybe we can discuss thoughts and figure it out together? Also for the i have some initial idea for the list of what to cache and what not to cache:

What to Cache vs What Not to Cache

Cache (Longer Duration)

User Profiles (userInfo): Cache for 24 hours.
Repository Metadata (getRepositories): Cache for 12–24 hours.
Repository Languages: Cache for 12–24 hours.

Cache (Shorter Duration)

Repository List: Cache for 1–6 hours.
Rate Limit Information: Cache for 1–5 minutes.

Do Not Cache (or Cache for Short Duration)

Repository Commits: Avoid caching or cache for 1–5 minutes.
Issues and Pull Requests: Avoid caching or cache for 5–15 minutes.
Notifications/Events (Stars, Forks, etc.): Avoid caching or cache for 5–10 minutes.

this can be a starting point for caching, should not have to be strictly this but up for further discussion

There is also an ETags which ensures you're not refetching unchanged data(reduced payload), but I don't have much idea about how this works currently and its nuaces so would have to read some more about this.

You're right. As of now, there's no rate-limiting applied to any method or API call. However, really important and good to have.

There is the rate limiting which Github does(primary and secondary) and further rate limiting which can be done using the github hook. Also would have to seriously treat the rate limit by Github since acc. to them they will ban if still req after ratelimit

There are some applications of the GraphQL API in the code. For eg. fetching pinned repos. Would you like to describe on this one more?

Honestly was not sure about this one cause I didn't explore the whole repo as of now but this one of the things which I read before on GH blog that they are useful in combining multiple queries into one for fetching data so maybe(I'm kinda reaching and talking out loud) the hook can offer two options of either serving individual data via REST and if the user knows that they gonna need some set of data everytime then they can use the graphQL one to fetch multiple info in one request.

yashsehgal commented 1 month ago

Update: I have added the recurring meeting details for everyone - For all the upcoming Wednesdays and Fridays during the month of October. Please check the details too in the issue description for,

yashsehgal commented 1 month ago

I see how you're planning to lay the patterns for managing caches for different methods. As you have considerable experience with the same (caching and related things) Can we work on a POC for this one. A POC/DEV version for v1.3 where we can test how things are working in terms of performance and it should also have a good DX.

I would love to assign this one to you, and we can start working on building a POC that will focus on the following points,

Having a better DX for all the methods + sub-methods.
- Have to explore more on how user can enable/disable caching for different functions. Is it going to be a single config that can be set as true at the time of hook initialisation,
```
const gh = useGitHub({ 
username: "...",
personalAccessToken: "...",
enableCache: true
});
```
or, is it going to be config object that can be passed inside respective methods.
```
const gh = useGitHub({ 
username: "...",
personalAccessToken: "...",
});

const allRepos = gh.getRepositories({ enableCache: true });
const user = gh.getUserInfo({ enableCache: false });
const pinnedRepos = gh.getRepositories({ enableCache: true }).pinned();
```
I know the latter one looks too much customizable but can be really helpful. What are your thoughts on this?
Let's also have a decent demo prepared for testing the before/after performance of the hook. That will be a good start to see what things are improving and how much better the performance is getting.
For the POC, we can build a simple profile card component, inside which we can show user basic info, all the pinned repos, language-distribution and a dropdown component with list of languages and toggling it will give us a list of repos with the selected language. That will surely break the API limit 😁

Also, when starting to write the final update for this one, let's plan and break it into sub-features/fixes or PRs for better testing.

louremipsum commented 1 month ago

. Is it going to be a single config that can be set as true at the time of hook initialisation,

Option 1: There can be a default config for caching which can be set by using the {enableCache: true} at the time of hook initialisation for people who don't want to think too much while caching and just want it to work
Option 2: The user can enable caching per method with more parameters like swr: boolean or revalidateAfter: number or maybe a function to manually re-fetch when the user wants and such. (will think about it and inform you here)

Let's also have a decent demo prepared for testing the before/after performance of the hook. That will be a good start to see what things are improving and how much better the performance is getting. For the POC, we can build a simple profile card component, inside which we can show user basic info, all the pinned repos, language-distribution and a dropdown component with list of languages and toggling it will give us a list of repos with the selected language. That will surely break the API limit

Maybe I should make that component before testing and prepare/think about some performance metrics: on top of my head, I can think of number of API calls, time to fetch etc.

Although I'm a bit busy rn now maybe I'll try to make it by Wednesday. Till that, if I find time, i will maybe research more on definite ways to improve performance.

yashsehgal commented 1 month ago

. Is it going to be a single config that can be set as true at the time of hook initialisation,

Option 1: There can be a default config for caching which can be set by using the {enableCache: true} at the time of hook initialisation for people who don't want to think too much while caching and just want it to work

Option 2: The user can enable caching per method with more parameters like swr: boolean or revalidateAfter: number or maybe a function to manually re-fetch when the user wants and such. (will think about it and inform you here)

Both of the options you have mentioned, sounds good as a starting point. This will give devs more flexibility. We can include these patterns in the demo for testing.

Let's also have a decent demo prepared for testing the before/after performance of the hook. That will be a good start to see what things are improving and how much better the performance is getting. For the POC, we can build a simple profile card component, inside which we can show user basic info, all the pinned repos, language-distribution and a dropdown component with list of languages and toggling it will give us a list of repos with the selected language. That will surely break the API limit

Maybe I should make that component before testing and prepare/think about some performance metrics: on top of my head, I can think of number of API calls, time to fetch etc.

Sounds promising, we can surely can continue with these 👍🏽

Although I'm a bit busy rn now maybe I'll try to make it by Wednesday. Till that, if I find time, i will maybe research more on definite ways to improve performance.

Sure, No hurries. Start whenever you're available 😊 We can keep updating this thread as we find and think of more solutions.

Also, adding to the discussion. We can work on the API calls happening inside the hook. As of today, all the API calls (for user info, followers and following list, list of repos, list of pinned repos, etc) are happening at the time of hook initialisation. And this is surely not a good way and consumes a lot of tokens.

As a solution, we can create getters for all the data retrieval functions. So that APIs are called only when there respective method is invoked. Can think more about how the coding pattern will look like. Thanks!

yashsehgal / use-github-react

API request improvements #5

What to Cache vs What Not to Cache

Cache (Longer Duration)

Cache (Shorter Duration)

Do Not Cache (or Cache for Short Duration)

8