Open asiyani opened 7 years ago
1) I don't think we need to collect more info about the stargazers. We need their username and preferably followers or something like that. Not more than this.
2) If we store the repos as an array in the format authorUsername/repoName, for example if tj starred curiositylab/curiosity and addyosmani/xyz ..., we can do it like this:
usernameSchema = new Schema({
  _id: ...,
  name: 'tj',
  ...
  repos: ['curiositylab/curiosity', 'addyosmani/xyz', ...]
});
Is that efficient?
3) Local DB is okay for dev stage :smile:
If we store the repos as an array in the format authorUsername/repoName, for example if tj starred curiositylab/curiosity and addyosmani/xyz ..., we can do it like this:
usernameSchema = new Schema({ _id: ..., name: 'tj', ..., repos: ['curiositylab/curiosity', 'addyosmani/xyz', ...] });
But then the repos array can have thousands of entries for each stargazer (username).
Question: How are we querying GitHub at the moment? I know that we are querying each stargazer, but is there any sort or filter applied in the API call to GitHub?
In NoSQL, you design the database around the queries you will be running.
No, there is no real query filter at the moment. If the user selects a language, the array filter function is used, but it is only applied after all projects starred by each user have been fetched.
response.data.filter(filterFunction).slice(0, MAX_PROJECTS_PER_USER).forEach((entry) => {
  ...
});
When a language is selected, I tried to query just the projects developed in that language, but it seems there is no language parameter in the GitHub API.
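To make the current behaviour concrete, here is a minimal sketch of filtering by language on the client after the starred repos have already been fetched. The helper name `pickProjects`, the cap value, and the sample data are assumptions for illustration; only `MAX_PROJECTS_PER_USER` and the filter-then-slice pattern come from the snippet above. The `full_name` and `language` fields follow the shape of repository objects returned by the GitHub REST API.

```javascript
// Hypothetical cap: the thread names MAX_PROJECTS_PER_USER but not its value.
const MAX_PROJECTS_PER_USER = 2;

// Keep only repos whose language matches (case-insensitive), then cap the list.
// When no language is given, every repo passes the filter.
function pickProjects(starredRepos, language) {
  const wanted = language ? language.toLowerCase() : null;
  return starredRepos
    .filter((repo) => wanted === null || (repo.language || '').toLowerCase() === wanted)
    .slice(0, MAX_PROJECTS_PER_USER);
}

// Made-up sample standing in for response.data.
const sample = [
  { full_name: 'curiositylab/curiosity', language: 'JavaScript' },
  { full_name: 'example/one', language: 'Go' },
  { full_name: 'example/two', language: 'JavaScript' },
  { full_name: 'example/three', language: 'JavaScript' },
  { full_name: 'example/four', language: null },
];

const picked = pickProjects(sample, 'javascript').map((repo) => repo.full_name);
console.log(picked);
```

Note that the whole list still has to be downloaded before the filter runs, which is the cost being discussed here.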
OK, let's start by writing down the queries we think we will be running against the DB.
Anything else you guys can think of?
@alejandronanez What do you think about this?
1 & 2. Have you tried querying the GraphQL endpoint instead of the REST endpoints? GQL helps us 'filter' what data we get back from the server.
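As a rough sketch of what that could look like, the snippet below builds a request body for GitHub's GraphQL API that asks only for each starred repo's name and primary language. The function name `buildStarredReposRequest` and the page size are assumptions; the body would be sent as an authenticated POST to https://api.github.com/graphql, which is left out here. This trims the fields returned; it does not by itself add a server-side language filter.

```javascript
// Query selecting only the fields we care about for each starred repository.
const STARRED_REPOS_QUERY = `
  query StarredRepos($login: String!, $first: Int!) {
    user(login: $login) {
      starredRepositories(first: $first) {
        nodes {
          nameWithOwner
          primaryLanguage { name }
        }
      }
    }
  }
`;

// Hypothetical helper: pairs the query with its variables.
function buildStarredReposRequest(login, first) {
  return { query: STARRED_REPOS_QUERY, variables: { login, first } };
}

// JSON body for the POST request; 'tj' and 10 are example values.
const body = JSON.stringify(buildStarredReposRequest('tj', 10));
console.log(body);
```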
@alejandronanez good shout about GQL. I don't know how to do that, but it will be fun to learn.
I think instead of creating an array of repos in usernameSchema we should add usernames to the repos schema:
repositorySchema = new Schema({
  name: 'curiosity',
  ...
  language: 'javascript',
  githubLogins: ['asiyani', 'alejandronanez', 'mubaris', ...],
});
This way we don't have to search the username collection at all; we can just query the repository collection. Of course, this will only work if GitHub logins are unique, and I am sure they are.
// The following should give me all repos starred by 'asiyani' from the DB.
Repository.find({ githubLogins: { $in: ['asiyani'] } }, ...);
// The following should give me all repos starred by 'asiyani' with language = javascript from the DB.
Repository.find({ githubLogins: { $in: ['asiyani'] }, language: 'javascript' }, ...);
If the user does need info about stargazers, we can query that separately.
usernameSchema.findOne({login:'asiyani'})
But most of the time we will be querying the Repository collection rather than the username collection. This way there won't be any application-level joins.
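To illustrate the lookups this layout is meant to support, here is a small in-memory sketch that mimics the two `Repository.find` calls with a plain array instead of MongoDB. The helper `findStarredBy` and the sample documents (including the language given to 'xyz') are made up for illustration; a real version would run the Mongoose queries shown above.

```javascript
// In-memory stand-in for the repository collection; documents are illustrative.
const repositories = [
  { name: 'curiosity', language: 'javascript', githubLogins: ['asiyani', 'alejandronanez'] },
  { name: 'xyz', language: 'python', githubLogins: ['asiyani'] },
  { name: 'other', language: 'javascript', githubLogins: ['mubaris'] },
];

// Hypothetical helper: repos whose githubLogins contain `login`,
// optionally narrowed to a single language.
function findStarredBy(repos, login, language) {
  return repos.filter(
    (repo) =>
      repo.githubLogins.includes(login) &&
      (language === undefined || repo.language === language)
  );
}

const starredByAsiyani = findStarredBy(repositories, 'asiyani').map((r) => r.name);
const starredJsByAsiyani = findStarredBy(repositories, 'asiyani', 'javascript').map((r) => r.name);
console.log(starredByAsiyani, starredJsByAsiyani);
```

Both lookups touch only repository documents, which is the point of moving the logins onto the repo side.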
Let me know what you guys think.
I like this new way of storing repository details. Easy to get details.
@asiyani I like this new approach too. I have experience with GQL, let me know if you hit any roadblock or something. FWIW you don't need any fancy framework to use GQL, so I suggest just to keep it simple at the beginning.
Good, then we will go with this schema. Before we move to GQL for our client, we need to work on the GitHub API calls that populate the data. I will start with that so we have some data to send via GQL.
There are a few topics we need to discuss regarding the database.
I think userSchema & repositorySchema are fine, but usernameSchema has lots of stuff we might not need, like location and bio.
This one depends on the queries we will be running on the DB and the amount of data. If I am right, at the moment we are querying usernames to get repositories. In that case......
The problem with this is that some usernames like 'tj' have 1.7k starred repositories! That's too many IDs to put in an array. Another solution: because we have a limited number of usernames, we can do this....
The problem is that it will be difficult to query repositories based on usernames. Don't know.
A local DB requires initial setup; mLab requires creating an account and has a max limit of 0.5GB (I think this should be more than enough). I personally prefer a local DB server for development.
Let's discuss answers to all 3 questions, or any other questions related to the DB.