mubaris / curiosity

Find Amazing Github :octocat: Projects :zap:
https://mubaris.github.io/curiosity/

DB modeling and DB-related discussion #42

Open asiyani opened 7 years ago

asiyani commented 7 years ago

There are a few topics we need to discuss regarding the database.

const mongoose = require('mongoose');
const { Schema } = mongoose;

// Schema for logged-in users (users of our site).
const userSchema = new Schema({
    _id: mongoose.Schema.Types.ObjectId,
    githubId: Number,
    login: String,
    name: String,
    html_url: String,
    accessToken: String,
});
// Schema for the GitHub usernames (profiles) of stargazers.
const usernameSchema = new Schema({
    _id: mongoose.Schema.Types.ObjectId,
    githubId: Number,
    login: String,
    name: String,
    html_url: String,
    location: String,
    bio: String,
    public_repos: Number,
    public_gists: Number,
    followers: Number,
    dbLastUpdated: Date,
    starredIds: [mongoose.Schema.Types.ObjectId], // _ids of starred repositories
});
// Schema for starred repositories.
const repositorySchema = new Schema({
    _id: mongoose.Schema.Types.ObjectId,
    name: String,
    html_url: String,
    description: String,
    stargazers_count: Number,
    forks_count: Number,
    created_at: Date,
    updated_at: Date,
    language: String,
});
Questions
1. Are the properties in each schema enough, or do we need to store more data for each collection?
2. What should the relationship between the username and repository schemas be: embedded, one-to-N, or N-to-N?
3. Which DB server should we use for development: a local Mongo instance or mLab?

I think

  1. userSchema and repositorySchema are fine, but usernameSchema has a lot of fields we might not need, like location and bio.

  2. This depends on the queries we will run against the DB and on the amount of data. If I am right, at the moment we query usernames to get repositories. In that case:

    usernameSchema = new Schema({
        _id: mongoose.Schema.Types.ObjectId,
        name: String,
        // ...
        repositoryIDs: [mongoose.Schema.Types.ObjectId], // [ObjectId1, ObjectId2, ... N]
    });

    The problem with this is that some users, like 'tj', have 1.7k starred repositories! That's too many IDs to put in one array. Another solution: because we have a limited number of usernames, we can do this:

    repositorySchema = new Schema({
        _id: mongoose.Schema.Types.ObjectId,
        name: String,
        // ...
        usernameIDs: [mongoose.Schema.Types.ObjectId], // [ObjectId1, ObjectId2, ... N]
    });

    The problem is that it will be difficult to query repositories based on usernames alone. Don't know 😖

  3. A local DB requires initial setup; mLab requires creating an account and has a maximum of 0.5 GB (I think this should be more than enough 😉). I personally prefer a local DB server for development.
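To make the trade-off in point 2 concrete, here is a small in-memory sketch in plain JavaScript (no MongoDB needed). It stores the same star data in both directions and inverts one into the other. The logins, repo ids, and the helper `invertStars` are hypothetical illustrations, not project code.

```javascript
// Direction 1: each stargazer document holds the ids of the repos it starred.
// One document's array grows with that user's star count (~1.7k for a heavy user).
const stargazers = [
  { login: 'tj', repositoryIDs: ['r1', 'r2', 'r3'] },
  { login: 'asiyani', repositoryIDs: ['r2'] },
];

// Direction 2: each repository document holds the logins that starred it.
// Here an array can never be longer than the number of stargazers we track.
function invertStars(users) {
  const byRepo = new Map(); // repo id -> array of logins
  for (const user of users) {
    for (const repoId of user.repositoryIDs) {
      if (!byRepo.has(repoId)) byRepo.set(repoId, []);
      byRepo.get(repoId).push(user.login);
    }
  }
  return byRepo;
}

const repositories = invertStars(stargazers);
console.log(repositories.get('r2')); // [ 'tj', 'asiyani' ]
```

Both layouts hold the same information; the difference is which document's array grows with the data.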

Let's discuss answers to all 3 questions, or any other questions related to the DB.

mubaris commented 7 years ago

1) I don't think we need to collect more info about the stargazers. We need their username and, preferably, their follower count or something like that. Nothing more.

2) If we store the repos as an array of strings in the format authorUsername/repoName (for example, if tj starred curiositylab/curiosity and addyosmani/xyz), we can do this:

usernameSchema = new Schema({
    _id: mongoose.Schema.Types.ObjectId,
    name: String,     // e.g. 'tj'
    // ...
    repos: [String],  // e.g. ['curiositylab/curiosity', 'addyosmani/xyz', ...]
});

Is that efficient?

3) Local DB is okay for dev stage :smile:
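A minimal sketch of building that `repos` array, assuming each entry returned by GitHub's starred-repositories REST endpoint carries a `full_name` field in the `owner/name` format (REST repository objects do include it). The sample entries and the helpers `toRepoKeys` and `parseRepoKey` are hypothetical.

```javascript
// Hypothetical sample of starred-repo entries shaped like GitHub REST repository objects.
const starredEntries = [
  { full_name: 'curiositylab/curiosity', language: 'JavaScript' },
  { full_name: 'addyosmani/xyz', language: 'JavaScript' },
  { full_name: 'curiositylab/curiosity', language: 'JavaScript' }, // hypothetical duplicate
];

// Collect unique 'owner/name' strings for a stargazer's `repos` array.
function toRepoKeys(entries) {
  return [...new Set(entries.map((entry) => entry.full_name))];
}

// Split a key back into owner and repo name when they are needed separately.
function parseRepoKey(key) {
  const [owner, name] = key.split('/');
  return { owner, name };
}

const repos = toRepoKeys(starredEntries);
console.log(repos);                  // [ 'curiositylab/curiosity', 'addyosmani/xyz' ]
console.log(parseRepoKey(repos[1])); // { owner: 'addyosmani', name: 'xyz' }
```

Strings keep the array readable, but they do not shrink it: a user with 1.7k stars still gets 1.7k entries.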

asiyani commented 7 years ago

> If we store the repos as array of the format authorUsername/repoName, for example if tj starred curiositylab/curiosity and addyosmani/xyz ... we can do like this,
>
> usernameSchema = new Schema({
>     _id:
>     name: tj
>     :
>     repos: ['curiositylab/curiosity', 'addyosmani/xyz', ...]
> });

But then the repos array could have thousands of entries for each stargazer (username).

Question: how are we querying GitHub at the moment? I know that we query each stargazer, but is there any sort or filter applied when making the API call to GitHub?

In NoSQL, you design the database around the queries you will be running.

raulvillares commented 7 years ago

> Question: - How are quering GitHub at the moment. I know that we are querying each stargazer but is there any sort or filter while doing API call to Github?

No, there is no real query filter at the moment. If the user selects a language, the array filter function is used, but it is applied only after we have fetched all the projects starred by each user.

response.data.filter(filterFunction).slice(0, MAX_PROJECTS_PER_USER).forEach((entry) => {
    // ...
});

When a language is selected, I tried to query only the projects written in that language, but it seems there is no language parameter in the GitHub API.
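A sketch of the current fetch-then-filter step, assuming `filterFunction` compares each entry's `language` field (GitHub REST repository objects include one) with the selected language. The value of `MAX_PROJECTS_PER_USER`, the sample response, and `makeLanguageFilter` are hypothetical.

```javascript
const MAX_PROJECTS_PER_USER = 2; // hypothetical value for illustration

// Keep every entry when no language is selected; otherwise keep only
// entries whose primary language matches (case-insensitive).
function makeLanguageFilter(selectedLanguage) {
  if (!selectedLanguage) return () => true;
  const wanted = selectedLanguage.toLowerCase();
  return (entry) => (entry.language || '').toLowerCase() === wanted;
}

// Hypothetical response shaped like an axios-style { data: [...] } object.
const response = {
  data: [
    { full_name: 'a/one', language: 'JavaScript' },
    { full_name: 'b/two', language: 'Go' },
    { full_name: 'c/three', language: 'JavaScript' },
    { full_name: 'd/four', language: 'JavaScript' },
    { full_name: 'e/five', language: null },
  ],
};

const filterFunction = makeLanguageFilter('javascript');
const picked = [];
response.data.filter(filterFunction).slice(0, MAX_PROJECTS_PER_USER).forEach((entry) => {
  picked.push(entry.full_name);
});
console.log(picked); // [ 'a/one', 'c/three' ]
```

Because the filter runs after the download, every starred repo is still transferred; only what gets shown is reduced.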

asiyani commented 7 years ago

OK, let's just start by writing down the queries we think we will be running against the DB.

  1. Get all repos starred by, say, 'tj'.
  2. Get all repos written in 'JavaScript'.
  3. Get the repos starred by all/most stargazers.
  4. Get the repos updated in the last 24/48 h (here, "repos" means starred repos).

Anything else you guys can think of?
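Queries 3 and 4 are the less obvious ones. A minimal in-memory sketch, assuming each repository record lists the logins that starred it (a hypothetical `githubLogins` array) and keeps `updated_at` from `repositorySchema`. The sample records and helper names are illustrative only.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical repository records.
const repos = [
  { name: 'alpha', githubLogins: ['tj', 'asiyani', 'mubaris'], updated_at: new Date('2017-10-10T12:00:00Z') },
  { name: 'beta', githubLogins: ['tj'], updated_at: new Date('2017-10-01T00:00:00Z') },
  { name: 'gamma', githubLogins: ['tj', 'mubaris'], updated_at: new Date('2017-10-09T20:00:00Z') },
];

// Query 3: repos ranked by how many tracked stargazers starred them.
function mostStarred(records, limit) {
  return [...records]
    .sort((a, b) => b.githubLogins.length - a.githubLogins.length)
    .slice(0, limit)
    .map((r) => r.name);
}

// Query 4: repos updated within the last `hours` hours before `now`.
function updatedWithin(records, hours, now) {
  const cutoff = now.getTime() - hours * HOUR_MS;
  return records.filter((r) => r.updated_at.getTime() >= cutoff).map((r) => r.name);
}

const now = new Date('2017-10-10T18:00:00Z');
console.log(mostStarred(repos, 2));         // [ 'alpha', 'gamma' ]
console.log(updatedWithin(repos, 24, now)); // [ 'alpha', 'gamma' ]
```

In MongoDB these roughly correspond to an aggregation that sorts on the `$size` of the logins array, and a range filter such as `{ updated_at: { $gte: cutoff } }`.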

mubaris commented 7 years ago

@alejandronanez What do you think about this?

alejandronanez commented 7 years ago

1 & 2. Have you tried querying the GraphQL endpoint instead of the REST endpoints? GraphQL lets us 'filter' what data we get back from the server.

3. Local DB is fine! 😊
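For reference, a sketch of what a GraphQL request for a user's starred repos could look like against GitHub's GraphQL API (`POST https://api.github.com/graphql`, authenticated with a token). The field names (`starredRepositories`, `nameWithOwner`, `primaryLanguage`, `stargazerCount`, `updatedAt`) are taken from GitHub's public GraphQL schema and should be checked against the current docs; `buildStarredQuery`, `fetchStarred`, and the `GITHUB_TOKEN` environment variable are hypothetical.

```javascript
// Build the request body: only the fields listed here come back from the server.
function buildStarredQuery(login, first = 50) {
  const query = `
    query ($login: String!, $first: Int!) {
      user(login: $login) {
        starredRepositories(first: $first, orderBy: { field: STARRED_AT, direction: DESC }) {
          nodes {
            nameWithOwner
            url
            description
            stargazerCount
            primaryLanguage { name }
            updatedAt
          }
        }
      }
    }`;
  return { query, variables: { login, first } };
}

// Send it (Node 18+ provides a global fetch). Requires a GitHub token.
async function fetchStarred(login, token) {
  const res = await fetch('https://api.github.com/graphql', {
    method: 'POST',
    headers: { Authorization: `bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(buildStarredQuery(login)),
  });
  if (!res.ok) throw new Error(`GitHub GraphQL request failed: ${res.status}`);
  const json = await res.json();
  if (json.errors) throw new Error(json.errors.map((e) => e.message).join('; '));
  return json.data.user.starredRepositories.nodes;
}

// Example (not run here): fetchStarred('tj', process.env.GITHUB_TOKEN).then(console.log);
```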

asiyani commented 7 years ago

@alejandronanez good shout about GQL. I don't know how to do that yet 😉 but it will be fun to learn. 😄

I think that instead of creating an array of repos in usernameSchema, we should add usernames to the repository schema:

repositorySchema = new Schema({
    name: String,            // e.g. 'curiosity'
    // ...
    language: String,        // e.g. 'javascript'
    githubLogins: [String],  // e.g. ['asiyani', 'alejandronanez', 'mubaris', ...]
});

This way we don't have to search the username collection at all; we can just query the repository collection. Of course, this only works if GitHub logins are unique, and I am sure they are.

// The following should give me all repos starred by 'asiyani' from the DB.
Repository.find({ githubLogins: { "$in": ["asiyani"] } } /* , ... */);

// The following should give me all repos starred by 'asiyani' with language 'javascript' from the DB.
Repository.find({ githubLogins: { "$in": ["asiyani"] }, language: 'javascript' });
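The `$in` query works on an array field because of MongoDB's array matching: a condition on an array field matches a document when any element of the array satisfies it. The tiny in-memory matcher below mimics only the two cases used above (equality, and `$in` on an array or scalar field); `matches`, `fieldMatches`, and the sample documents are hypothetical and cover nothing else of MongoDB's query language.

```javascript
// Does one document field satisfy one condition from the filter?
function fieldMatches(value, condition) {
  const values = Array.isArray(value) ? value : [value];
  if (condition !== null && typeof condition === 'object' && '$in' in condition) {
    return values.some((v) => condition.$in.includes(v));
  }
  return values.some((v) => v === condition); // plain equality
}

// A document matches when every field in the filter matches (implicit AND).
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => fieldMatches(doc[field], condition));
}

const repositories = [
  { name: 'curiosity', language: 'javascript', githubLogins: ['asiyani', 'mubaris'] },
  { name: 'express', language: 'javascript', githubLogins: ['tj'] },
  { name: 'gin', language: 'go', githubLogins: ['asiyani'] },
];

const byLogin = repositories.filter((r) => matches(r, { githubLogins: { $in: ['asiyani'] } }));
const byLoginAndLang = repositories.filter((r) =>
  matches(r, { githubLogins: { $in: ['asiyani'] }, language: 'javascript' }));

console.log(byLogin.map((r) => r.name));        // [ 'curiosity', 'gin' ]
console.log(byLoginAndLang.map((r) => r.name)); // [ 'curiosity' ]
```

With a single login, `{ githubLogins: 'asiyani' }` is equivalent to the `$in` form, and an index on `githubLogins` (a multikey index) lets MongoDB answer it without scanning the whole collection.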

If we do need info about the stargazers, we can query that separately:

Username.findOne({ login: 'asiyani' }); // Username: the model compiled from usernameSchema

But most of the time we will be querying the Repository collection rather than the username collection. This way there won't be any application-level joins.
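To illustrate the "no application-level joins" point: with ids stored on the stargazer (like `starredIds` in the first usernameSchema), answering "repos starred by asiyani" takes two lookups that the application stitches together, while with `githubLogins` on each repository it is a single filter. An in-memory sketch with hypothetical data and helper names:

```javascript
// Layout A: stargazer documents reference repositories by id.
const stargazersA = [{ login: 'asiyani', starredIds: ['r1', 'r3'] }];
const repositoriesA = [
  { _id: 'r1', name: 'curiosity' },
  { _id: 'r2', name: 'express' },
  { _id: 'r3', name: 'gin' },
];

// Two steps: find the stargazer, then fetch the repositories it points to.
function starredByJoin(login) {
  const user = stargazersA.find((s) => s.login === login);
  if (!user) return [];
  const wanted = new Set(user.starredIds);
  return repositoriesA.filter((r) => wanted.has(r._id)).map((r) => r.name);
}

// Layout B: repositories carry the logins that starred them.
const repositoriesB = [
  { name: 'curiosity', githubLogins: ['asiyani', 'mubaris'] },
  { name: 'express', githubLogins: ['tj'] },
  { name: 'gin', githubLogins: ['asiyani'] },
];

// One step: a single filter over the repository collection.
function starredByFilter(login) {
  return repositoriesB.filter((r) => r.githubLogins.includes(login)).map((r) => r.name);
}

console.log(starredByJoin('asiyani'));   // [ 'curiosity', 'gin' ]
console.log(starredByFilter('asiyani')); // [ 'curiosity', 'gin' ]
```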

I am also thinking of renaming the collection from username to stargazers, just to avoid confusion between userSchema (our site's users) and usernameSchema (stargazers).

Let me know what you guys think.

mubaris commented 7 years ago

I like this new way of storing repository details. It makes the details easy to get.

alejandronanez commented 7 years ago

@asiyani I like this new approach too. I have experience with GQL, so let me know if you hit any roadblocks. FWIW, you don't need any fancy framework to use GQL, so I suggest keeping it simple at the beginning.

asiyani commented 7 years ago

Good, then we will go with this schema. Before we move to GQL for our client, we need to work with the GitHub API to populate the data. I will start with that first so we have some data to send via GQL.
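When populating from the GitHub API, the key step is recording "login X starred repo Y" without creating duplicate repositories or duplicate logins. In MongoDB this is typically an upsert with `$addToSet`, e.g. `Repository.updateOne({ html_url }, { $set: { ... }, $addToSet: { githubLogins: login } }, { upsert: true })`. Keying on `html_url` is an assumption here, since a bare `name` such as 'curiosity' is not unique across owners. The sketch below mimics that behaviour in memory; `recordStar` and the sample data are hypothetical.

```javascript
// In-memory stand-in for the repository collection, keyed by html_url.
const repositoryCollection = new Map();

// Upsert the repo's details and add the login only if it is not already present,
// mirroring $set + $addToSet with { upsert: true }.
function recordStar(login, repo) {
  const existing = repositoryCollection.get(repo.html_url) || { githubLogins: [] };
  const updated = {
    ...existing,
    name: repo.name,
    html_url: repo.html_url,
    language: repo.language,
    stargazers_count: repo.stargazers_count,
  };
  if (!updated.githubLogins.includes(login)) {
    updated.githubLogins = [...updated.githubLogins, login];
  }
  repositoryCollection.set(repo.html_url, updated);
}

const curiosity = {
  name: 'curiosity',
  html_url: 'https://github.com/mubaris/curiosity',
  language: 'JavaScript',
  stargazers_count: 247,
};

recordStar('asiyani', curiosity);
recordStar('mubaris', curiosity);
recordStar('asiyani', curiosity); // repeated run: no duplicate login

console.log(repositoryCollection.get(curiosity.html_url).githubLogins); // [ 'asiyani', 'mubaris' ]
```

Re-running the populate job is then safe: it refreshes repo details but never duplicates a repository or a login.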