johncmckim opened this issue 8 years ago
I'm really interested in seeing us extract more data from the projects to surface to people visiting the site. Nudging successful projects up the search results would be a great way to motivate maintainers. A simple example of this is #233.
As discussed in #261 it's rather easy to get rate-limited currently, so perhaps we need to stand up a service that users hit (rather than hitting the API directly), and then start exploring data somehow.
I wish I had the bandwidth to look at this, but I'm open to helping someone else if they want to get down and dirty with fleshing this out.
I did notice the rate limiting. Both issues require some kind of back-end service.
The comment about an Azure web job to rebuild pages is interesting. However, I personally think it would be wasteful to rebuild all the static content just to update the small portion that is actually dynamic.
One solution could be to use AWS Lambda and AWS S3. Specifically, you could potentially use a Scheduled Lambda Function to make your API requests and upload the results to S3. The Jekyll site could then hit S3 instead of GitHub directly. This should be a very cost-effective solution. If there's an Azure equivalent, that could work as well.
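To illustrate the site side of that idea, here's a minimal sketch, assuming the scheduled job uploads a counts file to a public S3 URL with a shape like `{ "<project>": { "hasError": false, "count": 12 } }` (the URL, file shape, and function names here are all hypothetical):

```javascript
// Hypothetical cached-data URL; the real bucket/key would be whatever
// the scheduled job uploads to.
var DATA_URL = 'https://example-bucket.s3.amazonaws.com/issue-counts.json';

// Merge cached counts into the project list rendered by the site.
// Pure function, so it works with or without the network request.
function mergeCounts(projects, counts) {
  return projects.map(function (project) {
    var entry = counts[project.name];
    return {
      name: project.name,
      issueCount: entry && !entry.hasError ? entry.count : null
    };
  });
}

// In the browser this would be wired up with XMLHttpRequest (or fetch):
function loadCounts(callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', DATA_URL);
  xhr.onload = function () { callback(JSON.parse(xhr.responseText)); };
  xhr.send();
}
```

Projects missing from the cached file just render without a count, so a stale or partial upload degrades gracefully.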
Another potentially free solution could be to use AppVeyor scheduled builds to make the API calls and generate a JSON artifact, which is uploaded somewhere (S3, Azure Blob, other?) to be pulled in by the Jekyll site.
Other options would be to set up a service dedicated to doing this. However, I think it could potentially be solved without a server. Do you have any preferences for technologies, or other thoughts about how this could be solved?
Depending on how you want to solve this, I can potentially help out. Though, like you, I find time can be somewhat hard to come by.
It just so happens that I know a static site generator that is designed specifically to handle complex code-based scenarios like this. I also know the maintainer of said generator is looking for some community projects to help out and apply it to.
So I guess the first question is: how wedded are you guys to Jekyll?
The other question I have is what specifically we would do with the additional data. Sorting by popularity is one good idea, but how would we measure "popularity"? Number of stars? PRs in the last week or month? I've also seen other similar sites present the number of issues, forks, PRs, etc. Would there be interest in each listed up-for-grabs project having a detail page with more information, or maybe a "fly-out", kind of like ProductHunt?
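Just to make that question concrete: any ranking would boil down to some weighting of those signals, along these lines (the weights here are entirely made up for illustration, and are exactly what would need discussion):

```javascript
// Entirely illustrative weighting of the signals mentioned above;
// the real metric and weights are the open question.
function popularityScore(repo) {
  return repo.stars
    + 3 * repo.pullRequestsLastMonth
    + 0.5 * repo.forks;
}
```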
@daveaglick
So I guess the first question is: how wedded are you guys to Jekyll?
Speaking just for myself, I really enjoy the benefits that Jekyll gives us right now. Not to rule it out too early, but the current stack works really nicely for what we need. Before I "throw it all away" (my words, tongue in cheek) I'd really like to understand more about this alternative and what it gives us.
The other question I have is what specifically we would do with the additional data?
For me, the big benefit of this data would be to identify and promote projects which stand out and achieve the goals we set out for this project - of course how we measure those things is something we can discuss. Ultimately I'd love to surface those results on the site so people can discover these active, successful projects more easily.
For me, there are two different sorts of data here: the projects (which are relatively stable; I don't recall ever removing a project from this list) and the data relating to each project (which is as dynamic as we want it to be). And with the crazy things you can do in a browser right now, I'm still leaning towards keeping these two separate, rather than regenerating the entire site as the data changes.
I really enjoy the benefits that Jekyll gives us right now
Totally understandable. Maybe I'll put together a little PoC so you can see what I'm thinking. No pressure, and it'll be a good exercise in any case.
Out of more broad curiosity, what benefits are most important to you from Jekyll right now? The rapid rebuild on changes, use of front matter, templating language, etc.? Or is it mainly that it's already there and works well ("don't fix what ain't broke")?
Maybe I'll put together a little PoC so you can see what I'm thinking.
That'd be great.
Out of more broad curiosity, what benefits are most important to you from Jekyll right now?
For me, it's more about the GitHub Pages support (that is, a superset of Jekyll):
Since this went a little quiet, I thought I'd make a little demo. I created a separate repo as it's just a proof of concept.
This Node script iterates over the project YAML files, requests the issues, and then outputs the counts to a JSON file. This could be done as part of a build process. The resulting JSON can then be uploaded to an appropriate place, and the Jekyll site can hit that instead of the API directly.
This is a cut-down and simplified version of the code (see the link above to test it):
```js
// require dependencies
var fs = require('fs');
var path = require('path');
var _ = require('lodash');
var Promise = require('promise');
var YAML = require('yamljs');
var github = require('octonode');

// path variables (directory layout assumed for this example)
var projectsDir = path.join(__dirname, '_data', 'projects');
var outputFilename = path.join(__dirname, 'issue-counts.json');

// parse configs
var projectFiles = fs.readdirSync(projectsDir);
var projectConfigs = _.map(projectFiles, function(fileName) {
  var fileContent = fs.readFileSync(path.join(projectsDir, fileName), 'utf8');
  return YAML.parse(fileContent);
});

var client = github.client();
var linkRegex = /github.com\/([^\/]+\/[^\/]+)\/labels\/([^\/]+)$/;
var issuePromises = [];

// queue an issues request for each project that links to a GitHub label
_.each(projectConfigs, function(config) {
  var repoUrl = config.upforgrabs.link;
  var gh = repoUrl.match(linkRegex);
  if (!gh) {
    return;
  }
  var repoName = gh[1], label = gh[2], ghrepo = client.repo(repoName);
  issuePromises.push(new Promise(function (resolve) {
    ghrepo.issues({ labels: label }, function(err, data, headers) {
      resolve({ repo: { name: repoName }, response: { err: err, data: data } });
    });
  }));
});

// wait until all issue requests have resolved
Promise
  .all(issuePromises)
  .then(function (issues) {
    // reduce results to a map of repo name -> count (or error)
    var issueCounts = _.reduce(issues, function(result, item) {
      var hasError = !!item.response.err;
      result[item.repo.name] = {
        hasError: hasError,
        error: hasError ? item.response.err.message : null,
        count: hasError ? null : item.response.data.length,
      };
      return result;
    }, {});
    // write results to disk
    fs.writeFile(outputFilename, JSON.stringify(issueCounts, null, 2), function (err) {
      if (err) { console.error(err); }
    });
  });
```
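For reference, the output file ends up with a shape like this (repo names illustrative):

```json
{
  "owner/repo-one": {
    "hasError": false,
    "error": null,
    "count": 12
  },
  "owner/repo-two": {
    "hasError": true,
    "error": "Not Found",
    "count": null
  }
}
```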
This is just getting the issue counts. It could potentially resolve #261 as this would only need to be run at limited intervals. It could then be expanded to start retrieving and processing other data to produce statistics instead.
If this is a solution that interests you I can help set it up. The main questions to take this from a concept to solution are:
I see a few options for it:
This could become part of the website build process. The scripts would live in this repo and be executed in the Travis build, and the resulting JSON would then just become part of the GitHub Pages site. However, if it has to run regularly on a schedule to update the data, the whole site would be rebuilt constantly. Furthermore, Travis doesn't support scheduled builds, so you would need something to trigger the builds on a schedule (https://nightli.es/ or similar).
Otherwise, it could become a separate service. The scripts would live in a new repo and be executed by some build process (possibly Travis, or AppVeyor, which supports Node and scheduled builds). The output would then be stored somewhere (S3, Azure Blob, something else) and the website would use that as the endpoint instead.
What do you think @shiftkey?
@johncmckim that's interesting, but I really want to avoid the whole "build step" option. So I'll put my money where my mouth is and publish a little demo repo here that I knocked together this afternoon which shows what I was thinking:
https://github.com/shiftkey/up-for-grabs-api-demo
The live site is available here: https://up-for-grabs-data.herokuapp.com/issues/count?project=albacore (the project name is case-sensitive, and isn't the filename of the YAML file).
I went with the really lazy approach here:
I went with a simple, dumb endpoint to verify the caching is working as expected, but this could easily be used as a proxy for the browser making requests directly to GitHub - and we can shape the API however we want, and leave caching up to however we configure memcached.
Apologies that this is radically different from what you had in mind, but hopefully this approach interests you enough to help collaborate further on it!
@shiftkey I was taking a build step approach as I thought the aim was to avoid a web server. I like this approach too, really simple.
I haven't used Heroku myself, but if it's just writing a Node app, I can do that. Happy to contribute. If you create issues on https://github.com/shiftkey/up-for-grabs-api-demo and mark some as up-for-grabs, I'll take a look at the ones I think I can help with. I could also create some issues for statistics-related endpoints so the API side of this issue can be tracked there.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Firstly, I really like the up-for-grabs concept. It would be interesting to see the effect of the up-for-grabs tags, or to report on engagement in general.
Using the GitHub APIs (specifically list issues, list collaborators, and the statistics API), data could be pulled on issues with the up-for-grabs tag and on who contributed to those issues. This could give some interesting information about the performance of the tag and community engagement in general.
In terms of code contributions, the information captured could include:
Other information about issues created and closed could also be interesting. However, that could be considered later.
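As a sketch of the kind of reduction involved: the statistics API's contributors endpoint (`/repos/:owner/:repo/stats/contributors`) returns one entry per author with a commit total, which could be boiled down like this (the summary shape is my own invention, and the sample data is made up):

```javascript
// Summarize the statistics API's contributors payload into a small
// engagement summary: total commits plus the list of contributors.
function summarizeContributors(stats) {
  return stats.reduce(function (summary, contributor) {
    summary.totalCommits += contributor.total;
    summary.contributors.push(contributor.author.login);
    return summary;
  }, { totalCommits: 0, contributors: [] });
}
```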
I think this could provide valuable information on a repo's community engagement, which seems to be what you are trying to encourage with this tag. The information could be fed back to repo owners and possibly made publicly available.
I would love to hear what you think of this. I understand it may not fit into this repo, since a Jekyll site probably can't do this on its own. However, I wanted to float the idea anyway to see what you think.