cpanm --installdeps .
To collect some data from MetaCPAN and GitHub, run:
perl -I lib bin/cpan-digger --recent 5
Run this to collect the data and generate the pages just as the GitHub Action does:
./generate.sh
or look in the file and run some of the commands individually.
To view the generated site locally, install
cpanm App::HTTPThis
and then run
http_this _site/
See the command here: https://github.com/szabgab/perlweekly/blob/master/bin/metacpan.pl
One-time:
Cron job:
Download the most recently uploaded "latest" releases from MetaCPAN and save them in JSON files.
Download coverage data into separate JSON files.
Go over the JSON files of the releases and try to clone each git repository we don't have yet; run git pull if we already have it.
Go over the JSON files of the releases and fetch data from the GitHub API. (e.g. information about issues, Pull-Requests etc.)
Go over all the cloned repositories and analyze them.
Generate the web site from all the JSON files.
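The clone-or-pull decision in the cron-job steps above could be sketched as a small shell loop. The directory layout and file names below are throw-away sample data, not the project's actual paths, and no network access is involved (the git commands are only echoed):

```shell
#!/bin/sh
# Illustrative sketch of the "go over the release JSON files" step:
# for each saved release, clone the repository if we do not have it
# yet, otherwise pull. Runs on sample data only.
mkdir -p demo/metacpan/releases demo/repos
printf '{"distribution":"foo-bar"}\n' > demo/metacpan/releases/foo-bar.json
for f in demo/metacpan/releases/*.json; do
    dist=$(basename "$f" .json)
    if [ -d "demo/repos/$dist" ]; then
        echo "git -C demo/repos/$dist pull"             # already cloned: update it
    else
        echo "git clone <repo-url> demo/repos/$dist"    # not yet cloned
    fi
done
rm -r demo
```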
Some of the data can change on MetaCPAN even without a new release, for example the data that comes from cpantesters and cpancover.
We need to be able to update the data in the json files.
Distribution JSON files should be lower-case as well, and they should be stored in metacpan/distribution/HH/distribution-name.json.
Author JSON files should go into metacpan/authors/AA/author.json.
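Deriving the on-disk path could look like the sketch below, assuming HH is simply the first two characters of the lower-cased distribution name (the distribution name is illustrative):

```shell
#!/bin/sh
# Sketch: compute the JSON path for a distribution under the layout
# described above (lower-case name, two-character prefix directory).
dist="Data-Dumper"
lc=$(printf '%s' "$dist" | tr '[:upper:]' '[:lower:]')
prefix=$(printf '%s' "$lc" | cut -c1-2)
echo "metacpan/distribution/$prefix/$lc.json"
# → metacpan/distribution/da/data-dumper.json
```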
When cloning a repository, lower-case the URL first so we only ever have lower-case addresses and folders and are not impacted by a change in case.
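The normalization could be as simple as the sketch below; the URL is illustrative, and the actual clone/pull is left as a comment so the example runs without network access:

```shell
#!/bin/sh
# Sketch: normalize a repository URL to lower case before cloning, so a
# later change of case upstream still maps to the same local folder.
url="https://github.com/SzabGab/CPAN-Digger.git"
lc_url=$(printf '%s' "$url" | tr '[:upper:]' '[:lower:]')
repo_dir="repos/$(basename "$lc_url" .git)"
echo "$lc_url"    # https://github.com/szabgab/cpan-digger.git
echo "$repo_dir"  # repos/cpan-digger
# then: [ -d "$repo_dir" ] && git -C "$repo_dir" pull || git clone "$lc_url" "$repo_dir"
```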
Collect data from the most recent commit on GitHub (e.g. does the project have Makefile.PL, dist.ini, or both) and run Perl::Critic on the code. This information can be updated whenever there is a commit on the default branch of the project, even without a new release to CPAN. (--vcs flag)
Collect non-git data from GitHub and analyze it. Some of it, such as the open issue and pull request counts, is already supplied by MetaCPAN, but if we would like to analyze the closed issues and PRs as well we will need the GitHub API. (Later, if at all)
Collect data from the commit history of the project. (Later, if at all)
After every run on GitHub Actions, zip up the data files, upload them to S3, and restore them at the start of the next run so we can accumulate all the data.
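One possible shape for that cache step, as a hedged GitHub Actions workflow fragment; the bucket name and paths are placeholders, and the project's real workflow may do this differently:

```yaml
# Hypothetical workflow steps; s3://EXAMPLE-BUCKET and the zipped
# directories are placeholders.
- name: Restore data from the previous run
  run: |
    aws s3 cp s3://EXAMPLE-BUCKET/cpan-digger/data.zip data.zip && unzip -q data.zip || true
- name: Save collected data for the next run
  if: always()
  run: |
    zip -r -q data.zip metacpan/
    aws s3 cp data.zip s3://EXAMPLE-BUCKET/cpan-digger/data.zip
```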
Generate static HTML pages and a simple weekly report to include in the Perl Weekly.
Fetch the list of recently uploaded releases.
Check if there is a link to a VCS.
Check if there is a link to a bug tracking system. (Q: what if the VCS is GitHub but the bug tracking is RT?)
If the source code is on GitHub, check whether any CI system is configured.
If only .travis.yml exists then report that too, as Travis stopped offering its service.
Check for license meta field
Does the documentation contain a link to http://search.cpan.org/? That is probably some old boilerplate code and should be either removed or changed.
Is there a link to https://cpanratings.perl.org/? I think that site is no longer maintained, so that link should probably be removed as well.
http://www.annocpan.org/ now points to something else, so that link should definitely be removed.
Is there a link in the docs to http://rt.cpan.org/ while the module actually uses some other bug tracker?
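A few of the checks above could be sketched as simple lookups in the release metadata; the sample META.json below is made up, and the field names follow the CPAN::Meta spec (resources.repository, resources.bugtracker, license):

```shell
#!/bin/sh
# Sketch: crude metadata checks against a sample META.json. Real input
# would come from MetaCPAN; grep stands in for proper JSON parsing.
cat > sample-META.json <<'EOF'
{
   "license" : [ "perl_5" ],
   "resources" : {
      "repository" : { "url" : "https://github.com/example/some-dist" }
   }
}
EOF
grep -q '"repository"' sample-META.json && echo "has VCS link"
grep -q '"bugtracker"' sample-META.json || echo "no bug tracker link"
grep -q '"license"'    sample-META.json && echo "has license field"
rm sample-META.json
```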
The following command builds the Docker image and runs the data collection process inside a Docker container:
./docker.sh