nodenv / formulae

🏎 An online formulae browser for Homebrew
https://nodenv.github.io/formulae/
BSD 2-Clause "Simplified" License

Leverage Rake's strengths for generating analytics data #16

Closed jasonkarns closed 4 years ago

jasonkarns commented 4 years ago

Rake inherits its power from make: it is aware of the dependencies between files (targets and sources), so only the minimal set of files and tasks needs to run.
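As a rough illustration of that dependency awareness (not code from this PR), here is a minimal sketch of a file task that is skipped once its target is up to date. The module name `FileTaskSketch` and the file names are hypothetical.

```ruby
require "rake"
require "tmpdir"

# Hypothetical sketch: a file task only runs when its target is missing
# or older than one of its prerequisites.
module FileTaskSketch
  extend Rake::DSL

  # Builds the same target twice and returns how often the action ran.
  def self.build_twice
    runs = 0
    Dir.mktmpdir do |dir|
      source = File.join(dir, "source.txt")
      target = File.join(dir, "target.txt")
      File.write(source, "data")

      file target => source do |t|
        runs += 1
        File.write(t.name, File.read(t.prerequisites.first).upcase)
      end

      Rake::Task[target].invoke   # target is missing, so the action runs
      Rake::Task[target].reenable # allow the task to be invoked again
      Rake::Task[target].invoke   # target is up to date, so the action is skipped
    end
    runs
  end
end

puts FileTaskSketch.build_twice
```

The second `invoke` does nothing because the target already exists and is not older than its source, which is the behavior the analytics tasks rely on.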

This extracts the analytics fetching logic into rakelib/analytics.rake where the files themselves are declared using Rake's DSL.

An added benefit is that the analytics fetching can be parallelized by running rake in multitask mode (-m). Locally, this accounts for an 80% reduction in build time.
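For readers unfamiliar with multitask mode: `-m` makes rake treat every task as a multitask, so prerequisites run in parallel threads. The sketch below shows the same effect using an explicit `multitask` declaration. The task names under `sketch:` are hypothetical and the `sleep` calls merely stand in for slow fetches.

```ruby
require "rake"

# Hypothetical sketch: prerequisites of a multitask are invoked concurrently.
module MultitaskSketch
  extend Rake::DSL

  def self.run
    finished = Queue.new # thread-safe collector for completed task names

    task("sketch:mac")   { sleep 0.1; finished << "mac" }
    task("sketch:linux") { sleep 0.1; finished << "linux" }

    # With `rake -m`, a plain `task` declaration here would behave the same way.
    multitask "sketch:analytics" => ["sketch:mac", "sketch:linux"]

    Rake::Task["sketch:analytics"].invoke
    Array.new(finished.size) { finished.pop }.sort
  end
end

p MultitaskSketch.run
```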

The data:analytics:mac and data:analytics:linux tasks depend only on the file paths for their respective directories (_data/analytics/*.json and _data/analytics-linux/*.json). The tasks for those file patterns are defined using a rule:

rule FILE_PATTERN => ["%{^_data,#{API}}p", "%d"]

This rule declares that the source files (prereqs) for the json data files are:

  1. the file path with its leading _data replaced by the http-api prefix
  2. the directory portion of the file path.
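To make the two pathmap specs concrete, here is a small sketch that applies them to one path by hand. The `API` value and the example file path are placeholders, not the values used in this PR.

```ruby
require "rake" # provides String#pathmap

# Hypothetical sketch of how the rule's pathmap specs expand for one target.
module PathmapSketch
  API = "https://example.test/api" # placeholder prefix, not the real one

  def self.prerequisites_for(path)
    ["%{^_data,#{API}}p", "%d"].map { |spec| path.pathmap(spec) }
  end
end

p PathmapSketch.prerequisites_for("_data/analytics/install/30d.json")
# → ["https://example.test/api/analytics/install/30d.json", "_data/analytics/install"]
```

`%{pattern,replacement}p` takes the full path and applies a regex substitution to it, while `%d` yields the directory portion.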

The parent directories are defined using a loop: DIRS.each { |d| directory d }
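A sketch of what that loop does, assuming a hypothetical list of directories (the real contents of DIRS are not shown here). Each `directory` call declares a task that creates the directory only when it does not exist yet.

```ruby
require "rake"
require "tmpdir"

# Hypothetical sketch: declare one directory task per data directory.
module DirectorySketch
  extend Rake::DSL

  def self.create_under(root)
    # Placeholder directory names, assumed for illustration only.
    dirs = %w[analytics/install/30d analytics-linux/install/30d].map do |d|
      File.join(root, "_data", d)
    end

    dirs.each { |d| directory d }           # declare the tasks
    dirs.each { |d| Rake::Task[d].invoke }  # create whatever is missing
    dirs
  end
end

Dir.mktmpdir do |root|
  created = DirectorySketch.create_under(root)
  puts created.all? { |d| File.directory?(d) }
end
```

Because the json file rule lists `%d` as a prerequisite, these directory tasks run before any data file is written into them.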

The http-api "sources" are defined using their own rule: rule Regexp.new(API). We ensure that the http-api tasks are instances of HttpResourceTask, a subclass of Rake::FileTask that reports the HTTP Last-Modified header as its timestamp and provides a convenience method for parsing the json body.
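The class might look roughly like the sketch below. This is an assumption about its shape, not the code in this PR: the `last_modified_header` helper, the fallback to `Time.now`, and the use of Net::HTTP are all illustrative choices.

```ruby
require "rake"
require "net/http"
require "json"
require "time"

# Hypothetical sketch of a file task whose "file" is a remote HTTP resource.
class HttpResourceTask < Rake::FileTask
  # Report the Last-Modified header instead of a filesystem mtime.
  def timestamp
    header = last_modified_header
    header ? Time.httpdate(header) : Time.now # assumed fallback when the header is absent
  end

  # Convenience accessor: fetch the resource and parse its body as json.
  def json
    JSON.parse(Net::HTTP.get(URI(name)))
  end

  # Issue a HEAD request and return the raw Last-Modified header, if any.
  def last_modified_header
    uri = URI(name)
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.head(uri.request_uri)
    end
    response["last-modified"]
  end
end
```

Since rake compares a target's timestamp with those of its prerequisites, a local json file is only refetched when the remote resource reports a newer Last-Modified value.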

Lastly, the json file tasks themselves are created as instances of Analytics::JsonFileTask, which is itself a subclass of Rake::FileTask. It is customized to take its timestamp from the json data's end_date rather than the filesystem mtime.