Implement a "file system" harvester

While evaluating the Helmholtz KPIs, I stumbled across several "metrics" that rely on certain "files" (or artifacts) being present, e.g., "Some kind of description is available giving further information on the software in this repository (e.g. readme file)."

This could be evaluated using graph constraints if there were an entry for it in the graph. Hence, my approach would be to create a "file system" harvester that collects (relevant) files from the repository.

Not being a metadata expert, I would use the "hasPart" attribute to store such information, i.e.,

  ...
  "hasPart": [
    {
      "@type": "CreativeWork",
      "name": "README",
      "encoding": {
        "@type": "TextObject",
        "encodingFormat": "text/markdown",
        "url": "file://./README.md"
      }
    },
    ...
  ]
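
To make the idea more concrete, here is a minimal sketch of what such a harvester could produce. It is not based on the actual HERMES plugin API: the function name `harvest_files`, the set of "relevant" file names, and the suffix-to-media-type mapping are all assumptions made for illustration. It only looks at the top level of the repository and emits entries shaped like the example above.

```python
from pathlib import Path

# Assumption: which file stems count as "relevant" is not defined in this issue.
RELEVANT_STEMS = {"README", "LICENSE", "CITATION", "CONTRIBUTING"}

# Assumption: a small explicit mapping instead of platform-dependent MIME lookup.
MEDIA_TYPES = {
    ".md": "text/markdown",
    ".rst": "text/x-rst",
    ".txt": "text/plain",
    "": "text/plain",
}


def harvest_files(repo_root):
    """Collect relevant top-level files and describe them as "hasPart" entries.

    Hypothetical helper, not part of HERMES.
    """
    parts = []
    for path in sorted(Path(repo_root).iterdir()):
        # Skip directories and files whose stem is not in the relevant set.
        if not path.is_file() or path.stem.upper() not in RELEVANT_STEMS:
            continue
        parts.append(
            {
                "@type": "CreativeWork",
                "name": path.stem,
                "encoding": {
                    "@type": "TextObject",
                    # Unknown suffixes fall back to a generic media type.
                    "encodingFormat": MEDIA_TYPES.get(
                        path.suffix.lower(), "application/octet-stream"
                    ),
                    # Relative file URL, following the form used in the example above.
                    "url": f"file://./{path.name}",
                },
            }
        )
    return {"hasPart": parts}
```

Calling `harvest_files(".")` in a checkout would return a dictionary with a single `hasPart` list that could be merged into the harvested metadata. A graph constraint could then check, for instance, that an entry named "README" exists.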
