iop-alliance / okh-search

A self-hostable, federated search for open source hardware
https://search.openknowhow.org
MIT License
10 stars 9 forks source link

Links to invalid YAML file(s) preventing preview/build #43

Closed psprings closed 3 years ago

psprings commented 3 years ago

Problem

https://github.com/OpenKnowHow/okh-search/pull/42 is failing because it there is an invalid YAML file present in the projects_okhs.csv file which cannot be parsed. This would presumably impact any additional PRs as they would fail the status check.

Details

Failing build

05:13:41.176 | at resolveBlockMapItems (/vercel/178ba70d/node_modules/yaml/dist/schema/parseMap.js:234:31)
-- | --
05:13:41.176 | at Object.parseMap [as resolve] (/vercel/178ba70d/node_modules/yaml/dist/schema/parseMap.js:42:79)
05:13:41.176 | at Schema.resolveNode (/vercel/178ba70d/node_modules/yaml/dist/schema/index.js:179:27)
05:13:41.176 | at Schema.resolveNodeWithFallback (/vercel/178ba70d/node_modules/yaml/dist/schema/index.js:201:22)
05:13:41.176 | at Document.resolveNode (/vercel/178ba70d/node_modules/yaml/dist/Document.js:503:22)
05:13:41.176 | at /vercel/178ba70d/node_modules/yaml/dist/Document.js:231:26
05:13:41.176 | at Array.forEach (<anonymous>)
05:13:41.176 | at Document.parseContents (/vercel/178ba70d/node_modules/yaml/dist/Document.js:224:14)
05:13:41.176 | at Document.parse (/vercel/178ba70d/node_modules/yaml/dist/Document.js:157:10)
05:13:41.176 | at parseDocument (/vercel/178ba70d/node_modules/yaml/dist/index.js:68:37) {
05:13:41.176 | source: PlainValue {
05:13:41.176 | error: null,
05:13:41.176 | range: Range { start: 302, end: 409 },
05:13:41.176 | valueRange: Range { start: 302, end: 409 },
05:13:41.176 | props: [],
05:13:41.176 | type: 'PLAIN',
05:13:41.176 | value: null,
05:13:41.176 | resolved: Scalar {
05:13:41.176 | value: 'This Hand Pump Drill is a mechanical drill that utilizes human effort to perform the functions of a drill.',
05:13:41.176 | range: [Array],
05:13:41.176 | type: 'PLAIN'
05:13:41.176 | }
05:13:41.176 | }
05:13:41.176 | }
05:13:41.176 | --------------------------------------------
05:13:41.176 | Error reading: https://openknowhow.appropedia.org/manifests/okh-Hand%20Pump%20Drill%20(Spring-Loaded).yml
05:13:41.177 | --------------------------------------------
05:13:41.260 | (node:1424) YAMLWarning: YAML only supports %TAG and %YAML directives, and not %Open
05:13:41.262 | (node:1424) YAMLWarning: YAML only supports %TAG and %YAML directives, and not %Open
05:13:41.274 | (node:1424) YAMLWarning: YAML only supports %TAG and %YAML directives, and not %Open
05:13:41.306 | YAMLSemanticError: Implicit map keys need to be followed by map values
05:13:41.306 | at resolveBlockMapItems (/vercel/178ba70d/node_modules/yaml/dist/schema/parseMap.js:234:31)
05:13:41.306 | at Object.parseMap [as resolve] (/vercel/178ba70d/node_modules/yaml/dist/schema/parseMap.js:42:79)
05:13:41.306 | at Schema.resolveNode (/vercel/178ba70d/node_modules/yaml/dist/schema/index.js:179:27)
05:13:41.306 | at Schema.resolveNodeWithFallback (/vercel/178ba70d/node_modules/yaml/dist/schema/index.js:201:22)
05:13:41.307 | at Document.resolveNode (/vercel/178ba70d/node_modules/yaml/dist/Document.js:503:22)
05:13:41.307 | at /vercel/178ba70d/node_modules/yaml/dist/Document.js:231:26
05:13:41.307 | at Array.forEach (<anonymous>)
05:13:41.307 | at Document.parseContents (/vercel/178ba70d/node_modules/yaml/dist/Document.js:224:14)
05:13:41.307 | at Document.parse (/vercel/178ba70d/node_modules/yaml/dist/Document.js:157:10)
05:13:41.307 | at parseDocument (/vercel/178ba70d/node_modules/yaml/dist/index.js:68:37) {
05:13:41.307 | source: PlainValue {
05:13:41.307 | error: null,

Invalid file

https://github.com/OpenKnowHow/okh-search/blob/1e6d73f710f483b28bb7d3bdfd5f093df6ebf5fc/projects_okhs.csv#L198

Should be indented on Line 16

Considerations

In lieu of content-based addressing, a YAML file with a given URL could be mutated without being tested. Those mutations could introduce an invalid YAML file or a file that does not otherwise conform to the OKH standard.

Perhaps in the interim it could make sense to skip invalid YAML files and send a warning to the submitter of that file in the event that the file has changed since it was introduced to the search CSV.

kasbah commented 3 years ago

@dubsnipe I believe you generated that yaml, can you fix it? Agree that we should be more resilient to these.

kasbah commented 3 years ago

I believe we are now more resilient to invalid manifest files as I've added a number of safe-guards and fallbacks. Feel free to re-open another issue if you still see some blocking failures or any other problems.

dubsnipe commented 3 years ago

Hi, sorry for not looking at this earlier. I understand it is resolved, but do you know what was wrong with this manifest? In some cases, I believe that it has to do more with the filename (this one contained a parenthesis) than the actual content. Right now we're re-doing all of our manifests and automating the process so we'll be dealing with different issues probably.

kasbah commented 3 years ago

My honest recommendation: we move the standard to JSON as quickly as possible. YAML is too error prone for this use-case. We can try out JSON in the meantime. I made JSON manifests for Field Ready as quick test already: https://field-ready-projects.openknowhow.org/manifests/list.json. If you want to switch Appropedia to JSON as well I'm willing to read them as JSON for this search.

psprings commented 3 years ago

Ultimately, I think this is less of an issue of JSON vs YAML and more of an issue of validation. If you have a list of externally stored manifests expressed as URLs then there is a risk of mutability. At any given time there could be a change in any one of those files which breaks the build for everyone. I think you could iterate over that list and choose to exclude any "invalid" manifests from the build process.

We are keeping a few OKH files in a repository and created a validator to make sure that invalid content did not get introduced to our repository. There is a corollary GitHub Action that we use to make sure that this gets run and can be enforced differently on a PR vs merge request.

# more GitHub Action metadata here
    steps:
    # insert additional steps here
      - name: open knowledge framework validation (pull request)
        uses: helpfulengineering/ok-validate@v0.1.0
        if: ${{ github.event_name == 'pull_request' && steps.has-yaml-files.outputs.num-yaml-files != '0' }}
        with:
          file-restrictions: ${{ steps.changed-files.outputs.all }}
          path: ${{ matrix.release }}/
      - name: open knowledge framework validation
        uses: helpfulengineering/ok-validate@v0.1.0
        if: ${{ github.event_name != 'pull_request' && steps.has-yaml-files.outputs.num-yaml-files != '0' }}
        with:
          path: ${{ matrix.release }}/
kasbah commented 3 years ago

@psprings That's a great solution for people on Github and to be clear: I did make the build process for this search website more resilient to invalid manifests so I think this issue should remain closed (hence I'm closing this again @dubsnipe).

Nevertheless I think we will save ourselves a lot of trouble if we move the standard over to a less error-prone format. Even then, errors will still occur of course, and how we report back manifest errors as we flesh out a proper indexing and search is an important question that's worth exploring.