hlxsites / prisma-cloud-docs-website

blocks and gdoc authored content for https://docs.prismacloud.io
Apache License 2.0

bin/validate-doc-paths.js doesn't find all invalid paths #148

Closed: iansk closed this issue 11 months ago

iansk commented 1 year ago

The bin/validate-doc-paths.js script doesn't find all invalid paths. For example, if you compare the invalid files listed in this PR/job with the files in docs/en/prisma-cloud-ag/admin-guide/fragments/, you'll see that the script fails to identify some invalid files. The INVALID_PATH regex appears to be the cause.

GSheet that compares files in the repo vs what the script finds: GSheet

iansk commented 1 year ago

Another repro. In the ian/fix-paths branch of the prisma-cloud-docs repo, the bin/validate-doc-paths.js script lists files that should be valid.

$ node bin/validate-doc-paths.js docs/en/prisma-cloud-ag/admin-guide/**/*.adoc
Invalid file paths:
 - docs/en/prisma-cloud-ag/admin-guide/fragments/features-at-a-glance--id89f15e0e-2831-4680-b5f5-5cfeb8627296.adoc
 - docs/en/prisma-cloud-ag/admin-guide/fragments/id3d308e0b-921e-4cac-b8fd-f5a48521aa03--idd73e7807-44d4-4bc0-b57f-97876da93ad8.adoc
 - docs/en/prisma-cloud-ag/admin-guide/fragments/idee00fe2e-51d4-4d26-b010-69f3c261ad6f--id50a63347-4291-4210-99fa-f51de04106be.adoc

*** Note: paths can only contain lowercase letters, numbers, and -

maxakuru commented 1 year ago

@iansk

  1. Could you add some examples of the paths that aren't identified?
  2. I think those are actually invalid paths. The -- should fail, since Franklin will collapse it to -.

The purpose of the path validation job is to keep files out of the repo whose paths would change when published.
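
As a rough illustration of the rule described here, the following is a minimal sketch of a path check. It is not the code in bin/validate-doc-paths.js. The names `VALID_SEGMENT` and `isValidDocPath` are hypothetical, and rejecting leading or trailing dashes is an assumption that goes beyond what the thread states (lowercase letters, numbers, and single dashes only).

```javascript
// Hypothetical sketch; not the actual validator from the repo.
// Assumed rule: each path segment may contain only lowercase letters, digits,
// and single dashes (no leading, trailing, or repeated dashes).
const VALID_SEGMENT = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidDocPath(filePath) {
  // Drop the file extension from the last segment before checking.
  const withoutExt = filePath.replace(/\.[^./]+$/, '');
  // Every directory and file name must satisfy the segment rule.
  return withoutExt.split('/').every((segment) => VALID_SEGMENT.test(segment));
}

console.log(isValidDocPath('docs/test-test.adoc'));
console.log(isValidDocPath('docs/test_test.adoc'));
console.log(isValidDocPath('docs/fragments/features-at-a-glance--id89f15e0e.adoc'));
```

Because the pattern has no `g` flag and is anchored at both ends, each call is independent of the previous one.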

iansk commented 1 year ago

@maxakuru Makes sense.

  1. Yes, will do.
  2. Totally agree. But then the script/regex should catch and report all such files. In the fragments dir, there are five files with the double dash, but the script only reports three.

Files in the repo in the fragments dir with double dashes:

features-at-a-glance--id89f15e0e-2831-4680-b5f5-5cfeb8627296.adoc
id3d308e0b-921e-4cac-b8fd-f5a48521aa03--idd73e7807-44d4-4bc0-b57f-97876da93ad8.adoc
idece1e97f-31e4-4862-bc93-da79383b0392--id5b4dc25b-4887-4032-a5a4-183158c74351.adoc
idee00fe2e-51d4-4d26-b010-69f3c261ad6f--id50a63347-4291-4210-99fa-f51de04106be.adoc
idee00fe2e-51d4-4d26-b010-69f3c261ad6f--id82a563a3-ea83-444d-a6ab-f1f8b5e116d8.adoc

Files reported as invalid by the script:

features-at-a-glance--id89f15e0e-2831-4680-b5f5-5cfeb8627296.adoc
id3d308e0b-921e-4cac-b8fd-f5a48521aa03--idd73e7807-44d4-4bc0-b57f-97876da93ad8.adoc
idee00fe2e-51d4-4d26-b010-69f3c261ad6f--id50a63347-4291-4210-99fa-f51de04106be.adoc
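
To make the gap explicit, this small sketch computes which of the five double-dash files are missing from the script's report. The file names are copied from the two lists above; the variable names are illustrative only.

```javascript
// File names copied from the two lists above.
const inRepo = [
  'features-at-a-glance--id89f15e0e-2831-4680-b5f5-5cfeb8627296.adoc',
  'id3d308e0b-921e-4cac-b8fd-f5a48521aa03--idd73e7807-44d4-4bc0-b57f-97876da93ad8.adoc',
  'idece1e97f-31e4-4862-bc93-da79383b0392--id5b4dc25b-4887-4032-a5a4-183158c74351.adoc',
  'idee00fe2e-51d4-4d26-b010-69f3c261ad6f--id50a63347-4291-4210-99fa-f51de04106be.adoc',
  'idee00fe2e-51d4-4d26-b010-69f3c261ad6f--id82a563a3-ea83-444d-a6ab-f1f8b5e116d8.adoc',
];
const reported = new Set([
  'features-at-a-glance--id89f15e0e-2831-4680-b5f5-5cfeb8627296.adoc',
  'id3d308e0b-921e-4cac-b8fd-f5a48521aa03--idd73e7807-44d4-4bc0-b57f-97876da93ad8.adoc',
  'idee00fe2e-51d4-4d26-b010-69f3c261ad6f--id50a63347-4291-4210-99fa-f51de04106be.adoc',
]);

// Files that contain "--" but were not flagged by the script.
const missed = inRepo.filter((file) => !reported.has(file));
console.log(missed);
```

The two missed files are the ones starting with idece1e97f and the second idee00fe2e entry (ending in id82a563a3-...).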

iansk commented 1 year ago

Here's another example of the script returning weird results.

  1. Create a dir structure like this:
├── bin
│   └── validate-doc-paths.mjs
└── docs
    ├── fragments
    │   └── id2eac1406-00df-4530-bcc7-dfa1795d6e4a__iddf0edb02-009c-4780-8bdb-f22c30459d96.adoc
    ├── id2eac1406-00df-4530-bcc7-dfa1795d6e4a__iddf0edb02-009c-4780-8bdb-f22c30459d96.adoc
    ├── test-test.adoc
    ├── test.adoc
    └── test_test.adoc
  2. Run the script: node bin/validate-doc-paths.mjs docs/**/*.adoc

  3. Review the results:

Invalid file paths:
 - docs/fragments/id2eac1406-00df-4530-bcc7-dfa1795d6e4a__iddf0edb02-009c-4780-8bdb-f22c30459d96.adoc
 - docs/test_test.adoc

Notice that the script returns docs/fragments/id2eac1406-00df-4530-bcc7-dfa1795d6e4a__iddf0edb02-009c-4780-8bdb-f22c30459d96.adoc, but not docs/id2eac1406-00df-4530-bcc7-dfa1795d6e4a__iddf0edb02-009c-4780-8bdb-f22c30459d96.adoc. Both files have the same name but are located in different directories. I'd expect the script to return both files.
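
One known JavaScript behavior that is consistent with these results: a regex with the `g` flag keeps a `lastIndex` between calls to `test()`, so reusing it across many strings can skip matches. The sketch below is only a hypothesis. The real INVALID_PATH pattern is not shown in this thread, so `INVALID_GLOBAL` and `INVALID_STATELESS` are hypothetical stand-ins, and whether the script actually uses the `g` flag is an assumption.

```javascript
// Hypothetical stand-ins for INVALID_PATH; the real pattern is not shown in this thread.
// Both flag any character outside [a-z0-9-/.] and any run of two dashes.
const INVALID_GLOBAL = /[^a-z0-9\-\/.]|--/g;
const INVALID_STATELESS = /[^a-z0-9\-\/.]|--/;

const name =
  'id2eac1406-00df-4530-bcc7-dfa1795d6e4a__iddf0edb02-009c-4780-8bdb-f22c30459d96.adoc';
const paths = [
  `docs/fragments/${name}`,
  `docs/${name}`,
  'docs/test-test.adoc',
  'docs/test.adoc',
  'docs/test_test.adoc',
];

// With the g flag, test() resumes from the lastIndex left by the previous
// successful match, so the next path can be searched from the wrong offset.
const withGlobal = paths.filter((p) => INVALID_GLOBAL.test(p));

// Without the g flag, every call starts at index 0.
const withStateless = paths.filter((p) => INVALID_STATELESS.test(p));

console.log(withGlobal);    // misses `docs/${name}`
console.log(withStateless); // includes both underscore paths and test_test
```

If this turns out to be the cause, dropping the `g` flag (or resetting `lastIndex` to 0 before each `test()` call) would make each path be checked independently.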

iansk commented 11 months ago

@maxakuru Max, last time I looked at this, I still saw the weird behavior.