pp-mo / ugrid-checks

UGRID file format compliance checking
BSD 3-Clause "New" or "Revised" License
4 stars 1 forks source link

Common source for conventions 'sentences' and checking code? #38

Closed trexfeathers closed 2 years ago

trexfeathers commented 2 years ago

I know this would need some heavier architecture, but the value makes it worth the effort IMO.

I suggest some sort of common class structure where each bullet in the conformance docs page is stored in the same place as the code that checks for it - a single source of truth that's must less likely to diverge.

I imagine it's definitely possible for docs rendering to incorporate some code that extracts what it needs from said classes.

pp-mo commented 2 years ago

Well, my instant response is that it's too hard !! But apart from that, I do have some objections in principle...

  1. I think that the conformance statements logically precede the checker code, which must conform to them. We really don't want to make that a "circular dependency" : the conformance page is (will be) part of the spec, but the checker stands outside it.
  2. the way the conformance pages are written involves a good deal of structure above + beyond the individual statements. Notably, the sections, the order of presentation and footnotes. And the way they are written is designed to read in flow, e.g. "Mesh coordinate variables / For any mesh coordinate variable .. / Mesh coordinate variable requirements / R201 it must map a single dimension / R202 which is the appropriate element dimension of the parent mesh".
    I'd really hate to lose this approach, as I thought it added a lot of readability
  3. in fact, not all conformance statements are actually programatically checkable, though I tried hard : The exception being A901, which basically says that UGRUD should be CF-compliant (though that is the only instance, for now at least).

So, addressing (1.) above, we could perhaps derive the code messages from the conformance page source, instead of the other way around?
-- but there again, the contextual way they are written to reduce repetition (as in point (2.)) makes that not really viable IMHO.

In practical fact there are (sort-of) three things relating :

But I didn't manage to 'automate' any of those links -- they are all coded manually.
And the "flow-structure" of both the conformance statements and that of the checker code itself seem to work against it. There isn't always a 1-2-1 relationship, since some codes are checked+raised in more then one place (e.g. here + here), while other code sections address multiple codes (e.g. here).

However there is clearly still a near 1-2-1 relationship.
I hope the one-way relationship is clear from the existing code, to the conformance statements.

But I totally see what you mean about maintaining consistency in a more managed way. I'm just not seeing a way to do it, just now ...

trexfeathers commented 2 years ago

Thanks for the explanation!

Whenever it turns out to be unrealistic to remove duplication, I find the next best thing is to find ways to make sure people remember to make a change in all the places. So once the conformance statements and the checker have been established, I might propose a PR template that reminds people proposing conformance changes to also confirm whether the checker needs updating.

pp-mo commented 2 years ago

I totally see what you mean about maintaining consistency in a more managed way. I'm just not seeing a way to do it, just now ...

Perhaps we could include references in the checker which can be actively checked (in CI) against the content of the conformance rules ? I did already wonder about having a list of rules statements within the checker, which you could output as a listing given a control keyword (e.g. "ugrid-checker --rules").
So I wonder if we could combine those 2 in some way -- but we do want a form that can possibly be tied into the code itself. Maybe something like ...

all_codes = {
    'A101': ('mesh',  'should have no dimensions'),
    'A102': ('mesh', 'should not have a `standard_name` attribute'),
    'R202': ('mesh-coordinate',  'is the appropriate element dimension of the parent mesh'),
 }
 . . .
         if "standard_name" in meshvar.attributes:
            log_meshvar("A102", "has a 'standard_name' attribute.", 'should not have a `standard_name` attribute')

It would mean some repetition, but the extra "statement strings" can tie it together programmatically.