ioccc-src / temp-test-ioccc

Temporary test IOCCC web site that will go away
Creative Commons Attribution Share Alike 4.0 International

Enhancement: hack chkentry(1) to process (validate) .entry.json and author_handle.json files #2614

Closed xexyl closed 2 weeks ago

xexyl commented 1 month ago

Is there an existing issue for this?

Describe the feature

Currently chkentry(1) validates the .info.json and .auth.json files (in two ways: by specifying a directory name or the individual files). It has been determined through discussion that the directory name (single arg) form should instead validate the .entry.json and author/author_handle.json files. To validate the .info.json and .auth.json files one must use the two arg form. I have done this on a branch but have not committed yet.

This issue is to discuss the finer details of this important change.

It might be helpful if you either add an addendum to this comment (below this, keeping this text for context) with the details you need, OR make a comment with the details and then add a link to that comment here.

Relevant images, screenshots or other files

No response

Relevant links

chkentry(1) can be found in the mkiocccentry repo.

Anything else?

Did you know that a team of astronomers spent four YEARS trying to determine the indeterminable meaning behind 'life, the universe and everything' only to come up with the same answer as Deep Thought did after many years of working on it: the nonsensical and meaningLESS value of 42? Indeed it is true. Of course many people seem to also actually believe this was serious (search google for 'proof that 42 is the answer to life, the universe and everything' - some are quite amusing!) despite the fact that Douglas Adams said flat out on USENET that it was a JOKE.

BTW: did you know that the Earth is Flat? :-) (to those who miss irony and jokes: obviously the Earth is NOT flat and anyone who believes it is, is very wrong; indeed if the Earth really was Flat then cats would have pushed everything off it by now - to say nothing of all the other evidence, including the fact that the Soviets would have done ABSOLUTELY ANYTHING to discredit the Americans .. and they did not).

Okay that's enough for now! :-)

UPDATES

See comment 2269551230 for requirements of the changes.

xexyl commented 1 month ago

Please assign this to me. As I am working on getting to the fork merge issue (in large part by completing #2006) I can think on this and also work on it here and there (as I did yesterday).

--

I have spent a couple of minutes figuring out (I think) how to work on a development branch and later merge it into master, though since I always push non-master branches I'm not sure how that works yet with GitHub. Still, it might be enough as long as I commit before pulling new changes (I think that is required - but I might have to merge at that point? Not sure how that works, since if anything is modified then obviously there is a problem, and potentially a big one).

On the other hand maybe I could commit the things I did yesterday: but perhaps not, as you want to make it so that minimal changes are made. Anyway, is it time to open a new issue? If so should it be here or at the other repo?

--

xexyl commented 1 month ago

FAQ and chkentry: current inconsistencies

Of course, to make the FAQ consistent with the new command line form it might be good to at least update the command line parsing, even if nothing else is done (though adding structs and #defined versions for the two files might be a good idea). What do you think about these things (if nothing else, the command line parsing of chkentry(1))?

I hope to do a bit here today but I'm not sure. I was planning on getting on the exercise bike but I have a kitty here so I am waiting until he leaves. But I'll probably be leaving in two or three hours.

xexyl commented 1 month ago

It seems like the command line parsing should still take a single arg (directory) but it should call a stub function for the .entry.json and author_handle.json files. I could then update the FAQ to say how to validate a .info.json and/or a .auth.json file.

That makes it easier. What do you wish me to do?

xexyl commented 1 month ago

Thanks for assigning this to me! I have made some nice progress: just checking make prep. If all good I'll look at the next year here until I have to leave. I expect to soon reach the point where I cannot move forward on this issue until much more is discussed, but I wanted to get this part in anyway. If anything does not work with make prep my guess would be it's the way the test scripts invoke chkentry: as in maybe they use a form with just the directory and not the filenames directly (path or otherwise).

xexyl commented 1 month ago

Running make prep is successful: all tests passed!

I found an odd thing though ... checking whether it's actually the Makefile, as it seems to be. If so I'll do a pull request for that. Then if I have time I'll look at 2001 (I guess it is).

lcn2 commented 1 month ago

It seems like the command line parsing should still take a single arg (directory) but it should call a stub function for the .entry.json and author_handle.json files. I could then update the FAQ to say how to validate a .info.json and/or a .auth.json file.

That makes it easier. What do you wish me to do?

We don't want to imply that if .info.json and/or .auth.json files exist, chkentry(1) will be processing them. When a submission is transitioning into an IOCCC winner, it will have .info.json and .auth.json files, as well as possibly a .entry.json being formed (i.e., one that exists but is incomplete, or may be empty, or may be missing).

We suggested a change in behavior on the chkentry(1) command line.

To process an entry .entry.json file (and its associated author/author_handle.json file(s)):

./chkentry YYYY/dir

For example:

./chkentry 2020/ferguson1

To check just .info.json:

./chkentry some/path/.info.json .

To check just .auth.json:

./chkentry . some/path/.auth.json

To check .info.json, and .auth.json:

./chkentry some/path/.info.json some/path/.auth.json

To have "undocumented fun" 🤩:

./chkentry . .

Remember that a submission directory (with .info.json, and .auth.json files) can be anywhere including NOT under an IOCCC winner repo directory tree because submissions will be judged elsewhere (perhaps far away from any repo tree).

Remember that a submission directory could have .info.json, and .auth.json files along with a .entry.json file, and that the .entry.json file may not be valid JSON because the .entry.json file may be in the process of being formed.

The chkentry(1) tool CANNOT assume that the existence of a .entry.json file means that it is an IOCCC entry within the winner repo with associated author/author_handle.json file(s) (see previous line).

The chkentry(1) tool CANNOT assume that the existence of .info.json, and .auth.json files means that the directory is a submission directory because the .entry.json file may be in the process of being formed.

If the chkentry(1) tool is directed via the command line to validate .info.json and/or a .auth.json files, it cannot assume anything about the paths of those JSON files.

The chkentry(1) tool CANNOT assume the filename form of a .info.json or .auth.json files as they could be called curds and whey and even be in different directories:

./chkentry /tmp/curds /var/tmp/whey

If the chkentry(1) tool is directed via the command line to validate a .entry.json file, it can assume that the directory path is of the form YYYY/dir AND that the top level author directory exists, AND that the JSON file is called .entry.json and is located in YYYY/dir/.entry.json.

There will be a tool (such as bin/all-run.sh in that other repo) that will want to call the chkentry(1) tool with just a YYYY/dir directory argument, expecting the chkentry(1) tool to process the YYYY/dir/.entry.json file and the associated author/author_handle.json file(s), AND that tool will call the chkentry(1) tool from the top of the repo directory tree.

The above line is one reason why we suggest that by default, the chkentry(1) tool print nothing and exit 0 when the JSON file(s) are OK (valid JSON and are semantically valid for their context) (error and warning messages, of course, are welcome when problems are found).

So this is why we suggest these command line forms:

./chkentry some/path/.info.json .
./chkentry . some/path/.auth.json
./chkentry some/path/.info.json some/path/.auth.json
./chkentry /tmp/curds /var/tmp/whey
./chkentry YYYY/dir

This is why we are suggesting that a "one arg" chkentry(1) command line imply that the "one arg" is a YYYY/dir directory with a repo, and with a YYYY/dir/.entry.json file, and with the associated author/author_handle.json file processing (no .info.json and/or a .auth.json processing).

This is why we are suggesting that a "two arg" chkentry(1) command line imply .info.json and/or a .auth.json processing (no .entry.json processing).

xexyl commented 1 month ago

I believe that I have that covered at least as far as parsing the command line.

It also checks for the author directory and whether the .entry.json file can be opened. That file is even parsed, though there is no walking through the tree yet. That's why author/ is only checked as a directory: the other file has to be analysed before the correct filenames can be determined.

I will have to make sure that everything you mentioned above matches what I have when I am not looking at this on the phone, but it seems like I have done it all.

Thanks. I will likely not do more on it today, and I might not be able to do anything here either, but it's good progress over there anyway.

lcn2 commented 1 month ago

I believe that I have that covered at least as far as parsing the command line.

It also checks for the author directory and whether the .entry.json file can be opened. That file is even parsed, though there is no walking through the tree yet. That's why author/ is only checked as a directory: the other file has to be analysed before the correct filenames can be determined.

You are probably correct.

The change to chkentry(1) tool behavior is in the one arg form. Now:

./chkentry arg

will require an arg of the form YYYY/dir and will imply processing of the YYYY/dir/.entry.json file and the associated author/author_handle.json file(s), with NO .info.json NOR .auth.json processing.
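
As a sketch only (not the actual implementation), the YYYY/dir shape could be checked along these lines; the real tool would additionally verify that YYYY/dir/.entry.json and the author directory exist:

    #include <ctype.h>
    #include <stdbool.h>

    /*
     * Hypothetical sketch: does arg look like YYYY/dir?
     */
    static bool
    looks_like_year_dir(char const *arg)
    {
        int i;

        /* expect exactly 4 leading digits (stops early on '\0') */
        for (i = 0; i < 4; ++i) {
            if (!isdigit((unsigned char)arg[i])) {
                return false;
            }
        }
        /* expect '/' followed by a non-empty directory name */
        return arg[4] == '/' && arg[5] != '\0';
    }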

xexyl commented 1 month ago

I believe that I have that covered at least as far as parsing the command line. It also checks for the author directory and whether the .entry.json file can be opened. That file is even parsed, though there is no walking through the tree yet. That's why author/ is only checked as a directory: the other file has to be analysed before the correct filenames can be determined.

You are probably correct.

The change to chkentry(1) tool behavior is in the one arg form. Now:

./chkentry arg

will require an arg of the form YYYY/dir and will imply processing of the YYYY/dir/.entry.json file and the associated author/author_handle.json file(s), with NO .info.json NOR .auth.json processing.

Yes ... it does all this. The next step would be to walk the tree and validate it, making sure to save all the author ids, and then check those author_handle.json files.

Now I'm unsure of what to do for the FAQ though. I can't commit these changes as there is too much done but we also want the FAQ to say that there's only one way to check the .auth.json and .info.json files. What do you propose I do?

... and I am sorry but I have very little time left. I won't be able to work on this repo today. I did get a fair amount done, as said, with chkentry, and I did make a fix in the Makefile (well two but same problem) over there but beyond that I'll be leaving shortly. So with that I'll bid you a good day!

lcn2 commented 1 month ago

Now I'm unsure of what to do for the FAQ though. I can't commit these changes as there is too much done but we also want the FAQ to say that there's only one way to check the .auth.json and .info.json files. What do you propose I do?

Fix the FAQs once you have the code working and tested.

xexyl commented 1 month ago

Now I'm unsure of what to do for the FAQ though. I can't commit these changes as there is too much done but we also want the FAQ to say that there's only one way to check the .auth.json and .info.json files. What do you propose I do?

Fix the FAQs once you have the code working and tested.

Sure.

xexyl commented 1 month ago

Question

How did you generate the soup/chk.* files? I guess this would be something I need to do for this change. I know there's more to it but it might be helpful if I have an idea how you came up with it. Or if you have an idea that might make it easier to update (as it is indeed a pain as it stands now - I remember having to make fixes before) then I'm definitely willing to read what you have to say.

xexyl commented 1 month ago

Hmm ... to help with determining the number of authors in .entry.json files would it be possible (like with the other json files) to have an author_count field in the .entry.json files?

I think that would be helpful but maybe I am missing something (at this time I have to go afk a bit). Still, I have made even more progress here. When back I'll either look a bit more at this or else I will look at #2006. Just test compiling what I have and then going afk a bit.

xexyl commented 1 month ago

I can add the author_count btw if that's helpful though it will take a bit of time to do ...

xexyl commented 1 month ago

Ah! Maybe I can do something like this instead:

    array = &(node->item.array);    /* the parsed JSON array node */
    array_len = array->len;         /* number of elements actually parsed */

and that way I would not need the author_count ... well anyway I'll be afk a while.

xexyl commented 1 month ago

On the other hand ... maybe it would be a good idea as an extra sanity check: if the author count != array size it would be an error. It would also match the .auth.json file in format (not in being verified, as those are verified at different times): for authors it has an author count.

What do you think? It might not be worth the effort but since it popped into my head (a bit ago - going afk again) I figured I'd bring it up.

xexyl commented 1 month ago

Question

The checking of .info.json checks the manifest but in that file it has just the id name and the filename. It checks that required files are there, that there are no duplicates of any files (required or not) and that no required files are identified as something else (I think that was done - it's been a while and I haven't checked).

Now the manifest in .entry.json has more fields. The question is: should it have the above .info.json checks too? My suspicion is that it does need to have them. That will of course slow things down but that I think is important.

I will be going afk again soon (I have been back but addressing other things). When back I have other things still to do but depending on the time I will try and look at 2001. Right now I will start that but I will be leaving any moment and it might be that I don't even have time to start that. I know it's been slow with this repo but I have still been doing well on the other one at least.

lcn2 commented 1 month ago

Question

How did you generate the soup/chk.* files? I guess this would be something I need to do for this change. I know there's more to it but it might be helpful if I have an idea how you came up with it. Or if you have an idea that might make it easier to update (as it is indeed a pain as it stands now - I remember having to make fixes before) then I'm definitely willing to read what you have to say.

The last time there was a change, we hand edited the patch. That's not good and so needs to change so that a known reference can be used as a model. Currently one can be generated based on a reference file, and that result is patched so that some things such as the allowed count range min and max are adjusted. See

https://github.com/ioccc-src/mkiocccentry/blob/master/soup/chk.auth.ptch.c

That might have been fine for development, but it is a bit of a pain should a format need to change. So a better way to specify the range count (instead of the current soup/chk.* stuff) may be needed.

xexyl commented 1 month ago

Question

How did you generate the soup/chk.* files? I guess this would be something I need to do for this change. I know there's more to it but it might be helpful if I have an idea how you came up with it. Or if you have an idea that might make it easier to update (as it is indeed a pain as it stands now - I remember having to make fixes before) then I'm definitely willing to read what you have to say.

The last time there was a change, we hand edited the patch. That's not good and so needs to change so that a known reference can be used as a model. Currently one can be generated based on a reference file, and that result is patched so that some things such as the allowed count range min and max are adjusted. See

https://github.com/ioccc-src/mkiocccentry/blob/master/soup/chk.auth.ptch.c

That might have been fine for development, but it is a bit of a pain should a format need to change. So a better way to specify the range count (instead of the current soup/chk.* stuff) may be needed.

I had to hand edit it last time as well: not good. But do you have any suggested alternatives? I know that there are make rules that generate some of those files but that still requires a starting file (or actually iirc quite a few files).

I guess the first step is to figure out what it has to do. Let's see. It has ....

depth    type        min     max   count   index  name_len validate  name

Let's define these.

The depth is what? Is it the expected depth? I cannot recall.

The type is the JTYPE_*. Simple enough.

The min is the minimum number of times the field has to be there.

The max is the maximum number of times it is allowed to be there.

The count is how many times it is there.

The index I can't recall. It seems like the order in which it's parsed, but is that the case? How necessary is that, even?

The validate is the function that validates it, or NULL if not needed.

Finally the name is the name of the JSON field (the string before the :).
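
Guessing from those columns, a row might map onto a struct along these lines. This is a reconstruction for discussion only, NOT the real struct json_sem from json_sem.h (in particular, the real validate callback signature likely differs):

    #include <stdbool.h>
    #include <stddef.h>

    struct json;    /* opaque JSON parse tree node, for this sketch only */

    /* hypothetical reconstruction of one semantic table row */
    struct sem_row {
        unsigned int depth; /* required level in the JSON parse tree, relative to the root */
        int type;           /* JTYPE_* value expected for the node */
        unsigned int min;   /* minimum number of times the member must appear */
        unsigned int max;   /* maximum number of times the member may appear */
        unsigned int count; /* how many times the member was found */
        int index;          /* position of this row in the table (purpose unclear above) */
        size_t name_len;    /* length of name, 0 when unnamed */
        bool (*validate)(struct json *node); /* validation callback, or NULL */
        char const *name;   /* JSON member name (the string before the :), or NULL */
    };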

I think before we can come up with a different format (if that's possible and if that's the way to go about it - which is not clear to me yet though I'm curious what you think too[0]) we have to do the following:

  1. Get a clear definition of what each one is and its purpose.
  2. Decide which ones are necessary (maybe they all are but as I don't remember the purpose of all of them I cannot say and so I'm going to bring this up - and anyway maybe a different model would mean we would not need some or need additional ones even).

After those two we can figure out what might be better.

I suspect I will be leaving soonish but I did get the other things I needed to do done so I am going to start 2001 if I can.

[0] Of course many different things would be better than having to manually edit a patch file, but I use the word format quite loosely there.

lcn2 commented 1 month ago

Hmm ... to help with determining the number of authors in .entry.json files would it be possible (like with the other json files) to have an author_count field in the .entry.json files?

One should be careful when one "over specifies" something: In this case adding a count of authors instead of just counting the authors.

Over specification of something allows for multiple ways for something to be wrong and adds to the number of checks one has to make. So adding or removing an author from the set of authors would be complicated by now also having to adjust the author count number. And then the length of the list of authors has to be compared with the author count. Adding a count of authors doesn't provide any missing information and creates opportunities for something to be wrong.

Yes, when forming .auth.json we had author count and even author numbers. But that was a very different situation. The mkiocccentry(1) tool has to ask the user for the number of authors and then count them to be sure all the authors have been provided by the user and then verify the author set provided by the user is complete.

Here, with .entry.json we have no such "what did the user tell us" problem. The list of authors is the list. In fact we assume that the contents of the .entry.json file are "authoritative".

It is for a similar reason that the manifest in the .entry.json file does not ask for the file size of a given file. We assume that a file such as prog.c is the correct size. Adding a "file size" property to the manifest would complicate the JSON and create more opportunities for something to be wrong.

But you might say: "how do we know if the list of authors is incomplete?" The answer is we let the humans decide. If a winner were posted with an author missing or incorrectly added to an entry, we would count on the humans noticing the problem and correcting it with a pull request. Adding an author count to the JSON wouldn't help detect a missing or extra author to an entry because the overall author count could also be wrong.

The same way we don't detect if a prog.c file is the wrong size. We let the humans discover that fact and fix missing or extra code. Adding a file length to the manifest doesn't help detect if a prog.c file is the wrong size because such a "file size" value could be also wrong. And worse still, adding such a "file size" value just creates more opportunities for errors.

So in general, over specifying information is not helpful. It creates opportunities for redundant information to disagree with itself, requires even more consistency checks, complicates the overall JSON structure, and doesn't really help detect when content such as an author is missing or extra.

xexyl commented 1 month ago

Hmm ... to help with determining the number of authors in .entry.json files would it be possible (like with the other json files) to have an author_count field in the .entry.json files?

One should be careful when one "over specifies" something: In this case adding a count of authors instead of just counting the authors.

Over specification of something allows for multiple ways for something to be wrong and adds to the number of checks one has to make. So adding or removing an author from the set of authors would be complicated by now also having to adjust the author count number. And then the length of the list of authors has to be compared with the author count. Adding a count of authors doesn't provide any missing information and creates opportunities for something to be wrong.

Yes, when forming .auth.json we had author count and even author numbers. But that was a very different situation. The mkiocccentry(1) tool has to ask the user for the number of authors and then count them to be sure all the authors have been provided by the user and then verify the author set provided by the user is complete.

Here, with .entry.json we have no such "what did the user tell us" problem. The list of authors is the list. In fact we assume that the contents of the .entry.json file are "authoritative".

It is for a similar reason that the manifest in the .entry.json file does not ask for the file size of a given file. We assume that a file such as prog.c is the correct size. Adding a "file size" property to the manifest would complicate the JSON and create more opportunities for something to be wrong.

But you might say: "how do we know if the list of authors is incomplete?" The answer is we let the humans decide. If a winner were posted with an author missing or incorrectly added to an entry, we would count on the humans noticing the problem and correcting it with a pull request. Adding an author count to the JSON wouldn't help detect a missing or extra author to an entry because the overall author count could also be wrong.

The same way we don't detect if a prog.c file is the wrong size. We let the humans discover that fact and fix missing or extra code. Adding a file length to the manifest doesn't help detect if a prog.c file is the wrong size because such a "file size" value could be also wrong. And worse still, adding such a "file size" value just creates more opportunities for errors.

So in general, over specifying information is not helpful. It creates opportunities for redundant information to disagree with itself, requires even more consistency checks, complicates the overall JSON structure, and doesn't really help detect when content such as an author is missing or extra.

Oh I agree. But since it popped into my head (mostly as I saw it in the other place) I figured I'd mention it. Sure it would add another layer of sanity checks but it's also as you say not really useful. Also you bring up other points where the results could change.

lcn2 commented 1 month ago

The checking of .info.json checks the manifest but in that file it has just the id name and the filename. It checks that required files are there, that there are no duplicates of any files (required or not) and that no required files are identified as something else (I think that was done - it's been a while and I haven't checked).

If the manifest of a .entry.json is missing a required file (such as Makefile, README.md, index.html, etc.) then that needs to be flagged as an error.

While a weeeeeee tiny bit near the edge of semantic checking, if the actual file is missing in the directory, or if the directory has a file not listed in the manifest, that should be flagged as an error too.
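
For what it's worth, here is one hedged sketch of such a two-way cross-check, assuming the manifest filenames and the on-disk filenames have each been collected into sorted arrays (a merge-style comparison; the function name and reporting are illustrative only):

    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>

    /*
     * Hypothetical sketch: report files listed in the manifest but
     * missing on disk, and files on disk but missing from the
     * manifest.  Both arrays are assumed sorted.  Returns the number
     * of mismatches found.
     */
    static int
    cross_check(char const **manifest, size_t mlen, char const **on_disk, size_t dlen)
    {
        int issues = 0;
        size_t i = 0, j = 0;

        while (i < mlen && j < dlen) {
            int cmp = strcmp(manifest[i], on_disk[j]);
            if (cmp == 0) {
                ++i; ++j;           /* listed and present: OK */
            } else if (cmp < 0) {
                fprintf(stderr, "in manifest but missing on disk: %s\n", manifest[i++]);
                ++issues;
            } else {
                fprintf(stderr, "on disk but not in manifest: %s\n", on_disk[j++]);
                ++issues;
            }
        }
        while (i < mlen) { fprintf(stderr, "in manifest but missing on disk: %s\n", manifest[i++]); ++issues; }
        while (j < dlen) { fprintf(stderr, "on disk but not in manifest: %s\n", on_disk[j++]); ++issues; }
        return issues;
    }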

xexyl commented 1 month ago

The checking of .info.json checks the manifest but in that file it has just the id name and the filename. It checks that required files are there, that there are no duplicates of any files (required or not) and that no required files are identified as something else (I think that was done - it's been a while and I haven't checked).

If the manifest of a .entry.json is missing a required file (such as Makefile, README.md, index.html, etc.) then that needs to be flagged as an error.

I guessed this. Thanks for confirming. I wonder if I can somehow use (in part) the checks that already exist for that part. I will have to think about that.

While a weeeeeee tiny bit near the edge of semantic checking, if the actual file is missing in the directory, or if the directory has a file not listed in the manifest, that should be flagged as an error too.

Interesting thought, that.

lcn2 commented 1 month ago

While a weeeeeee tiny bit near the edge of semantic checking, if the actual file is missing in the directory, or if the directory has a file not listed in the manifest, that should be flagged as an error too. Interesting thought, that.

The need for this can be argued this way: the semantics of a file manifest depend, in part, on matching the files under the YYYY/dir directory.

Assume that make clobber has already been done. If someone forgets to make clobber first, the resulting errors will be obvious, and in the "grand scheme of the Makefile rules" assume make clobber was already done before bin/all-run executes chkentry YYYY/dir for all entries.

FYI: The tool that the judges use to help convert a submission into an entry will use chkentry(1) in the "one argument" (i.e., YYYY/dir) mode to help verify that the winning entry being formed is "sane and semantically valid". If the judges forget to remove either .info.json and/or .auth.json, the file manifest checks will differ from the list of files under the YYYY/dir directory.

It probably wouldn't hurt for chkentry(1), when it is attempting to verify that mandatory files (such as Makefile, README.md, index.html, .path, .entry.json, etc.) exist, to also verify that certain entry files no longer exist. That is, the following files should NOT be found in an entry, neither in the .entry.json manifest nor in the directory:

  • .info.json
  • .auth.json
  • remarks.md

This will catch the case where the judges, when converting a submission into a winning entry, forget to remove those 3 files.

xexyl commented 1 month ago

While a weeeeeee tiny bit near the edge of semantic checking, if the actual file is missing in the directory, or if the directory has a file not listed in the manifest, that should be flagged as an error too. Interesting thought, that.

The need for this can be argued this way: the semantics of a file manifest depend, in part, on matching the files under the YYYY/dir directory.

Assume that make clobber has already been done. If someone forgets to make clobber first, the resulting errors will be obvious, and in the "grand scheme of the Makefile rules" assume make clobber was already done before bin/all-run executes chkentry YYYY/dir for all entries.

FYI: The tool that the judges use to help convert a submission into an entry will use chkentry(1) in the "one argument" (i.e., YYYY/dir) mode to help verify that the winning entry being formed is "sane and semantically valid". If the judges forget to remove either .info.json and/or .auth.json, the file manifest checks will differ from the list of files under the YYYY/dir directory.

It probably wouldn't hurt for chkentry(1), when it is attempting to verify that mandatory files (such as Makefile, README.md, index.html, .path, .entry.json, etc.) exist, to also verify that certain entry files no longer exist. That is, the following files should NOT be found in an entry, neither in the .entry.json manifest nor in the directory:

  • .info.json
  • .auth.json
  • remarks.md

This will catch the case where the judges, when converting a submission into a winning entry, forget to remove those 3 files.

By this do you mean that if they are found it is an error?

xexyl commented 1 month ago

As far as the patch files go, what about what I wrote in comment 2267622035?

I had a crazy thought too. I don't know how much of it needs to be in C. I guess (I didn't look) that the reason it's a C file is that it uses the function (in the struct) as a callback. But even if that is necessary, one way that might make it a bit easier to modify (and this has not really been thought out well! - remember too that I don't know or remember the purpose of all the fields, so take this with a pinch of salt) is to have it in a json file. That is, things like min, max and other requirements could be in a json file, and then if there has to be some C it can have fewer fields (maybe the function, the string name and the filename of the JSON file?).

Just a thought that popped into my head ... no idea how valid or feasible or useful it might be.

xexyl commented 1 month ago

Okay taking a quick look I do indeed see what I suspected - it is a callback function. It seems that sem_walk() will call that function too if it is not NULL. So that would be a reason we need it to be in C.

I do not know the purpose of having to have a patch file, however, and that might be helpful to know.

The json file idea for other details is maybe iffy: it did pop into my head but without looking at it in detail. But depending on how the patch file is used (and why it is used) it might be that we could do something else? What that 'something else' is, however, I do not know yet.

xexyl commented 1 month ago

Obviously struct json_sem is the important one and that is used in json_sem.{h,c}. That file is necessary to analyse and consider then. Still not sure on the patch file necessity though (by which I mean I don't know why it is necessary but I'm sure it is - at least at this time).

It seems clearer to me now that if possible a json file might be nicer as it's easier to edit but of course the fact that these members of the struct are being used might make that harder to do: not sure. The point though is that if we do not have to manually edit patch files that would be extremely beneficial. I remember it being a pain to do last time and it seems like you had that same problem.

I'll let you ponder these comments and we can go from there. I was hoping to look at a few entries of 2001 now but it seems like I might not be able to. I hope to in a bit but I'm not sure yet. Tomorrow though good news is I'll have more time here at the computer so I can either look at this more or hopefully get a few years (html files) done.

lcn2 commented 1 month ago

Now the manifest in .entry.json has more fields. The question is: should it have the above .info.json checks too? My suspicion is that it does need to have them. That will of course slow things down but that I think is important.

Yes, .entry.json will need similar checks suitable for an entry.

lcn2 commented 1 month ago

The depth is what? Is it the expected depth? I cannot recall.

The level in the JSON parse tree, relative to the root.

xexyl commented 1 month ago

Now the manifest in .entry.json has more fields. The question is: should it have the above .info.json checks too? My suspicion is that it does need to have them. That will of course slow things down but that I think is important.

Yes, .entry.json will need similar checks suitable for an entry.

Hopefully (after updating the way the checks are done) the checks for .info.json can be adapted to this file too.

xexyl commented 1 month ago

The depth is what? Is it the expected depth? I cannot recall.

The level in the JSON parse tree, relative to the root.

In other words it is the required level in the parse tree for the field?

lcn2 commented 1 month ago

I think before we can come up with a different format (if that's possible and if that's the way to go about it - which is not clear to me yet though I'm curious what you think too[0]) we have to do the following:

Get a clear definition of what each one is and its purpose. Decide which ones are necessary (maybe they all are but as I don't remember the purpose of all of them I cannot say and so I'm going to bring this up - and anyway maybe a different model would mean we would not need some or need additional ones even).

Yes: study the code and read comments. You might do well to start with the data structure definitions. Look at the build process including Make rules and any utilities (such as all_sem_ref.sh). And read the code too. 🤓

lcn2 commented 1 month ago

.info.json .auth.json remarks.md This will catch the case where the judges, when converting a submission into a winning entry, forget to remove those 3 files.

By this do you mean that if they are found it is an error?

Might be nicer to issue a warning ⚠️ in these cases.

xexyl commented 1 month ago

I think before we can come up with a different format (if that's possible and if that's the way to go about it - which is not clear to me yet though I'm curious what you think too[0]) we have to do the following: Get a clear definition of what each one is and its purpose. Decide which ones are necessary (maybe they all are but as I don't remember the purpose of all of them I cannot say and so I'm going to bring this up - and anyway maybe a different model would mean we would not need some or need additional ones even).

Yes: study the code and read comments. You might do well to start with the data structure definitions. Look at the build process including Make rules and any utilities (such as all_sem_ref.sh). And read the code too. 🤓

Sure but this is to determine (together) what might be a better way to manage it than a patch file. I did give one thought (though as noted there are some problems with it) but I guess discussion would be a good idea here.

xexyl commented 1 month ago

.info.json .auth.json remarks.md This will catch the case where the judges, when converting a submission into a winning entry, forget to remove those 3 files.

By this do you mean that if they are found it is an error?

Might be nicer to issue a warning ⚠️ in these cases.

Sure.

lcn2 commented 1 month ago

As far as the patch files go, what about what I wrote in comment 2267622035?

If by that you are asking to have the code, with lots of comments, explained in GitHub comments? While we don't think you mean that, we just want to be sure you don't, as the effort needed to re-explain code comments and algorithms in GitHub comments would be larger than the effort to just write the tool.

In terms of requirements: the single argument chkentry(1) tool needs to determine if the information found in an IOCCC entry's .entry.json file and its related author/author_handle.json file(s) is consistent, complete, and reflects the contents of the IOCCC entry.

If you walk thru a file such as 1984/mullender/.entry.json you will see a number of JSON members. The code needs to determine if each of those JSON members is present, has the proper JSON member value type, and in some cases has a reasonable value. The year is an integer value >= 1984 and <=, say, 9999. The entry_id is a JSON string whose value is consistent with the YYYY/dir path (i.e., is YYYY_dir). An entry_text is a JSON string whose length is within a specified range (i.e., longer than 0 and shorter than some specified limit). Etc.

We suggest you approach this from a testing perspective. How might someone try to mess up a .entry.json file? Does it contain extra stuff that is unexpected? Does it have something omitted? Is a given JSON member value the wrong type (for example, an integer when it should be a string, a null when it should be an integer, etc.)? Does a given JSON member that has a JSON value of the correct type have an "insane" value (for example, an empty string, a string that is too long, a negative integer when a positive integer is needed, a string that contains bogus characters when it should be a string value of a well defined type, etc.)? An .entry.json file should describe a given IOCCC entry, but does it describe it correctly (for example, the year doesn't match, the file manifest does not match, the authorship doesn't have a related author/author_handle.json file, etc.)?
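
For illustration, here are a couple of tiny check helpers of the kind described above. The limits (beyond the year 1984 floor mentioned) and the helper names are made up for this sketch, not taken from the real chk_validate code:

    #include <stdbool.h>
    #include <string.h>

    #define MIN_YEAR 1984
    #define MAX_YEAR 9999
    #define MAX_ENTRY_TEXT_LEN 64   /* hypothetical limit */

    /* is the year within the sane range? */
    static bool
    year_is_sane(int year)
    {
        return year >= MIN_YEAR && year <= MAX_YEAR;
    }

    /* is entry_text a non-empty string within the length limit? */
    static bool
    entry_text_is_sane(char const *text)
    {
        size_t len;

        if (text == NULL) {
            return false;
        }
        len = strlen(text);
        return len > 0 && len <= MAX_ENTRY_TEXT_LEN;
    }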

We won't go into every JSON member element in GitHub comments. You will have to figure that out yourself. In some cases you may even need to add a #define SOMETHING_MAXLEN if there isn't one that you need.

BTW: The single argument chkentry(1) tool should start out by determining that the IOCCC entry's .entry.json file and its related author/author_handle.json file(s) are valid JSON. If they are not valid JSON then throw an error and exit. However if they are valid JSON, you need to ask if the files are reasonable and don't contain mistakes, extra unwanted stuff, omissions, bogus values, etc.

Think of ways to mess up a .entry.json file with respect to the entry directory it is in, and be sure that the single argument chkentry(1) tool can detect that. Think about a given .entry.json file that is put into the wrong IOCCC entry and how to detect that. Think about someone who edits a given .entry.json file and makes mistakes that, while valid JSON, don't properly describe the IOCCC entry, or contain values that are out of range, or JSON member values that are the wrong type, and detect that.

Some high level requirements

When the single argument chkentry(1) tool is given a well formed IOCCC entry and its related author/author_handle.json file(s), does it pass (i.e., exit 0)?

When a mistake is made in the contents of the .entry.json file or its related author/author_handle.json file(s), even though they may be valid JSON files, are those mistakes detected?

By "mistakes made" we include things such as the wrong type of JSON value, or a JSON value that is the correct type but whose value is not proper, out of range, unreasonable, or is inconsistent with the IOCCC entry it is within, etc.

When someone modifies an IOCCC entry but fails to update the contents of the .entry.json file or its related author/author_handle.json file(s), is that detected?

And by detected we mean warnings or errors are issued on stderr and the tool does not exit with a 0 exit code.

UPDATE 0a

Think of ways that a .entry.json file (within a given YYYY/dir) and/or its related author/author_handle.json file(s) might be messed up (unintentionally or deliberately), or ways that an IOCCC entry might be changed (again, unintentionally or deliberately) without making the proper changes to the .entry.json file and/or its related author/author_handle.json file(s). Detect those situations, issue warnings and errors to stderr as needed, and exit non-zero.

lcn2 commented 1 month ago

Okay taking a quick look I do indeed see what I suspected - it is a callback function. It seems that sem_walk() will call that function too if it is not NULL. So that would be a reason we need it to be in C.

I do not know the purpose of having to have a patch file, however, and that might be helpful to know.

The json file idea for other details is maybe iffy: it did pop into my head but without looking at it in detail. But depending on how the patch file is used (and why it is used) it might be that we could do something else? What that 'something else' is, however, I do not know yet.

While one can dump a semantic table for a reference .entry.json file, other perfectly valid .entry.json files might have more of a given thing (for example, more or fewer extra files), so the range of a given type of JSON element at a given level might be different from what we find for the reference .entry.json file. So the table is patched to account for those valid differences.

xexyl commented 1 month ago

As far as the patch files go, what about what I wrote in comment 2267622035?

If by that you are asking to have the code, with lots of comments, explained in GitHub comments? While we don't think you mean that, we just want to be sure you don't, as the effort needed to re-explain code comments and algorithms in GitHub comments would be larger than the effort to just write the tool.

I will have to reply to the rest another time - probably not until tomorrow. But I did want to reply to this.

What I meant actually is we need to figure out a better approach, as both of us have observed it is a pain to have to modify the patch file manually. Do you have any ideas of what we might use instead? Nothing comes to mind yet but I also have to figure out why the patch file is needed too. I know it has a table of a struct for each json field/variable needed in the respective file and that might be why it has to be C. But surely there can be something other than a patch file that would simplify it?

What might be helpful is if you told me how you created the table (not the patch file but the original file) in the first place. Or did you do this by hand? If that's the case I congratulate you on that (it's something I would not want to have to do) and in that case I would have to look in more detail at json_sem. If however you managed to do it programmatically that might help (of course it might not too).

lcn2 commented 1 month ago

The depth is what? Is it the expected depth? I cannot recall.

The level in the JSON parse tree, relative to the root.

In other words it is the required level in the parse tree for the field?

Yes.

xexyl commented 1 month ago

Okay taking a quick look I do indeed see what I suspected - it is a callback function. It seems that sem_walk() will call that function too if it is not NULL. So that would be a reason we need it to be in C. I do not know the purpose of having to have a patch file, however, and that might be helpful to know. The json file idea for other details is maybe iffy: it did pop into my head but without looking at it in detail. But depending on how the patch file is used (and why it is used) it might be that we could do something else? What that 'something else' is, however, I do not know yet.

While one can dump a semantic table for a reference .entry.json file, other perfectly valid .entry.json files might have more of a given thing (for example, more or fewer extra files), so the range of a given type of JSON element at a given level might be different from what we find for the reference .entry.json file. So the table is patched to account for those valid differences.

Curious. Okay. So then I have to figure out how the code (semantic checks) references the patch file. I know some files are #included but whether that's it or not I do not know at this time. I suppose that's one way to go about it though.

Even so it'd be nice if we did not have to worry about patching OR if we could come up with a way to not require modifying it manually AND/OR a way (perhaps the most ideal?) to generate it more easily (for we do generate it but it might be nice if a new table could be generated more easily).

I'm afraid I have to leave for now but I'll reply tomorrow if I don't get a chance to today (which is most likely the case and certainly is the case after the hour).

lcn2 commented 1 month ago

Even so it'd be nice if we did not have to worry about patching OR if we could come up with a way to not require modifying it manually AND/OR a way (perhaps the most ideal?) to generate it more easily (for we do generate it but it might be nice if a new table could be generated more easily).

The patching system was, we admit, a bit of a "bad hack" that might be showing its limits now that the code has to be extended to check for .entry.json and related author/author_handle.json file(s). If you want to toss out that patching system and come up with a better way, PLEASE DO.

UPDATE 0

If the result is that the requirements in GH-issuecomment-2269551230 are met, AND the tool is able to still do sanity checks for the submission .info.json and .auth.json files (in the 2 arg mode), and you toss out the whole patch system for something better, that is fine.

xexyl commented 1 month ago

Even so it'd be nice if we did not have to worry about patching OR if we could come up with a way to not require modifying it manually AND/OR a way (perhaps the most ideal?) to generate it more easily (for we do generate it but it might be nice if a new table could be generated more easily).

The patching system was, we admit, a bit of a "bad hack" that might be showing its limits now that the code has to be extended to check for .entry.json and related author/author_handle.json file(s). If you want to toss out that patching system and come up with a better way, PLEASE DO.

I would certainly LIKE to but I don't know how (yet?).

You suggested the other day you might have some thoughts? Perhaps if you say what they are we could discuss them and together come up with the best (or at least a much better) solution.

I will have a look at the man page too but not until tomorrow morning most likely. Perhaps that will also give me some ideas about it.

Of course maybe you were only suggesting that it should be improved upon but you don't have any ideas yet?

Once I have more time I will take a deeper look into the system and maybe that will help with discussion instead.

Perhaps in the meantime we can both ponder the situation and then bring up possibilities when we come up with any? I can start that tomorrow.

Either way I am happy to proceed in a way you like, but those are my current thoughts - away from the code and unable to dive into it, which of course limits the ability to come up with alternatives (the earlier idea came while I was still waking up and only thought of briefly in the shower - briefly because my showers are very quick).

But perhaps the best way to come up with an alternative, for the time being, is that we both ponder the situation individually.

Going afk now. Back tomorrow (there's a very tiny chance I might have a bit of time to reply to more today but I would not count on it so with that I bid you a good day!).

lcn2 commented 1 month ago

Of course maybe you were only suggesting that it should be improved upon but you don't have any ideas yet?

Yes.

xexyl commented 1 month ago

Of course maybe you were only suggesting that it should be improved upon but you don't have any ideas yet?

Yes.

Ah. Okay then let us both ponder it for a bit and perhaps we can come up with something much better.

I will start looking at it more tomorrow.

lcn2 commented 1 month ago

Of course maybe you were only suggesting that it should be improved upon but you don't have any ideas yet?

Yes.

Ah. Okay then let us both ponder it for a bit and perhaps we can come up with something much better.

I will start looking at it more tomorrow.

We can start with what we need, see GH-issuecomment-2269551230 for a start. Come up with a better way to describe what is expected in the associated JSON files (min and max number, data types, sanity checks on values of the proper types, etc.) and code it! Q.E.D. :-)

xexyl commented 1 month ago

Of course maybe you were only suggesting that it should be improved upon but you don't have any ideas yet?

Yes.

Ah. Okay then let us both ponder it for a bit and perhaps we can come up with something much better. I will start looking at it more tomorrow.

We can start with what we need, see GH-issuecomment-2269551230 for a start. Come up with a better way to describe what is expected in the associated JSON files (min and max number, data types, sanity checks on values of the proper types, etc.) and code it! Q.E.D. :-)

I will start with that comment tomorrow and then reread the man page and then dive into the code more. Hopefully I can also work a bit on the next year html files.

xexyl commented 1 month ago

I've spent some time looking at the man pages and the patch file (or one of them) and also the comment (which I'll reply to in a bit or later on this morning I'd guess). But now for the talk about the patching.

So the patch part is to modify the default table values. The question is how can we get rid of having to do that? I think the question is to determine how the patch file is used. By that I mean: how are they used as part of a valid table? The construction of those tables is complex but I see that, for example, the auth patch file creates the file soup/chk_sem_auth.c.

Now due to this patch file we end up having the line:

{ 4,    JTYPE_MEMBER,   1,      5,      5,      15,     7,      chk_alt_url,    "alt_url" },

instead of

{ 4, JTYPE_MEMBER,   1,      5,      5,      5,      7,      chk_alt_url,    "alt_url" },

which modifies the index.

I am not clear on the purpose of the index however.

Trying to find an example where something other than the index is modified.

Ah, here. But it's not a member with a name.

{ 5, JTYPE_STRING,   1,      104,    104,    1,      0,      NULL,   NULL },

is changed to:

  { 5,  JTYPE_STRING,   15,     104,    104,    1,      0,      NULL,   NULL },

which means the min is 15, not 1.

The min is the minimum allowed count (I guess AT LEAST that many are required?) but since this row is not named I'm not sure what it indicates.

Do you have an example where there is a difference for a NAMED member? And what are the unnamed members if you have an example?

It might be good if we can somehow connect (in discussion) the reference json file to the C files. As in: what tool takes the json file and forms the C file (patch or otherwise)? I don't know if I ever saw that but it might be helpful to know.

I must leave for a bit but I do have a bit of a better idea now what the patching is about, even if I don't fully understand how it all works together yet. I might be an hour or so but it might be longer or it could be less. This morning I will be away a number of times but then later in the morning and into the afternoon I'll have more time (today only). I can, depending on how I feel, either look more at this (or ponder it) or else the website but I hope to get more done than I have been able to recently.

Hopefully this comment can get a discussion going too. As for the other comment I'll have to address that later on.

UPDATE 0

Okay it looks like jsemcgen.sh is what forms the files from the patch, but I'm still not sure how they are linked into the validation tools like chkentry. I don't see that in the Makefile right now (maybe I will when I'm back).

UPDATE 1

Okay and all_sem_ref.sh too. Back later ..

xexyl commented 1 month ago

Hmm .. would having an identifier for the different tables be something to consider instead of having to patch files?

Don't have time to elaborate yet but should be able to later (with some thinking about it).

xexyl commented 1 month ago

Okay I have some things I do have to take care of (and after that I might look at the website - not sure but I hope to work on something here anyway) but this is a bit more of the idea above with ids. I'm too unfocused to write any code so a description will have to do for now.

Let's say that we have a list or array of some kind of struct which holds the struct json_sem (I think it is) but which also has a name. The array would be of that struct. A name would be compared, and the name would let us know which file is being considered (for modified default values). Then when processing the file we would pass another variable to the function(s), and those functions would use that name to know which struct data to use. Or the struct (the json_sem I mean) could be located prior to calling the functions (maybe that would be better, as then one would not have to update all the functions?) and we go from there. I think the latter way is probably better, assuming we can figure out a way to do the other parts.

I can see about coming up with some code that does this if you like. Though it would be helpful if you remind me of a file that has an example of the dynamic array facility so I can figure out how it works (so that I can go through the list).

Of course it would possibly require more than the above but it would be a lot better (I think - as long as things are worked out) than having to update a patch file.

Okay but then the first obvious question is: how do we associate the correct struct (json_sem) array with the correct file? Well we could have that in the list/array too (not the struct json_sem array but the one that holds the array of that) - maybe? This would mean we would need more files, maybe, but having more JSON files seems worth it if it makes things easier to maintain. A rough sketch of this idea is below.
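
A minimal sketch of what that might look like, reusing the hypothetical struct sem_row reconstruction from earlier in this thread (the placeholder row arrays and the names here are made up; the real approach would still need to be worked out):

    #include <stddef.h>
    #include <string.h>

    /* placeholder per-file tables standing in for the real ones */
    static struct sem_row sem_info_rows[1];   /* table for .info.json */
    static struct sem_row sem_auth_rows[1];   /* table for .auth.json */
    static struct sem_row sem_entry_rows[1];  /* table for .entry.json */

    /* a named semantic table: the name selects the file kind */
    struct sem_table {
        char const *name;       /* file kind: "info", "auth" or "entry" */
        struct sem_row *rows;   /* semantic table for that kind */
        size_t nrows;           /* number of rows in the table */
    };

    static struct sem_table tables[] = {
        { "info",  sem_info_rows,  sizeof(sem_info_rows)  / sizeof(sem_info_rows[0])  },
        { "auth",  sem_auth_rows,  sizeof(sem_auth_rows)  / sizeof(sem_auth_rows[0])  },
        { "entry", sem_entry_rows, sizeof(sem_entry_rows) / sizeof(sem_entry_rows[0]) },
    };

    /* select the semantic table for a given file kind, NULL if unknown */
    static struct sem_table *
    find_sem_table(char const *name)
    {
        size_t i;

        for (i = 0; i < sizeof(tables) / sizeof(tables[0]); ++i) {
            if (strcmp(tables[i].name, name) == 0) {
                return &tables[i];
            }
        }
        return NULL;
    }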

Other questions would have to be thought of and answered too but hopefully the above gives you some thoughts too. It might be that you can think of a reason this won't work and that's also okay but it popped into my head and it seems like it might be a better system - as long as everything can be worked out.

Will go take care of those other things and then try and look at the website some. Oh and maybe reply to your earlier comment.