ioccc-src / mkiocccentry

Form an IOCCC submission as a compressed tarball file

Enhancement: create the chkentry tool #259

Closed xexyl closed 1 year ago

xexyl commented 2 years ago

We can discuss the details here. If you've not responded to the question on . in the other thread before I get a chance to (too tired right now) I will copy and paste my reply here to discuss it here. Otherwise I'll move my reply to your new reply here.

Please add the tags you want and also assign it to me. You can then close the other issues and refer to this one here as you suggested.

I don't know when I can start working on this but the first part will be writing main() and a sanity check function. This most likely will not happen today I'm afraid. Not even sure if I'll get anything done today here. I'm hoping I can get some rest soon but last time I tried earlier this morning I couldn't.

Later on I will wake up more but whether I do more here is yet to be seen. Anyway we can now discuss the new tool here which should also help remove some clutter elsewhere (or prevent further clutter I guess).

xexyl commented 2 years ago

Moving the message now then will be afk a while.

True though I think some of this should be kept. It will have to be changed some and in particular the struct json trees will need a function or two to convert to struct json_field and struct json_value but this would mean the check functions could remain the same. Of course it might be that something else could be done instead but it seems like keeping that pair of structs would be useful. Do you have any other ideas?

Think about how to walk the JSON parse tree once the given file has been parsed.

The code will need to verify that the proper JSON parse nodes are found (although the tree could be sorted in any order). Object if unexpected nodes are found, or found at the wrong tree level, with the wrong count, bad values, etc.

Think of a single static table for each type of file. As both .info.json and .author.json are fixed in form, there is no need to load such a table on the fly. A static table for each is all that is needed.

Such a table would be an array of structures, each would have things like:

  • Node type
  • Tree level
  • Object name (in the case of JTYPE_OBJECT; NULL otherwise)
  • Pointer to function returning bool, to validate the object value (in the case of JTYPE_OBJECT; NULL otherwise)
  • Minimum required count (most will be 1; some optional things will be 0)
  • Maximum allowed count (most will be 1; 0 can be used to indicate no maximum limit)
  • Count of the times found (starts off as 0)

Walk the parse tree (use the json_tree_walk() function). For each node, search the table. When the table entry is found, increment the count. If a parse tree node is NOT found in the table, then flag it as an error. For those of JTYPE_OBJECT, if the "function to validate object value" returns false, flag an error.

After walking the entire JSON parse tree, look at the number found for each entry. If the number found is out of range (compared with the minimum required or maximum allowed), flag as an error.

Something like that.

Much of this already exists in the other structs. That’s why I was thinking of walking the tree (or maybe we should change walk to climb - I did suggest that before but it might be more confusing since most people would not think in those terms as the actual term is walk) and creating a linked list of the old structs.

It has what you suggest (or almost all of it) and it would make it easier to do. However I suppose something new could also do this. But the points you gave I thought of in the first place, which is why they're there.

The difference would be that instead of a parser specific to each file (so not generic like we’re working on - and again I consider it a huge honour to be working on this with you so thank you very much! 🙏) it would use the generic parser but the check code would not have to change that much.

But some of it could change and I think no matter what some of the check related code will change and as you say be better for it.

What do you think?

xexyl commented 2 years ago

As for using . to signify ignoring the arg: what if only one arg is specified? Should we assume that the files are in the cwd and not change directories? Should the character maybe be something else? I don't know what. I'd suggest - but that would require -- all the time so that's not really useful I think.

What do you think?

Resting again. Good day!

lcn2 commented 2 years ago

We moved this comment over to this issue ( #259 )

The only questions to answer before we do that:

  • What about a string?

That is not needed for chkentry. Parsing JSON strings is for tools such as jparse ... -s foo.

  • What about stdin?

That is not needed for chkentry: this tool is for checking an IOCCC entry that is sitting in a directory.

BTW: The 2 arg option for chkentry is a convenience for when one wishes to copy some strange .info.json file to a different location, make a vim edit, and re-run the check. The convenience is not having to duplicate the entire directory in order to just hand edit a strange .info.json file, for example.

For a string the file could be NULL and for stdin it could be the typical -. But do we want access to it in the parser? I could see how it would be useful and it would be useful in the error function as well. Want me to add that at least?

See above.

lcn2 commented 2 years ago

Much of this already exists in the other structs. That’s why I was thinking of walking the tree (or maybe we should change walk to climb - I did suggest that before but it might be more confusing since most people would not think in those terms as the actual term is walk) and creating a linked list of the old structs.

...

What do you think?

We don't see that struct json_value and struct json_field are what is needed.

They were a good idea back before a full JSON parser with a JSON parse tree was needed.

Consider the difference between:

{
    "no-comment" : "mandatory JSON parsing directive",
    "IOCCC_info_version" : "1.9 2022-03-15",
...
}

and:

{{
    "no-comment" : "mandatory JSON parsing directive",
    "IOCCC_info_version" : "1.9 2022-03-15",
...
}}

and:

[{
    "no-comment" : "mandatory JSON parsing directive",
    "IOCCC_info_version" : "1.9 2022-03-15",
...
}]

Consider the difference between:

{
...
    "test_mode" : false,
    "manifest" : [
        {"info_JSON" : ".info.json"},
...
    ],
    "formed_timestamp" : 1655744278,
...
}

and:

{
...
    "manifest" : [
        {"info_JSON" : ".info.json", "test_mode" : false}
...
    ],
    "formed_timestamp" : 1655744278,
...
}
lcn2 commented 2 years ago

It has what you suggest (or almost all of it) and it would make it easier to do. However I suppose something new could also do this. But the points you gave I thought of in the first place, which is why they're there.

The difference would be that instead of a parser specific to each file (so not generic like we’re working on - and again I consider it a huge honour to be working on this with you so thank you very much! 🙏) it would use the generic parser but the check code would not have to change that much.

But some of it could change and I think no matter what some of the check related code will change and as you say be better for it.

As you know, the big difference between jparse and a tool such as chkentry is fundamental.

The jparse tool checks for syntax and grammar specific to the JSON spec.

The chkentry tool, on the other hand, is a semantically oriented tool. Moreover, it is a tool that is very much tied into the semantics of the IOCCC itself.

For the chkentry tool, successful JSON parsing is a mandatory condition before it can even start to work: much like a machine code generator cannot start to do useful work until the C compiler has verified the syntax and grammar of the C language.

We know you are very aware of this, @xexyl. We are simply restating this for the record. :-)

So chkentry calls the parse_json() function and receives the JSON parse tree. If is_valid is false, chkentry can do nothing more that is useful.

This is_valid is false case should be a rare exception for the IOCCC. An IOCCC judge would drop a submission with an invalid .author.json file, returning an error to the user that would simply state "The .author.json file is invalid". No detailed JSON error message as to what is wrong will be given. In fact, to hide the authorship, we would NEVER look into a .author.json file except if the entry WON the IOCCC, let alone quote some JSON parser error that might reveal the authorship, or Email address, or ... etc.

What chkentry needs to tell us, in a true or false way, is IF the JSON files in the entry directory are valid JSON, and if the semantics of those files are OK: such as the manifest is OK, size is in range, etc. If the semantics of both files are OK, all we need to see is a true (zero exit code) value. If not, we need a false (exit code 1).

Now chkentry, for debugging purposes should have a -v level and -J level options. However those will NOT be used while running a real IOCCC.

We have a meeting that is about to start. We will write more about an idea of HOW to make chkentry static table driven, how to generate the two tables for .info.json and .author.json, and what parts of the old chkauth and chkinfo code might be able to be reused. ... after we finish our meetings and do some domestic chores :)

lcn2 commented 2 years ago

Returning to a table driven chkentry, revised:

Let us call the following a description of a semantic table:

If there is a generic function for semantic JSON processing (post JSON parsing), it would be to compare a JSON parse tree with a pointer to a semantic table (array) of the above.

The generic code would walk the JSON parse tree (via the json_tree_walk() interface) while scanning the semantic table (passed to the function as a pointer). When this code finds a JSON parse node that does NOT have a counterpart in the semantic table, this means there is valid JSON syntax that has invalid semantics (as defined by the table). In particular this means that a valid JSON element was found that does not belong in the JSON file. Where there is a semantic table match, the "count of the times found" is incremented by one. If that semantic table entry has a non-NULL "Pointer to function returning bool, to validate object value", then that function is called with the pointer to the given JSON parse node. If that function returns true, then all is well. If that function returns false, then the JSON node has a problem, semantics-wise.

The "JSON node has a problem, semantics-wise" could be something such as the numeric value is out of range, or the string contains an invalid character, etc.

As far as the generic case is concerned, it is a set of functions that, given a JSON parse tree and a semantic table, perform the semantic checks.

For a specific case such as chkentry, there will need to be some non-generic functions that are part of those "Pointer to function returning bool, to validate object value" functions.

BTW: It turns out that a number of those chkentry checks are embedded into static functions within mkiocccentry.c. For example, within the static function get_entry_num() is where mkiocccentry checks if the entry number is in range. That check will need to be moved to a function that both chkentry and mkiocccentry can share.

... more to come ...

lcn2 commented 2 years ago

Looking over the chk* files, we don't see very much that will work with the table driven semantic checks.

We propose to deprecate the following files:

By deprecate we mean that the files would be renamed so that the content can still be consulted, but that the Makefile no longer compiles them. There are a few concepts that need to be kept, and of course the man pages need to be folded into a new chkentry man page. Such files will become inactive, but not forgotten for now.

... more to come ...

lcn2 commented 2 years ago

Once the JSON parser is code complete, we propose to add a flag to jparse that would, if the JSON is valid, generate an initial semantic table for that code.

This would make it very easy for the semantic tables for .info.json and .author.json to be initially created. One would do this only once (or if the file format is fundamentally changed). This mode would start off with the "Pointer to function returning bool, to validate object value" being NULL. That is a special case that the code doing the semantic work (such as chkentry) would need to write. Some of the minimum and maximum counts would need to be adjusted too. Nevertheless it would be a better starting place than trying to hand count a JSON parse tree.

lcn2 commented 2 years ago

To better explain, we could create the semantic structure and modify jparse to generate the initial values, once the JSON object parsing is ready. At that time, we would also perform the changes as suggested in comment 1162529524.

UPDATE 0:

Back to coding the JSON object parsing ...

xexyl commented 2 years ago

Still waking up. Cannot sleep but not quite awake as much as I'd like given that. Hopefully I can go back to sleep in a bit though that's unlikely. Anyway I'll reply to these out of order but I'll get to them all in time.

Looking over the chk* files, we don't see very much that will work with the table driven semantic checks.

We propose to deprecate the following files:

  • chk_entry.c
  • chk_entry.h
  • chk_err.codes
  • chk_util.c
  • chk_util.h
  • chk_warn.codes
  • chkauth.1
  • chkauth.c
  • chkauth.h
  • chkcode.sh
  • chkinfo.1
  • chkinfo.c
  • chkinfo.h

By deprecate we mean that the files would be renamed so that the content can still be consulted, but that the Makefile no longer compiles them. There are a few concepts that need to be kept, and of course the man pages need to be folded into a new chkentry man page. Such files will become inactive, but not forgotten for now.

Well the *.codes files of course are not compiled but I still think they're needed for the reasons I cited in the other thread several days back. But the messages are likely to change and probably quite a bit. I also think the check script will be needed for the same reasons I cited before.

The check functions are very much needed though if you think that those two structs I have referred to are not going to cut it they will have to change. I still think they might have some value (I'd say json_value for the pun but I'm not sure json has much value :) ) but I'll go through what you have here before I worry about that. It might be that something new can be devised based on them instead.

... more to come ...

Sure.

xexyl commented 2 years ago

To better explain, we could create the semantic structure and modify jparse to generate the initial values, once the JSON object parsing is ready. At that time, we would also perform the changes as suggested in comment 1162529524.

Sure. Will hold off then.

UPDATE 0:

Back to coding the JSON object parsing ...

Sounds good. Hope you're having a nice sleep my friend!

xexyl commented 2 years ago

Once the JSON parser is code complete, we propose to add a flag to jparse that would, if the JSON is valid, generate an initial semantic table for that code.

This would make it very easy for the semantic tables for .info.json and .author.json to be initially created. One would do this only once (or if the file format is fundamentally changed). This mode would start off with the "Pointer to function returning bool, to validate object value" being NULL. That is a special case that the code doing the semantic work (such as chkentry) would need to write. Some of the minimum and maximum counts would need to be adjusted too. Nevertheless it would be a better starting place than trying to hand count a JSON parse tree.

Sure. Maybe that'll help give me some ideas after I see what you have in mind. I'm sure it'll work and in any event it will be better whatever the case - as you suggested.

I have several times (due to thinking we'd have to change the two old tools into one) thought of removing from mkiocccentry the calls to those tools. Maybe when I add the initial chkentry I will update mkiocccentry to run that - with a guaranteed (for the time) exit status 0 so it still functions okay.

I doubt I can do that today: I got less sleep last night and I have some things going on later in the morning. But we'll see.

A thought: maybe we should discuss the chkentry syntax? Another way to go about it is if I start working on it I can show what I have come up with first and then we can go from there. I guess that will work well enough.

xexyl commented 2 years ago

Will reply to the others later .... will delete this comment once I've done that. Those other comments are too detailed for my sleepy eyes to focus on yet.

lcn2 commented 2 years ago

Still waking up. Cannot sleep but not quite awake as much as I'd like given that. Hopefully I can go back to sleep in a bit though that's unlikely. Anyway I'll reply to these out of order but I'll get to them all in time.

Looking over the chk* files, we don't see very much that will work with the table driven semantic checks. We propose to deprecate the following files:

  • chk_entry.c
  • chk_entry.h
  • chk_err.codes
  • chk_util.c
  • chk_util.h
  • chk_warn.codes
  • chkauth.1
  • chkauth.c
  • chkauth.h
  • chkcode.sh
  • chkinfo.1
  • chkinfo.c
  • chkinfo.h

By deprecate we mean that the files would be renamed so that the content can still be consulted, but that the Makefile no longer compiles them. There are a few concepts that need to be kept, and of course the man pages need to be folded into a new chkentry man page. Such files will become inactive, but not forgotten for now.

Well the *.codes files of course are not compiled but I still think they're needed for the reasons I cited in the other thread several days back. But the messages are likely to change and probably quite a bit. I also think the check script will be needed for the same reasons I cited before.

The check functions are very much needed though if you think that those two structs I have referred to are not going to cut it they will have to change. I still think they might have some value (I'd say json_value for the pun but I'm not sure json has much value :) ) but I'll go through what you have here before I worry about that. It might be that something new can be devised based on them instead.

... more to come ...

Sure.

The deprecation of code and a stub for chkentry, along with all of the required changes, was performed in commit 8aefacfb2b51a92bdb8566377c6246d5c08c1500.

xexyl commented 2 years ago

Still waking up. Cannot sleep but not quite awake as much as I'd like given that. Hopefully I can go back to sleep in a bit though that's unlikely. Anyway I'll reply to these out of order but I'll get to them all in time.

Looking over the chk* files, we don't see very much that will work with the table driven semantic checks. We propose to deprecate the following files:

  • chk_entry.c
  • chk_entry.h
  • chk_err.codes
  • chk_util.c
  • chk_util.h
  • chk_warn.codes
  • chkauth.1
  • chkauth.c
  • chkauth.h
  • chkcode.sh
  • chkinfo.1
  • chkinfo.c
  • chkinfo.h

By deprecate we mean that the files would be renamed so that the content can still be consulted, but that the Makefile no longer compiles them. There are a few concepts that need to be kept, and of course the man pages need to be folded into a new chkentry man page. Such files will become inactive, but not forgotten for now.

Well the *.codes files of course are not compiled but I still think they're needed for the reasons I cited in the other thread several days back. But the messages are likely to change and probably quite a bit. I also think the check script will be needed for the same reasons I cited before. The check functions are very much needed though if you think that those two structs I have referred to are not going to cut it they will have to change. I still think they might have some value (I'd say json_value for the pun but I'm not sure json has much value :) ) but I'll go through what you have here before I worry about that. It might be that something new can be devised based on them instead.

... more to come ...

Sure.

The deprecation of code and a stub for chkentry, along with all of the required changes, was performed in commit 8aefacf.

As for this I do have a concern about

chk_err.codes  => old.chk_err.codes
chk_test.sh    => old.chk_test.sh
chk_warn.codes => old.chk_warn.codes
chkcode.sh     => old.chkcode.sh

Because those are specifically to make sure that the check util fails for the correct reason. It's all fine and well to have a specific bad file that fails, but if it's not failing for the correct reason it's not a valid test result. That was the purpose of these scripts if you recall. As for the *.codes files, these were important for documentation purposes of warnings and errors in the check util. These might not be needed later on but right now I think they are needed. However they will definitely change a lot - messages and function names and other things.

I have some other things on my mind that I'm going to address and then trying to rest again. I don't know if I'll get to the other comments today but hopefully if I don't today then I can tomorrow. Hope you're having a nice sleep my friend!

I do have a typo fix in the sorry file but I'll worry about that later. Anyway off to do other things now.

xexyl commented 2 years ago

I might have a problem with some of the other files being deleted too. Or I should say some of the code can be used in the new tool. Thus I would request that for now the files are not deleted. In a commit in part of the pull request I just made I refer to some code that could be copied (with some changes) to prevent having to rewrite code that won't be improved any (calls to strcmp() etc.).

lcn2 commented 2 years ago

I might have a problem with some of the other files being deleted too. Or I should say some of the code can be used in the new tool. Thus I would request that for now the files are not deleted. In a commit in part of the pull request I just made I refer to some code that could be copied (with some changes) to prevent having to rewrite code that won't be improved any (calls to strcmp() etc.).

Sure.

xexyl commented 2 years ago

I might have a problem with some of the other files being deleted too. Or I should say some of the code can be used in the new tool. Thus I would request that for now the files are not deleted. In a commit in part of the pull request I just made I refer to some code that could be copied (with some changes) to prevent having to rewrite code that won't be improved any (calls to strcmp() etc.).

Sure.

Thank you. Should I wait on the table thing you were talking about before I work on this util?

lcn2 commented 2 years ago

I might have a problem with some of the other files being deleted too. Or I should say some of the code can be used in the new tool. Thus I would request that for now the files are not deleted. In a commit in part of the pull request I just made I refer to some code that could be copied (with some changes) to prevent having to rewrite code that won't be improved any (calls to strcmp() etc.).

Sure.

Thank you. Should I wait on the table thing you were talking about before I work on this util?

Yes, please. Still working on the semantics architecture.

xexyl commented 2 years ago

I might have a problem with some of the other files being deleted too. Or I should say some of the code can be used in the new tool. Thus I would request that for now the files are not deleted. In a commit in part of the pull request I just made I refer to some code that could be copied (with some changes) to prevent having to rewrite code that won't be improved any (calls to strcmp() etc.).

Sure.

Thank you. Should I wait on the table thing you were talking about before I work on this util?

Yes, please. Still working on the semantics architecture.

That works for me. Let me know and then we can discuss things more - including the above comments as well.

xexyl commented 2 years ago

Just a couple things I noticed. One of these applies to other tools too but since it's in chkentry.c I will ask here. The other one applies to just chkentry.

I noticed that it prints the version string via the print() macro. Should the other tools use this (which I see uses pr() which will call warn() and warnp() on errors)? I guess it's not a high priority but is this the preferred way from now on? I mean it does seem it would make it quicker to deal with.

As for the more specific thing I noticed the usage message when wrong number of args:

        usage(4, program, "expected %d or %d+1 arguments, found: %d", REQUIRED_ARGS, arg_cnt); /*ooo*/

I found that strange, as the number of arguments does not match the number of % specifiers. I then looked at the usage() function and it seems different from the others. It uses some kind of magic (note: I'm not looking in detail .. I need to do something else soon so maybe it's not really magic but it's not like the other usage functions in other tools). Anyway it prints a string in the form of:

expected 1 or 2+1 arguments, found: 0

But I can't help but think it should just say 1 or 3 arguments. Wouldn't that be easier? Or is this in case for some reason the syntax changes the number of arguments required? Even so I think it would be better if we change the function so it prints out the sum rather than an expression string.

And with that I'm going to do something else now. Good day my friend!

lcn2 commented 2 years ago

I noticed that it prints the version string via the print() macro. Should the other tools use this (which I see uses pr() which will call warn() and warnp() on errors)? I guess it's not a high priority but is this the preferred way from now on? I mean it does seem it would make it quicker to deal with.

Sure. This will simplify the -V mode as there will be no need to set errno, nor check if the printf() call returns an error.

There are 3 cases: in chkentry.c, dyn_test.c and verge.c where errno is cleared before calling print() that should be removed.

Not a high priority, but if you are looking for something to do: such a cleanup would be good.

lcn2 commented 2 years ago

As for the more specific thing I noticed the usage message when wrong number of args:

        usage(4, program, "expected %d or %d+1 arguments, found: %d", REQUIRED_ARGS, arg_cnt); /*ooo*/

I found that strange, as the number of arguments does not match the number of % specifiers. I then looked at the usage() function and it seems different from the others. It uses some kind of magic (note: I'm not looking in detail .. I need to do something else soon so maybe it's not really magic but it's not like the other usage functions in other tools). Anyway it prints a string in the form of:

expected 1 or 2+1 arguments, found: 0

But I can't help but think it should just say 1 or 3 arguments. Wouldn't that be easier? Or is this in case for some reason the syntax changes the number of arguments required? Even so I think it would be better if we change the function so it prints out the sum rather than an expression string.

That seems to be a bug, or at least a mis-feature that should be cleaned up, if you are willing to fix it.

And with that I'm going to do something else now. Good day my friend!

We are off to make dinner for our 4th of July festivities, so best wishes for now.

xexyl commented 2 years ago

I noticed that it prints the version string via the print() macro. Should the other tools use this (which I see uses pr() which will call warn() and warnp() on errors)? I guess it's not a high priority but is this the preferred way from now on? I mean it does seem it would make it quicker to deal with.

Sure. This will simplify the -V mode as there will be no need to set errno, nor check if the printf() call returns an error.

That's true. It would simplify it.

There are 3 cases: in chkentry.c, dyn_test.c and verge.c where errno is cleared before calling print() that should be removed.

Could do that yes.

Not a high priority, but if you are looking for something to do: such a cleanup would be good.

True. And other tools could make use of that macro instead as well. Maybe something to do when I feel like working on something but don't have much energy to do anything else.

xexyl commented 2 years ago

As for the more specific thing I noticed the usage message when wrong number of args:

        usage(4, program, "expected %d or %d+1 arguments, found: %d", REQUIRED_ARGS, arg_cnt); /*ooo*/

I found that strange, as the number of arguments does not match the number of % specifiers. I then looked at the usage() function and it seems different from the others. It uses some kind of magic (note: I'm not looking in detail .. I need to do something else soon so maybe it's not really magic but it's not like the other usage functions in other tools). Anyway it prints a string in the form of:

expected 1 or 2+1 arguments, found: 0

But I can't help but think it should just say 1 or 3 arguments. Wouldn't that be easier? Or is this in case for some reason the syntax changes the number of arguments required? Even so I think it would be better if we change the function so it prints out the sum rather than an expression string.

That seems to be a bug, or at least a mis-feature that should be cleaned up, if you are willing to fix it.

I'll do that when I start working on the tool itself. Seems that's the right time to do it. Maybe I'll do it before then but we'll see.

And with that I'm going to do something else now. Good day my friend!

We are off to make dinner for our 4th of July festivities, so best wishes for now.

Enjoy and good night. Stay safe. Will be getting ready to sleep in a couple hours or so probably. Good night!

lcn2 commented 2 years ago

We are waiting until issue #156 is resolved by bringing the JSON semantics code into a code complete state, before returning to this issue.

As we shift to primarily focusing on developing chkentry, we will need to make changes to the JSON semantics code and perhaps adjustments around the edges of the JSON parser itself. That suggests that resolving issue #156 won't be the time to split off the JSON parser into its own repo: instead, at a minimum, let chkentry be completed and this issue #259 finished before that happens.

xexyl commented 2 years ago

We are waiting until issue #156 is resolved by bringing the JSON semantics code into a code complete state, before returning to this issue.

As we shift to primarily focusing on developing chkentry, we will need to make changes to the JSON semantics code and perhaps adjustments around the edges of the JSON parser itself. That suggests that resolving issue #156 won't be the time to split off the JSON parser into its own repo: instead, at a minimum, let chkentry be completed and this issue #259 finished before that happens.

That seems reasonable. I still would like to work on the chkentry tool of course. Code I wrote for the previous tools will be of use here as well. Although traversing the tree will be different, the checks are still there and can be reused.

I will probably not reply to anything else today but worry about the replying tomorrow. As for another repo I won't worry about that until at least the mock contest is finished.

lcn2 commented 2 years ago

With commit 9bda575b6bbc1563b47856b2a8228815045ea92f came the 1st draft .author.json & .info.json semantics tables that will eventually be used by chkentry.

TODO: rest.

TODO: Write the JSON validation functions. Extract the check code from mkiocccentry (and friends) and place it in a common code base such as entry_util.c and entry_util.h.

TODO: write code that compares a JSON parse tree with the associated semantics table.

xexyl commented 2 years ago

With commit 9bda575 come the 1st draft .author.json & .info.json semantics tables that will eventually be used by chkentry.

I see it. More than I'm able to parse right now though I'm afraid. I'm much too tired. I'm sleeping more (which is good) but then I feel very fatigued (not related to sleeping more).

TODO: rest.

I should hope so given the hour you wrote the comment!

TODO: Write the JSON validation functions. Extract the check code from mkiocccentry (and friends) and place it in a common code base such as entry_util.c and entry_util.h.

Sounds good.

TODO: write code that compares a JSON parse tree with the associated semantics table.

Sounds good too.

xexyl commented 2 years ago

Re commit 7f7760fca96c2bc38527039d5aa58b24de549096:

I see you've started work on testing the values. I would like to remind you that for a lot of the tests you should be able to fairly easily adapt what I did already which are currently in check_found_info_json_fields(), check_found_common_json_fields() and check_found_author_json_fields().

I would be happy to do some of these too once you have things in a way that you like it. I think in any case the header of the chkentry files (h, c) should be updated in this way. I will maybe do that later on today or the next day.

I am afraid I don't see myself doing much of anything today though. Too tired and not feeling very well. But I did want to remind you of these checks as a lot of the code (the checks themselves) are already written. It's just instead of using the old structs and linked lists (and the name and value members) you can use the specific functions with the correct values.

More specifically: the tests in those functions could be split into individual functions. Would you like me to do that? I would be happy to do that and I think I should be able to do it fairly easily too - once I have an idea of what you're after. Does that sound good?

xexyl commented 2 years ago

As for the Easter egg.. we already have established how to activate it. Here are some possible ideas though of what it might do. It might be that it does other things of course but these are the things that popped into my head a while back that I referred to but never said.

I had three thoughts all of them being messages (well what else could there be? I guess there could be other things though). Three different types.

  1. Funny quotes. The (in?)famous 'I don't know half of you half as ..' (or the funnier ones in HoMe) is the first one I thought of but maybe that's out of scope. It could be funny programming quotes or a combination of different types.
  2. Since it's the IOCCC it could highlight in a pseudo-random (or round robin or some other order or 'order') different entries. It might just be the summary file. I'm not sure. At first I had the idea that it might actually print out the source code of some but this I felt might be too alarming for some users.
  3. It could also have bits of information about Unix history or C history etc. Some of these might be some that you told me but I'm sure there are many others in your mind (Hint, hint! :)).

It could be a combination of these (as said). Here's a fun C trivia one I have always liked: how += used to be =+ (and I guess the same for the others). I guess not everyone knows that.

Now as for how to decide which ones to print I had some ideas.

  1. It could be pseudo-randomly selected from a table (which I guess is the safest way to have these things - so that one does not have to read in a file).
  2. It could be that it takes the time since the Unix epoch and scale it so that it fits in the range (thinking secs % TABLE_SIZE or whatever) ... which might not even be valid. It just popped into my head as an idea. I haven't really thought through it.
  3. It could be something else.

As for 'it could be something else': these ideas could also be something else but they're some ideas that popped into my head. I think the last one is maybe the most fun and entertaining (for the IOCCC at least?) but maybe the second one is too. The first one could be good for a lot of people too who need laughs. I have a lot of my own funny quotes (I'm not suggesting I fill the table with these! :) ), some of which I have shared with you (and which you laughed at), but there are of course others who have funny quotes too.

Now going afk. If I can I'll add more test functions in a bit (I did add one as you'll see in commit https://github.com/ioccc-src/mkiocccentry/pull/288/commits/a685110bab2e1f73ec8cb65e1fad1261a58f4356).

xexyl commented 2 years ago

With commit https://github.com/ioccc-src/mkiocccentry/pull/288/commits/be3c9c39243b18241df26f09f51bc8058cef468a I believe all the version test functions have been added. Again these are without the functions that call them. I'm just going through the items quickly and adding the test functions. I plan to work on it a bit more. I'm not sure if I'll end up adding all of them.

And yes of course I chose the above cited commit to do txzchk specifically for the comment - rather than with the others.

xexyl commented 2 years ago

As of commit https://github.com/ioccc-src/mkiocccentry/pull/288/commits/ed7db992089ea8443315996d21be1cc2e465055c several new test functions have been added. This will be all I can do for now as I'm having a hard time focusing my eyes and history tells me this is a bad time to be doing - well, anything that takes even a little bit of focus.

However I did notice a couple things when adding these functions.

First: do we want the file name reported? If so the functions will have to be updated. I originally did have the file names reported.

Also do we want to keep a counter of the number of issues? That's also something I originally had but which is no longer there.

Anyway I hope these help. I hope to do more tomorrow if not today. As I noted these functions are not called by any but they're ready for this. Probably I won't manage anything else today though. The dbg repo still needs to have the readme finished but I'm way too tired for that one (as I said elsewhere).

Good day!

xexyl commented 2 years ago

Back for a moment. I was considering adding more tests but I need to take it easy the rest of the day I think. However something occurred to me that might also require some changes in how things are done.

There are sanity tests that I had added on top of the regular tests. For example:

    /*  
     * Now we have to do some additional sanity tests like bool mismatches etc.
     * 
     * If Makefile override is set to true and there are no problems found with
     * the Makefile there's a mismatch: check and report if this is the case.
     */ 
    if (info.Makefile_override && info.first_rule_is_all && info.found_all_rule &&
            info.found_clean_rule && info.found_clobber_rule && info.found_try_rule) {
        warn(__func__, "Makefile_override == true but all expected Makefile rules found "
                       "and 'all:' is first in file %s", json_filename);
        ++issues;
    }   

    /*          
     * If info.found_all_rule == false and info.first_rule_is_all == true
     * there's a mismatch: check this and report if this is the case.
     */
    if (!info.found_all_rule && info.first_rule_is_all) {
        warn(__func__, "'all:' rule not found but first_rule_is_all == true in file %s", json_filename);
        ++issues;
    }

    /* if empty_override == true and prog.c is not size 0 there's a problem */
    if (info.empty_override && info.rule_2a_size > 0 && info.rule_2b_size > 0) {
        warn(__func__, "empty_override == true but prog.c size > 0 in file %s", json_filename);
        ++issues;
    }

    /*
     * If empty_override == false and either of rule 2a or rule 2b size == 0
     * there's a problem.
     */
    if (!info.empty_override && (info.rule_2a_size == 0 || info.rule_2b_size == 0)) {
        warn(__func__, "empty_override == false but rule 2a and/or rule 2b size == 0 in file %s", json_filename);
        ++issues;
    }

There were other tests as well but those are some that might require some changes. Or perhaps not: maybe you've thought of them already. I've not looked at it in that detail.

See the functions I referred to in comment https://github.com/ioccc-src/mkiocccentry/issues/259#issuecomment-1198164241.

xexyl commented 2 years ago

Two more tests:

    /*
     * Now that we've checked each field by name, we still have to make sure
     * that each field expected is actually there. Note that in the above loop
     * we already tested if each field has been seen more times than allowed so
     * we don't do that here. This is because the fields that are in the list
     * are those that will potentially have more than allowed whereas here we're
     * making sure every field that is required is actually in the list.
     */
    for (loc = 0; info_json_fields[loc].name != NULL; ++loc) {
        if (!info_json_fields[loc].found && info_json_fields[loc].max_count > 0) {
            warn(__func__, "required field not found in found_info_json_fields list "
                           "in file %s: '%s'", json_filename, info_json_fields[loc].name);
            ++issues;
        }
    }

    /*
     * Check for duplicate files in the manifest.
     *
     * XXX - This should probably be in its own function.
     */
    for (manifest_file = manifest_files_list; manifest_file != NULL; manifest_file = manifest_file->next) {
        if (manifest_file->count > 1) {
            warn(__func__, "found duplicate file (count: %ju) in file %s: '%s'",
                           (uintmax_t)manifest_file->count, json_filename, manifest_file->filename);
            ++issues;
        }
    }

Some of that of course would have to be done very differently but you get the idea from it. There was also a test that each field is found no more than the allowed number of times and, if it's required, that it is found at all.

check_found_common_json_fields() also has some special code for verifying the tarball name matches what it should with various fields. Probably has some other special tests as well.

Anyway doing something else now. Good day!

lcn2 commented 2 years ago

First: do we want the file name reported? If so the functions will have to be updated. I originally did have the file names reported.

Do you think reporting the filenames will be useful? It might be more bother than it is worth.

Also do we want to keep a counter of the number of issues? That's also something I originally had but which is no longer there.

We presume this might be one of those ugly cases where a global value is ++-ed and whose non-zero count is reported at the very last minute.

While such a global error count would prevent parallel threads, we don't do parallel so this is not a big loss. :-)

lcn2 commented 2 years ago

Two more tests:

    /*
     * Now that we've checked each field by name, we still have to make sure
     * that each field expected is actually there. Note that in the above loop
     * we already tested if each field has been seen more times than allowed so
     * we don't do that here. This is because the fields that are in the list
     * are those that will potentially have more than allowed whereas here we're
     * making sure every field that is required is actually in the list.
     */
    for (loc = 0; info_json_fields[loc].name != NULL; ++loc) {
        if (!info_json_fields[loc].found && info_json_fields[loc].max_count > 0) {
            warn(__func__, "required field not found in found_info_json_fields list "
                           "in file %s: '%s'", json_filename, info_json_fields[loc].name);
            ++issues;
        }
    }

    /*
     * Check for duplicate files in the manifest.
     *
     * XXX - This should probably be in its own function.
     */
    for (manifest_file = manifest_files_list; manifest_file != NULL; manifest_file = manifest_file->next) {
        if (manifest_file->count > 1) {
            warn(__func__, "found duplicate file (count: %ju) in file %s: '%s'",
                           (uintmax_t)manifest_file->count, json_filename, manifest_file->filename);
            ++issues;
        }
    }

Some of that of course would have to be done very differently but you get the idea from it. There was also a test that each field is found no more than the allowed number of times and, if it's required, that it is found at all.

check_found_common_json_fields() also has some special code for verifying the tarball name matches what it should with various fields. Probably has some other special tests as well.

Anyway doing something else now. Good day!

All that sort of testing is being replaced by the JSON semantic tables and tests that are being written.

xexyl commented 2 years ago

First: do we want the file name reported? If so the functions will have to be updated. I originally did have the file names reported.

Do you think reporting the filenames will be useful? It might be more bother than it is worth.

Well you would know what works for the IOCCC. It perhaps was more useful during initial development and testing which is very different now.

Also do we want to keep a counter of the number of issues? That's also something I originally had but which is no longer there.

We presume this might be one of those ugly cases where a global value is ++-ed and whose non-zero count is reported at the very last minute.

It wasn’t a global variable, no. But it’s true that it was used to determine if all tests passed (for the return value).

While such a global error count would prevent parallel threads, we don't do parallel so this is not a big loss. :-)

That’s true too of course but it wasn’t a global variable anyway. But it appears it’s not needed so it doesn’t matter for that reason either.

xexyl commented 2 years ago

Two more tests:

    /*
     * Now that we've checked each field by name, we still have to make sure
     * that each field expected is actually there. Note that in the above loop
     * we already tested if each field has been seen more times than allowed so
     * we don't do that here. This is because the fields that are in the list
     * are those that will potentially have more than allowed whereas here we're
     * making sure every field that is required is actually in the list.
     */
    for (loc = 0; info_json_fields[loc].name != NULL; ++loc) {
        if (!info_json_fields[loc].found && info_json_fields[loc].max_count > 0) {
            warn(__func__, "required field not found in found_info_json_fields list "
                           "in file %s: '%s'", json_filename, info_json_fields[loc].name);
            ++issues;
        }
    }

    /*
     * Check for duplicate files in the manifest.
     *
     * XXX - This should probably be in its own function.
     */
    for (manifest_file = manifest_files_list; manifest_file != NULL; manifest_file = manifest_file->next) {
        if (manifest_file->count > 1) {
            warn(__func__, "found duplicate file (count: %ju) in file %s: '%s'",
                           (uintmax_t)manifest_file->count, json_filename, manifest_file->filename);
            ++issues;
        }
    }

Some of that of course would have to be done very differently but you get the idea from it. There was also a test that each field is found no more than the allowed number of times and, if it's required, that it is found at all. check_found_common_json_fields() also has some special code for verifying the tarball name matches what it should with various fields. Probably has some other special tests as well. Anyway doing something else now. Good day!

All that sort of testing is being replaced by the JSON semantic tables and tests that are being written.

Thank you for letting me know about that. I will not worry about that part then.

Hopefully tomorrow I can add more test functions but we shall see. I should say that next week for about three weeks I will very likely be online less time each day. It shouldn’t change how much I am able to do with actual commits but it might mean that I will have less time to reply and discuss things.

That isn’t necessarily true though because if I have the energy I can just rearrange my time in the day so that I can do the computer things at the times I am at the computer.

I will know more later on.

I am just taking it easy a bit more and then I will get ready to sleep. Hope you have a great night!

xexyl commented 2 years ago

As for the Easter egg.. we already have established how to activate it. Here are some possible ideas though of what it might do. It might be that it does other things of course but these are the things that popped into my head a while back that I referred to but never said.

Just to say: I am definitely open to discussing this. I am curious what you would prefer. You might also have some ideas of how to implement some of it.

It seems at least that you got a laugh out of the ideas though which is good.

Okay really going for now. Have a great night!

lcn2 commented 2 years ago

Just to say: I am definitely open to discussing this. I am curious what you would prefer. You might also have some ideas of how to implement some of it.

Your idea 💡 about time based might be fun 🤩. Although rather than the thing changing each second, you might consider the thing changing each day.

In the table from which time selects, it might be nice to mildly encode the string so that running strings(1) on the binary doesn't spoil it.

xexyl commented 2 years ago

Just to say: I am definitely open to discussing this. I am curious what you would prefer. You might also have some ideas of how to implement some of it.

Your idea 💡 about time based might be fun 🤩. Although rather than the thing changing each second, you might consider the thing changing each day.

So each day has a new message? My only thought there: what happens when more days than the number of items has passed? Would you please elaborate?

In the table from which time selects, it might be nice to mildly encode the string so that running strings(1) on the binary doesn't spoil it.

Funny you should mention that as I was thinking of using rot13 on it. Does that sound reasonable or do you suggest something else? Another thought: some form of XOR encryption but include rot13 as well? Maybe to mislead? But maybe that's going too far: maybe rot13 is enough?

As for discussion do you have a preference of which list of things to include? The funny quotes, the C and Unix history/trivia or the IOCCC entries (which would eventually run out anyway I guess but the same could be said for some of the others in time)?

In any case they're all quotes in a sense so what file should they be in? Do you have a name for the table (or is it more than one table?) and what about function names (should it be bogus names or should it be something more telling like show_quote()) ?

Thank you! Hope you're having a nice sleep! I'm thinking I might look at adding more test functions but I'm not sure how many or if I'll get to it today.

lcn2 commented 2 years ago

Just to say: I am definitely open to discussing this. I am curious what you would prefer. You might also have some ideas of how to implement some of it.

Your idea 💡 about time based might be fun 🤩. Although rather than the thing changing each second, you might consider the thing changing each day.

So each day has a new message? My only thought there: what happens when more days than the number of items has passed? Would you please elaborate?

How about a different message based on the week or month?

In the table from which time selects, it might be nice to mildly encode the string so that running strings(1) on the binary doesn't spoil it.

Funny you should mention that as I was thinking of using rot13 on it. Does that sound reasonable or do you suggest something else? Another thought: some form of XOR encryption but include rot13 as well? Maybe to mislead? But maybe that's going too far: maybe rot13 is enough?

rot13 is fine :-)

Perhaps a fun comment (non-rot13-ed) could be put above the table of rot13 strings?

As for discussion do you have a preference of which list of things to include? The funny quotes, the C and Unix history/trivia or the IOCCC entries (which would eventually run out anyway I guess but the same could be said for some of the others in time)?

A mix might do.

In any case they're all quotes in a sense so what file should they be in? Do you have a name for the table (or is it more than one table?) and what about function names (should it be bogus names or should it be something more telling like show_quote()) ?

How about just compiled rot13 strings into a const array?

Thank you! Hope you're having a nice sleep! I'm thinking I might look at adding more test functions but I'm not sure how many or if I'll get to it today.

Best wishes .. time to focus on a presentation on volcanoes that is being given today.

xexyl commented 2 years ago

Just to say: I am definitely open to discussing this. I am curious what you would prefer. You might also have some ideas of how to implement some of it.

Your idea 💡 about time based might be fun 🤩. Although rather than the thing changing each second, you might consider the thing changing each day.

So each day has a new message? My only thought there: what happens when more days than the number of items has passed? Would you please elaborate?

How about a different message based on the week or month?

Hmm ... week of the year? In that case we would be limited in the number of messages. Month would be even a tighter limit.

But I guess you mean use time(3) and extract the week or month and use it as an index to the table? If that's not it what do you have in mind? And what about if we want to add more strings?

In the table from which time selects, it might be nice to mildly encode the string so that running strings(1) on the binary doesn't spoil it.

Funny you should mention that as I was thinking of using rot13 on it. Does that sound reasonable or do you suggest something else? Another thought: some form of XOR encryption but include rot13 as well? Maybe to mislead? But maybe that's going too far: maybe rot13 is enough?

rot13 is fine :-)

Perhaps a fun comment (non-rot13-ed) could be put above the table of rot13 strings?

In other words to mislead? I was thinking that as well - in fact that's what I was getting at. Ah, rereading it maybe not. What kind of comment? I mean there are many possibilities so I wonder what you're thinking of (even if only a general idea)?

As for discussion do you have a preference of which list of things to include? The funny quotes, the C and Unix history/trivia or the IOCCC entries (which would eventually run out anyway I guess but the same could be said for some of the others in time)?

A mix might do.

That sounds good to me.

In any case they're all quotes in a sense so what file should they be in? Do you have a name for the table (or is it more than one table?) and what about function names (should it be bogus names or should it be something more telling like show_quote()) ?

How about just compiled rot13 strings into a const array?

In other words take a file of all the strings and rot13 it and then put each string as its own element in the array? (So an array of char *s)? That's what I had in mind anyway.

Thank you! Hope you're having a nice sleep! I'm thinking I might look at adding more test functions but I'm not sure how many or if I'll get to it today.

Best wishes .. time to focus on a presentation on volcanoes that is being given today.

Very nice! I LOVE volcanoes! Actually when I was a kid my mum used to read to my brother and me about volcanoes, earthquakes, space, dinosaurs and other things - as that's what we wanted to hear. I do think that even if it would be scary it would be quite an experience for a super volcano to have a super eruption .. or a mass extinction in some form. Hell, I've even had the thought that it'd be fun to have a dinosaur reincarnation! There was a time in my life where I wanted the Vogons to be real (now I do not though).

lcn2 commented 2 years ago

Hmm ... week of the year? In that case we would be limited in the number of messages. Month would be even a tighter limit.

But I guess you mean use time(3) and extract the week or month and use it as an index to the table? If that's not it what do you have in mind? And what about if we want to add more strings?

Perhaps given a table of M messages, compute year*52 + int(day of year / 7) and use that value, mod M, to index into the table?

lcn2 commented 2 years ago

OT

Very nice! I LOVE volcanoes! Actually when I was a kid my mum used to read to my brother and me about volcanoes, earthquakes, space, dinosaurs and other things - as that's what we wanted to hear. I do think that even if it would be scary it would be quite an experience for a super volcano to have a super eruption .. or a mass extinction in some form. Hell, I've even had the thought that it'd be fun to have a dinosaur reincarnation! There was a time in my life where I wanted the Vogons to be real (now I do not though).

See this show schedule for today at 13:00 Pacific.

xexyl commented 2 years ago

OT

Very nice! I LOVE volcanoes! Actually when I was a kid my mum used to read to my brother and me about volcanoes, earthquakes, space, dinosaurs and other things - as that's what we wanted to hear. I do think that even if it would be scary it would be quite an experience for a super volcano to have a super eruption .. or a mass extinction in some form. Hell, I've even had the thought that it'd be fun to have a dinosaur reincarnation! There was a time in my life where I wanted the Vogons to be real (now I do not though).

See this show schedule for today at 13:00 Pacific.

Cool! How long is it supposed to be? I might have to have it set to download as it happens and then scp it to the laptop tomorrow.

xexyl commented 2 years ago

Hmm ... week of the year? In that case we would be limited in the number of messages. Month would be even a tighter limit. But I guess you mean use time(3) and extract the week or month and use it as an index to the table? If that's not it what do you have in mind? And what about if we want to add more strings?

Perhaps given a table of M messages, compute year*52 + int(day of year / 7) and use that value, mod M, to index into the table?

Not thinking this out so maybe it should be obvious and it's not. Would this allow for any number of messages? If so that would work. At a very quick thought it would allow this. Is that right?

lcn2 commented 2 years ago

Hmm ... week of the year? In that case we would be limited in the number of messages. Month would be even a tighter limit.

But I guess you mean use time(3) and extract the week or month and use it as an index to the table? If that's not it what do you have in mind? And what about if we want to add more strings?

Perhaps given a table of M messages, compute year*52 + int(day of year / 7) and use that value, mod M, to index into the table?

Not thinking this out so maybe it should be obvious and it's not. Would this allow for any number of messages? If so that would work. At a very quick thought it would allow this. Is that right?

Yes, for M > 0.

xexyl commented 2 years ago

Hmm ... week of the year? In that case we would be limited in the number of messages. Month would be even a tighter limit.

But I guess you mean use time(3) and extract the week or month and use it as an index to the table? If that's not it what do you have in mind? And what about if we want to add more strings?

Perhaps given a table of M messages, compute year*52 + int(day of year / 7) and use that value, mod M, to index into the table?

Not thinking this out so maybe it should be obvious and it's not. Would this allow for any number of messages? If so that would work. At a very quick thought it would allow this. Is that right?

Yes, for M > 0.

Which of course it would be since we want messages .. plus dividing by 0 is undefined (which as far as I have experienced always was a segfault) and I believe mod on negative numbers is also undefined though maybe I'm wrong there.

I'm working on something now. I'm not sure I'll get it all in but I have some fun additions and I think the commit log will be fun too. I hope to manage this today but we shall see. It depends on how I feel. Either way I'm working on this. What's certain is the table of messages will have to be updated over time as I won't have them all at once.

I might inject some of my own programming quotes to help populate the table but we'll see about that.

Hope you had a good sleep and have a good weekend!

UPDATE:

Run into a problem I did not think of before... If the strings are encoded in rot13 what to do about newlines? It would change it to \a which of course is a beep. But if it's a static const char table we can't modify the table so we have to iterate over each char and print out the real value - which poses a problem for escape chars.

Any suggestions? Another possible way I suppose is to have them in a file and read into a dynamic array. Another thought is have .. Ah. Maybe this will work. Have a special char that means \n. I'll do that so when you see it you'll know what I mean. Let's say for now it'll be a '#' as I don't imagine many quotes will have that and it's not altered by rot13. If necessary we can change it to something else that's not rot13 encoded and which isn't in any quotes.

UPDATE 1

Expect an email at some point from me about this. Before I commit I'm going to share it with you as given the nature of what's being done it is necessarily somewhat obfuscated. Right now it works but needs more testing and certainly more quotes (only two so far). I don't like the look of it and I think I need to take a break before I can sort that out but I hope to at least send you an idea later today. I'll of course tell Leo it's only for you - though maybe I'll throw in the Tolkien stuff I told you about as I believe you said he also read HoMe so he would probably like these things too.

UPDATE 2

You should have an email. If the format does not work please let me know (format of the email - comments interspersed in the diff). If you wish to discuss it here that's fine too. If for some reason the mail did not reach you please let me know and I'll try again (or do it here). I didn't want to give too much away here which is why I chose to do it over email.