Enhancement: create the jinfochk tool - check on the contents of an .info.json file

lcn2 commented 2 years ago

We need to create the jinfochk tool in order to help verify that contents of an .info.json file found within an entry directory.

This tool will primarily be used by other tools (not humans). As such it should behave like fnamchk in that if all is well, it should not print anything and simply exit 0. If there are problems found with the .info.json file, then warning messages should be printed to stderr AND the jinfochk tool should exit with a non-zero status. The use of a -v level may be use to assist in debugging.

The jinfochk tool is primarily a stand alone tool. As a sanity check, the mkiocccentry program should execute the jinfochk code AFTER .info.json file has been created and before the compressed tarball is formed. If mkiocccentry program sees a 0 exit status, then all is well. For a non-zero exit code, the tool probably should abort because any problems detected by jinfochk based on what mkiocccentry wrote into .info.json indicates there is a serious mismatch between what mkiocccentry is doing and what jinfochk expects.

The following might be how mkiocccentry output is changed with the use of this tool (and the other tool):

Is the above list a correct list of files in your entry? [yn]: y

Checking the format of .info.json ...
... all appears well with the .info.json file.

Checking the format of .author.json ...
... all appears well with the .author.json file.

About to run the tar command to form the compressed tarball ...

As a stand alone tool, the jinfochk tool will be invoked by other tools as part of the IOCCC submission process. That process is beyond the scope of this repo. Suffice it to sat the the IOCCC judges will use this tool is part of their submission workflow.

Here is a possible command line usage message:

jinfochk [-h] [-v level] [-V] file

    -h          print help message and exit 0
    -v level        set verbosity level: (def level: 0)
    -V          print version string and exit

        file                    path to a .info.json file

exit codes:

        0                       no errors or warnings detected
        >0                      some error(s) and/or warning(s) were detected

NOTE: We mention file above even though the canonical filename will be .info.json. The tool should NOT check, nor object to using a different filename.

The mkiocccentry tool will need to invoke this tool. As such a similar method used to find and specify the location of txzchk should be used. As this tool is one of 2 tools being considered, we recommend the following of added to the mkiocccentry command line:

    -j /path/to/jinfochk    path to jinfochk executable used by txzchk (def: ./jinfochk)
    -J /path/to/jauthchk    path to jauthchk executable used by txzchk (def: ./jauthchk)

IMPORTANT: While it might be tempting to consider depending on some general JSON checker, we do NOT need nor want that. It is important that the mkiocccentry GitHub repo remain stand alone. I.e., all the code needed by someone wishing to enter the IOCCC (beside a C compiler, make, tar, cp, ls) should found in this GitHub repo alone. As there is NO standard JSON tool in widespread distribution the all of the code for this tool needs to reside in this repo only.

IMPORTANT: We do not need a general JSON format checker. We only need to verify that the file contains the JSON needed and only the JSON needed for the judges to process IOCCC entries.

While is it NOT recommended, if someone wishes to edit their .info.json and re-create the compressed tarball we cannot stop them. As such mkiocccentry should be STRICT on what is writes into .info.json AND jinfochk should be permissive (but not to a fault) in what is considers as OK.

This tool should neither generate an error, nor warn if someone were to reformat the JSON. And as JSON is not order dependent, of someone wishes to reorder the JSON elements, that is fine. As long as all the requirement JSON elements are present, and no new JSON elements are found, and the version string matches, all is OK.

Should something go wrong and a change to the JSON is required during an open IOCCC, the judges will preserve the older JSON check tools and use those against older JSON formats. This there is no need for a >= version check: a version string match seems good enough.

See the a followup comment for details on the checks needed against an .info.json file.

lcn2 commented 2 years ago

The following is a guide as to what needs to be checked in an .info.json file. The values found below are simply examples.

To recap an import point made above:

IMPORTANT: We do not need a general JSON format checker. We only need to verify that the file contains the JSON needed and only the JSON needed for the judges to process IOCCC entries.

While is it NOT recommended, if someone wishes to edit their .info.json and re-create the compressed tarball we cannot stop them. As such mkiocccentry should be STRICT on what is writes into .info.json AND jinfochk should be permissive (but not to a fault) in what is considers as OK.

This tool should neither generate an error, nor warn if someone were to reformat the JSON. And as JSON is not order dependent, of someone wishes to reorder the JSON elements, that is fine. As long as all the requirement JSON elements are present, and no new JSON elements are found, and the version string matches, all is OK.

See www.json.org for details of the JSON format.

All files must start with a {.

    "IOCCC_info_version" : "1.7 2022-02-04",

The IOCCC_info_version value MUST match INFO_VERSION from limit_ioccc.h.

    "ioccc_contest" : "IOCCC28",

The ioccc_contest value MUST match IOCCC_CONTEST from limit_ioccc.h.

    "ioccc_year" : 2022,

The ioccc_year value MUST match IOCCC_YEAR from limit_ioccc.h.

    "mkiocccentry_version" : "0.35 2022-02-07",

The mkiocccentry_version value MUST match MKIOCCCENTRY_VERSION from limit_ioccc.h.

    "iocccsize_version" : "28.7 2022-02-01",

The iocccsize_version value MUST match IOCCCSIZE_VERSION from limit_ioccc.h.

    "IOCCC_contest_id" : "test",

The IOCCC_contest_id value MUST be either test or a valid Contest ID (a UUID with version UUID_VERSION and variant UUID_VARIANT as defined in limit_ioccc.h`).

    "entry_num" : 0,

The entry_num value must be >= 0 and <= MAX_ENTRY_NUM as defined in limit_ioccc.h.

    "title" : "title",

The title value must be match the rules enforced by the get_title() function from mkiocccentry.

    "abstract" : "abstract",

The abstract value must be match the rules enforced by the get_abstract() function from mkiocccentry.

    "tarball" : "entry.test-0.1644618833.txz",

The tarball value be match the rules enforced by the fnamchk tool.

    "rule_2a_size" : 2345,

The rule_2a_size value must be an integer. The jinfochk tool should NOT enforce a Rule 2a check.

    "rule_2b_size" : 1234,

The rule_2b_sizevalue must be an integer. The jinfochk tool should NOT enforce a Rule 2b check.

    "empty_override" : false,
    "rule_2a_override" : false,
    "rule_2a_mismatch" : false,
    "rule_2b_override" : false,
    "highbit_warning" : false,
    "nul_warning" : false,
    "trigraph_warning" : false,
    "wordbuf_warning" : false,
    "ungetc_warning" : false,
    "Makefile_override" : false,
    "first_rule_is_all" : true,
    "found_all_rule" : true,
    "found_clean_rule" : true,
    "found_clobber_rule" : true,
    "found_try_rule" : true,
    "test_mode" : true,

The these values must be either be true or false.

    "manifest" : [
        {"info_JSON" : ".info.json"},
        {"author_JSON" : ".author.json"},
        {"c_src" : "prog.c"},
        {"Makefile" : "Makefile"},
        {"remarks" : "remarks.md"},
        {"extra_file" : "extra1"},
        {"extra_file" : "extra2"}
    ],

The manifest must be a JSON array. In that array there MUST be exactly one of:

info_JSON
author_JSON
c_src
Makefile
remarks

There may be zero of more extra_file elements in the manifest array.

The values must contain ONLY POSIX Fully portable characters (ASCII A-Z a-z 0-9 . _ - and the + character).

Except for info_JSON and author_JSON the values CANNOT start with a "." (ASCII period), nor "-" (ASCII dash), nor "+" (ASCII plus).

The info_JSON value MUST be ".info.json".

The author_JSON value MUST be ".author.json".

NOTE: Due to a design flaw of the JSON spec, the last value of the manifest array cannot be followed by a "," (ASCII comma). All value except the last value of the manifest array must be followed by a "," (ASCII comma).

    "formed_timestamp" : 1644618833,

The formed_timestamp value must be an integer >= MIN_TIMESTAMP as defined in limit_ioccc.h.

    "formed_timestamp_usec" : 631668,

The formed_timestamp_usec value must be an integer >= 0 and <= 999999.

    "timestamp_epoch" : "Thr Jan  1 00:00:00 1970 UTC",

The timestamp_epoch value must match TIMESTAMP_EPOCH as defined in limit_ioccc.h.

    "min_timestamp" : 1643987926

The min_timestamp value must match MIN_TIMESTAMP as defined in limit_ioccc.h.

    "formed_UTC" : "Fri Feb 11 23:09:42 2022 UTC"

The formed_UTCvalue must be a date string in the same format as the output of the following date command:

date "+%a %b %d %H:%M:%S %Y UTC"

All files must end with a }.

To be valid, the .info.json file must have exactly one of each of the above mentioned JSON "string: : "value" (or JSON array in the case of manifest). No other JSON elements are allowed.

lcn2 commented 2 years ago

Comments, suggestions, corrections and clarifications for the above long comment are welcome.

We recommend that you copy only the relevant parts of the long comment when you do. :-)

If/where needed, we will attempt to modify the long comments in place above, where and when possible.

xexyl commented 2 years ago

Comments, suggestions, corrections and clarifications for the above long comment are welcome.

We recommend that you copy only the relevant parts of the long comment when you do. :-)

If/where needed, we will attempt to modify the long comments in place above, where and when possible.

This looks really good! I will start working on it (both) tomorrow (I will relook at this page and the one for the other tool before doing so).

I can modify what I did today in one file so that the following can be done immediately:

Parse options.
Verify the file exists and is a regular file and is readable.

The structs already exist so that’s easy to take care of too. I will update the Makefile as well. I will put both in a single pull request.

Oh and as for not forcing the filename: the way I had it is that you could use a different basename but in that case you had to specify the format to use. Since it’s to be two tools the options won’t be necessary.

I can do this relatively quickly but I think it’s wise and better that I react to your other comments in the other thread and then get some sleep as I have been waking up so early.

But unless something goes wrong expect a pull request with both tools started when you wake up tomorrow! Things are set up so all that has to be done is the actual parsing of the files.

I really enjoyed working on txzchk and I imagine that I will really enjoy working on these too. I will ask questions when I start working on it but like I said the only thing that has to be done now is parsing (and of course writing warnings and errors).

xexyl commented 2 years ago

  "IOCCC_info_version" : "1.7 2022-02-04",
The IOCCC_info_version value MUST match INFO_VERSION from limit_ioccc.h.
  "ioccc_contest" : "IOCCC28",
The ioccc_contest value MUST match IOCCC_CONTEST from limit_ioccc.h.
  "ioccc_year" : 2022,
The ioccc_year value MUST match IOCCC_YEAR from limit_ioccc.h.

Since some of these checks are the same they could go in a new file json.c. In my pull request I have a new json.h file which has the structs related to the json files and so it seems fitting that json.c should have general routines for the json checkers.

Does that sound like a good idea to you?

  "mkiocccentry_version" : "0.35 2022-02-07",
The mkiocccentry_version value MUST match MKIOCCCENTRY_VERSION from limit_ioccc.h.
  "iocccsize_version" : "28.7 2022-02-01",
The iocccsize_version value MUST match IOCCCSIZE_VERSION from limit_ioccc.h.
  "IOCCC_contest_id" : "test",
The IOCCC_contest_id value MUST be either test or a valid Contest ID (a UUID with version UUID_VERSION and variant UUID_VARIANT as defined in limit_ioccc.h`).

It seems like these would also be good to be in the same file as for the other tool since (unless I'm reading it wrong) these are the same checks. Perhaps additionally the parsing of UUID could be moved into util.c (via taking what's in fnamchk.c wrt the UUID and moving it to util.c). That would simplify the checking I would think.

  "entry_num" : 0,
The entry_num value must be >= 0 and <= MAX_ENTRY_NUM as defined in limit_ioccc.h.

Is there a reason that the author json file also has this as well? I don't mind but it in a way seems convenient since it could be a single check for both utilities.

  "title" : "title",
The title value must be match the rules enforced by the get_title() function from mkiocccentry.
  "abstract" : "abstract",
The abstract value must be match the rules enforced by the get_abstract() function from mkiocccentry.

So perhaps a separate function for each (that those other functions can call) which validate the strings. I might start out doing that. Would it be better to have these in util.h/util.c or some other file?

  "tarball" : "entry.test-0.1644618833.txz",
The tarball value be match the rules enforced by the fnamchk tool.

I regret adding this to the .info.json file (as now I have to do more work)! :) But perhaps a minor addition to the fnamchk could help here: if an option is specified it would print out the actual file name in full IF it passes: so that this tool could also call fnamchk with this option and simply do a strcmp() on the two.

Alternatively it could be that the code is moved into another function (like above) so that this is also modular. Where do you think it should go? Or should it just be another option to fnamchk?

  "rule_2a_size" : 2345,
The rule_2a_size value must be an integer. The jinfochk tool should NOT enforce a Rule 2a check.

If it's not supposed to run iocccsize (and it makes sense not to) how do you propose checking out of range values? What should be done for < 0? I presume an error should be reported if it's e.g. < 0 || >= INT_MAX but is this correct?

  "rule_2b_size" : 1234,
The rule_2b_sizevalue must be an integer. The jinfochk tool should NOT enforce a Rule 2b check.

My same thoughts and question above applies to here: except that we should use strtoul() cast to an unsigned int. Or another possibility: use strspn() on it with the string: "0123456789". This would also detect negative numbers since the "-" wouldn't be there (this applies to the above too). Any preference here?

Except for info_JSON and author_JSON the values CANNOT start with a "." (ASCII period), nor "-" (ASCII dash), nor "+" (ASCII plus).

In this case I think what I did in txzchk cannot be moved over (since the function that check is in is more complicated) but since this is a simple check it doesn't matter. On the other hand should I add any checks like this in txzchk: make sure that the file names only have these characters (I already check for dot files so this wouldn't be necessary)? If that was done it could be used in both. I'd be happy to make add this code if you think that's a good idea.

As I said in the other thread I probably am done for the day - but some of the above might be a really good start: some of the code can be adapted and if you think it's a good idea to check file names in the tarball (it sounds like a good idea to me) then I'll be happy to add it. Perhaps at the same time (depending on where it is in txzchk) I could make it a separate function that I could put in util.c/util.h. Just let me know.

lcn2 commented 2 years ago

Since some of these checks are the same they could go in a new file json.c. In my pull request I have a new json.h file which has the structs related to the json files and so it seems fitting that json.c should have general routines for the json checkers.

Does that sound like a good idea to you?

Yes, there will be common functions that could be useful to more than one tool.

lcn2 commented 2 years ago

"entry_num" : 0, The entry_num value must be >= 0 and <= MAX_ENTRY_NUM as defined in limit_ioccc.h.

Is there a reason that the author json file also has this as well? I don't mind but it in a way seems convenient since it could be a single check for both utilities.

Yes there is a reason why are duplicates between the two files.

lcn2 commented 2 years ago

So perhaps a separate function for each (that those other functions can call) which validate the strings. I might start out doing that. Would it be better to have these in util.h/util.c or some other file?

Perhaps util.h/util.c.

lcn2 commented 2 years ago

If it's not supposed to run iocccsize (and it makes sense not to) how do you propose checking out of range values? What should be done for < 0? I presume an error should be reported if it's e.g. < 0 || >= INT_MAX but is this correct?

The JSON file can be valid even if the Rule 2b size is not valid.

lcn2 commented 2 years ago

My same thoughts and question above applies to here: except that we should use strtoul() cast to an unsigned int. Or another possibility: use strspn() on it with the string: "0123456789". This would also detect negative numbers since the "-" wouldn't be there (this applies to the above too). Any preference here?

It would be better to parse the input like mkiocccentry does: using the trick of that get_entry_num() does with sscanf() in mkiocccentry.

xexyl commented 2 years ago

Since some of these checks are the same they could go in a new file json.c. In my pull request I have a new json.h file which has the structs related to the json files and so it seems fitting that json.c should have general routines for the json checkers. Does that sound like a good idea to you?

Yes, there will be common functions that could be useful to more than one tool.

Right. Well tomorrow before my cousin gets here I can work a bit on it though I might not work on it as much. We'll see. I will hopefully be able to get things set up so that it's easy to just focus on parsing only.

xexyl commented 2 years ago

"entry_num" : 0, The entry_num value must be >= 0 and <= MAX_ENTRY_NUM as defined in limit_ioccc.h.

Is there a reason that the author json file also has this as well? I don't mind but it in a way seems convenient since it could be a single check for both utilities.

Yes there is a reason why are duplicates between the two files.

I can now think of some possible reasons even. But these same variables can be parsed in the same way so those can go into json.c. That seems good to me.

xexyl commented 2 years ago

So perhaps a separate function for each (that those other functions can call) which validate the strings. I might start out doing that. Would it be better to have these in util.h/util.c or some other file?

Perhaps util.h/util.c.

Okay then tomorrow that's one thing I can probably do - get the parsing of the UUID string in fnamchk to be in util.h/util.c. Would you mind providing an example UUID (each variant allowed) that can be used to test the code? That would be very helpful.

xexyl commented 2 years ago

If it's not supposed to run iocccsize (and it makes sense not to) how do you propose checking out of range values? What should be done for < 0? I presume an error should be reported if it's e.g. < 0 || >= INT_MAX but is this correct?

The JSON file can be valid even if the Rule 2b size is not valid.

I wasn't thinking so much as verifying that the size fits in the rule; rather I was thinking if the number is out of range for the integer type. If I understand you right though what you're saying here is that as long as it's an integer it's okay?

xexyl commented 2 years ago

My same thoughts and question above applies to here: except that we should use strtoul() cast to an unsigned int. Or another possibility: use strspn() on it with the string: "0123456789". This would also detect negative numbers since the "-" wouldn't be there (this applies to the above too). Any preference here?

It would be better to parse the input like mkiocccentry does: using the trick of that get_entry_num() does with sscanf() in mkiocccentry.

That could work too I suppose - though I thought sscanf() can lead to overflow. I'll take a look at it though. Of course if " is invalid (and should be reported) it would be easy to just use strtol() family since that family will stop at the first invalid character. But then there are other possible problems with those functions as we discussed in the txzchk page.

Anyway I'll use sscanf() then.

xexyl commented 2 years ago

Since the files have to be validly named: do you want txzchk to also validate this? If I were to add it it could very possibly be adapted for this tool too.

Going for now. I might be able to write more later today but I'm not certain of it. I'll read any replies when I get back. If you can provide valid UUIDs (and maybe invalid UUIDs) that would be very helpful to test the tool.

lcn2 commented 2 years ago

in meeting for about 2 hours ..

xexyl commented 2 years ago

No worries.

On Feb 12, 2022, at 11:57, Landon Curt Noll @.***> wrote:

in meeting for about 2 hours ..

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

lcn2 commented 2 years ago

If you can provide valid UUIDs (and maybe invalid UUIDs) that would be very helpful to test the tool.

Here is an example value UUID for an IOCCC contestant ID:

12345678-1234-4321-abcd-1234567890ab

See also how fnamchk now correctly parses the UUID-entry number:

        len = strlen(uuid);
...
        if (len != UUID_LEN+1+MAX_ENTRY_CHARS) {
            err(9, __func__, "2nd '-' separated token length: %lu != %lu: %s",
                             (unsigned long)len, (unsigned long)(UUID_LEN+1+MAX_ENTRY_CHARS), filepath);
            not_reached();
        }
        ret = sscanf(uuid, "%8x-%4x-%1x%3x-%1x%3x-%8x%4x-%d%c", &a, &b, &version, &c, &variant, &d, &e, &f, &entry_num, &guard);
        if (ret != 9) {
            err(10, __func__, "2nd '.' separated non-test not in UUID-entry_number format: %s", filepath);
            not_reached();
        }
        if (version != UUID_VERSION) {
            err(11, __func__, "2nd '.' separated UUID token version %x != %x: %s", version, UUID_VERSION, filepath);
            not_reached();
        }
        if (variant != UUID_VARIANT) {
            err(12, __func__, "2nd '.' separated UUID token variant %x != %x: %s", variant, UUID_VARIANT, filepath);
            not_reached();
        }
        dbg(DBG_LOW, "entry ID is a valid UUID: %s", uuid);
        if (entry_num < 0) {
            err(13, __func__, "3rd '.' separated entry number: %d is < 0: %s", entry_num, filepath);
            not_reached();
        }
        if (entry_num > MAX_ENTRY_NUM) {
            err(14, __func__, "2nd '-' separated entry number: %d is > %d: %s", entry_num, MAX_ENTRY_NUM, filepath);
            not_reached();
        }
        dbg(DBG_LOW, "entry number is valid: %d", entry_num);

NOTE: In the above code uuid now refers to a string that contains both the UUID and the "-" and the entry number.

lcn2 commented 2 years ago

It is hard for us to determine what parts of a long reply need to be responded to and what are just comments. It is easy for us to miss something within such a long reply, sorry.

Perhaps single issue messages might make that easier? Anyway if we missed a question, please ask it again, perhaps as 1 question (or 1 question set) per post?

xexyl commented 2 years ago

If you can provide valid UUIDs (and maybe invalid UUIDs) that would be very helpful to test the tool.

Here is an example value UUID for an IOCCC contestant ID:
12345678-1234-4321-abcd-1234567890ab

Thanks. As I said in the other thread I wanted to be sure it's exactly as it might be in the contest.

See also how fnamchk now correctly parses the UUID-entry number:

        len = strlen(uuid);
...
        if (len != UUID_LEN+1+MAX_ENTRY_CHARS) {
            err(9, __func__, "2nd '-' separated token length: %lu != %lu: %s",
                             (unsigned long)len, (unsigned long)(UUID_LEN+1+MAX_ENTRY_CHARS), filepath);
            not_reached();
        }
        ret = sscanf(uuid, "%8x-%4x-%1x%3x-%1x%3x-%8x%4x-%d%c", &a, &b, &version, &c, &variant, &d, &e, &f, &entry_num, &guard);
        if (ret != 9) {
            err(10, __func__, "2nd '.' separated non-test not in UUID-entry_number format: %s", filepath);
            not_reached();
        }
        if (version != UUID_VERSION) {
            err(11, __func__, "2nd '.' separated UUID token version %x != %x: %s", version, UUID_VERSION, filepath);
            not_reached();
        }
        if (variant != UUID_VARIANT) {
            err(12, __func__, "2nd '.' separated UUID token variant %x != %x: %s", variant, UUID_VARIANT, filepath);
            not_reached();
        }
        dbg(DBG_LOW, "entry ID is a valid UUID: %s", uuid);
        if (entry_num < 0) {
            err(13, __func__, "3rd '.' separated entry number: %d is < 0: %s", entry_num, filepath);
            not_reached();
        }
        if (entry_num > MAX_ENTRY_NUM) {
            err(14, __func__, "2nd '-' separated entry number: %d is > %d: %s", entry_num, MAX_ENTRY_NUM, filepath);
            not_reached();
        }
        dbg(DBG_LOW, "entry number is valid: %d", entry_num);

Yep. I think that this parsing can be moved to another file (I guess util.c/util.h) so that the same validation can be used in fnamchk as well as the json checkers.

NOTE: In the above code uuid now refers to a string that contains both the UUID and the "-" and the entry number.

Thanks for that note.

xexyl commented 2 years ago

It is hard for us to determine what parts of a long reply need to be responded to and what are just comments. It is easy for us to miss something within such a long reply, sorry.

Perhaps single issue messages might make that easier? Anyway if we missed a question, please ask it again, perhaps as 1 question (or 1 question set) per post?

As I wrote in the other issue - no worries. I'm sorry actually. I thought the way I did it would be sufficient. My apologies there. I'll be sure to bring up any issues separately from now on.

Going to sleep now. Have a great night!

xexyl commented 2 years ago

Oh quickly before I go. I just did a git pull and I saw this in the changelog:

    Compressed tarball filenames were changed so that a -,
    not a . separates test or the UUID from the entry number.
    Fixed how fnamchk worked with the UUID-entry_number case.

Good catch - and thanks for correcting this. I thought I had got it all.

And with that good night! Most likely I'll have a pull request in the morning.

lcn2 commented 2 years ago

See this important comment about JSON encoding and improvements to how mkiocccsize encodes JSON strings.

lcn2 commented 2 years ago

We fixed a JSON formatting bug in .info.json with commit 5cdaf5c8d20a1cca44c8b14e28d968038722a0e8.

lcn2 commented 2 years ago

The malloc_json_decode() and malloc_json_decode_str() functions complete and tested with commit 2e2228b5addcd4e3971fa25b364b2f4eaa8e6d79.

Renamed malloc_json_str() to malloc_json_encode_str().

The JSON string encode/decode functions are now code complete and pass the jencchk() tests.

lcn2 commented 2 years ago

We will write the jstrencode and jstrdecode functions tomorrow and create a jstr-test.sh test to add to make test.

As these are independent of your jauthchk and jinfochk tools, the above mentioned work should not impact your work. Sorry for the length of time the JSON string encode/decode took. It was 1535 lines of rather hairy code :-).

lcn2 commented 2 years ago

The read_all() is code complete. We are writing the jstrencode and jsredecode tools in order to test this function.

xexyl commented 2 years ago

I will have a look at it tomorrow!

I am going to sleep now. Just replied to the error codes message.

More tomorrow I hope! Good night. I hope you can get more sleep tonight - it seems like you didn’t get much last night.

Anyway I am going for the night. Sleep well when you do!

On Feb 14, 2022, at 18:50, Landon Curt Noll @.***> wrote:

The read_all() is code complete. We are writing the jstrencode and jsredecode tools in order to test this function.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

lcn2 commented 2 years ago

Added jstrencode and jstrdecode in order to test the new read_all() function as well as to further test the JSON string encoding and decoding process.

The make test now runs j-test.sh to perform some tests on JSON encoding and decoding.

xexyl commented 2 years ago

Added jstrencode and jstrdecode in order to test the new read_all() function as well as to further test the JSON string encoding and decoding process.

The make test now runs j-test.sh to perform some tests on JSON encoding and decoding.

I took a brief look at the read_all() function yesterday and then gave it a bit of thought (later on) how you meant with using strtok_r() and I think that due to what you said in the other issue that this is the obvious way forward (which is what I was thinking in the first place).

I hope I can work more on it in a bit. I got a lot of sleep last night so I'm hoping that I'll have more energy later in the day.

Thanks for adding the function. I believe I remember why these tools were made but I'll have a look back later on before I work on the jinfochk and jauthchk (which for some of the fields I can use common functions which will be nice). The obvious first step is to change it so it doesn't read the file line by line as that loop is no longer necessary.

I haven't looked at the read_all() thoroughly but I do recall seeing in the comments you referred to NUL bytes. I'll add the check though like I did in txzchk (and you did in mkiocccentry - I think that's where you put it).

xexyl commented 2 years ago

Totally OT:

I guess because this repo is more important atm that you've not had a chance to look at my updates on the winner repo and that's why the pull request was not merged yet. Is this correct?

Either way I'll say the important cake recipe (same for icing) modification I made: you have to USE pure vanilla extract: do NOT use imitation vanilla. I'm saying that here in case you didn't see since I know you plan on making the cake sometime this year.

xexyl commented 2 years ago

For both jinfochk and jauthchk I've changed it so it reads in the full file and verifies that the first character is { and the last character is }. However this failed the test because a newline is written after the closing }.

For now I removed the \n but I'm unclear on the requirements: you say the last character has to be } but should trailing whitespace be removed i.e. is }\n acceptable? I guess it is but I want to be absolutely sure. For now I updated the mkiocccentry functions to not write the trailing newline but if a trailing newline or spaces is/are acceptable I'll write the newline again and skip trailing spaces.

Let me know please. Once that's done these two checks can be put into a single function. I could do that now of course but I'm taking a break so will get back to it later.

lcn2 commented 2 years ago

Added jstrencode and jstrdecode in order to test the new read_all() function as well as to further test the JSON string encoding and decoding process. The make test now runs j-test.sh to perform some tests on JSON encoding and decoding.

I took a brief look at the read_all() function yesterday and then gave it a bit of thought (later on) how you meant with using strtok_r() and I think that due to what you said in the other issue that this is the obvious way forward (which is what I was thinking in the first place).

I hope I can work more on it in a bit. I got a lot of sleep last night so I'm hoping that I'll have more energy later in the day.

Thanks for adding the function. I believe I remember why these tools were made but I'll have a look back later on before I work on the jinfochk and jauthchk (which for some of the fields I can use common functions which will be nice). The obvious first step is to change it so it doesn't read the file line by line as that loop is no longer necessary.

I haven't looked at the read_all() thoroughly but I do recall seeing in the comments you referred to NUL bytes. I'll add the check though like I did in txzchk (and you did in mkiocccentry - I think that's where you put it).

We recommend that when you use read_all() to bring the entire JSON file into memory, that you perform a scan for NUL bytes (and of course declare the file invalid if you find any). BTW: This can happen even if someone isn't trying to pull a fast one: a bad block on a disk (from the submitters side), for example can turn into a block of NUL bytes (on the IOCCC submission side).

lcn2 commented 2 years ago

Totally OT:

I guess because this repo is more important atm that you've not had a chance to look at my updates on the winner repo and that's why the pull request was not merged yet. Is this correct?

Either way I'll say the important cake recipe (same for icing) modification I made: you have to USE pure vanilla extract: do NOT use imitation vanilla. I'm saying that here in case you didn't see since I know you plan on making the cake sometime this year.

Yes, we are in the middle of a few thousand edits: your Pull request will not be forgotten.

lcn2 commented 2 years ago

For both jinfochk and jauthchk I've changed it so it reads in the full file and verifies that the first character is { and the last character is }. However this failed the test because a newline is written after the closing }.

For now I removed the \n but I'm unclear on the requirements: you say the last character has to be } but should trailing whitespace be removed i.e. is }\n acceptable? I guess it is but I want to be absolutely sure. For now I updated the mkiocccentry functions to not write the trailing newline but if a trailing newline or spaces is/are acceptable I'll write the newline again and skip trailing spaces.

Let me know please. Once that's done these two checks can be put into a single function. I could do that now of course but I'm taking a break so will get back to it later.

An interpretation could go as follows:

A top level; JSON object resides within a file called a JSON file.  A JSON file consists of a JSON object, followed by more whitespace characters.

This is probably another flaw in the JSON spec.

lcn2 commented 2 years ago

For now I removed the \n but I'm unclear on the requirements: you say the last character has to be } but should trailing whitespace be removed i.e. is }\n acceptable? I guess it is but I want to be absolutely sure. For now I updated the mkiocccentry functions to not write the trailing newline but if a trailing newline or spaces is/are acceptable I'll write the newline again and skip trailing spaces.

How about this as a requirement:

A JSON file must stat with a (top level) JSON object and be followed by a newline.

lcn2 commented 2 years ago

The link to the seqcexit repo now works. For some reason the repo was previously marked as private.

xexyl commented 2 years ago

Added jstrencode and jstrdecode in order to test the new read_all() function as well as to further test the JSON string encoding and decoding process. The make test now runs j-test.sh to perform some tests on JSON encoding and decoding.

I took a brief look at the read_all() function yesterday and then gave it a bit of thought (later on) how you meant with using strtok_r() and I think that due to what you said in the other issue that this is the obvious way forward (which is what I was thinking in the first place). I hope I can work more on it in a bit. I got a lot of sleep last night so I'm hoping that I'll have more energy later in the day. Thanks for adding the function. I believe I remember why these tools were made but I'll have a look back later on before I work on the jinfochk and jauthchk (which for some of the fields I can use common functions which will be nice). The obvious first step is to change it so it doesn't read the file line by line as that loop is no longer necessary. I haven't looked at the read_all() thoroughly but I do recall seeing in the comments you referred to NUL bytes. I'll add the check though like I did in txzchk (and you did in mkiocccentry - I think that's where you put it).

We recommend that when you use read_all() to bring the entire JSON file into memory, that you perform a scan for NUL bytes (and of course declare the file invalid if you find any). BTW: This can happen even if someone isn't trying to pull a fast one: a bad block on a disk (from the submitters side), for example can turn into a block of NUL bytes (on the IOCCC submission side).

I did indeed add that check. And that's a valid point. Years ago I had a bug written to a CD or DVD (not sure which - linux iso image anyway) and I went to report it. Turned out that I had faulty RAM and that was the problem.

That would be unfortunate though to have the entry rejected in that case but I guess there's nothing that can be done here?

xexyl commented 2 years ago

Totally OT: I guess because this repo is more important atm that you've not had a chance to look at my updates on the winner repo and that's why the pull request was not merged yet. Is this correct? Either way I'll say the important cake recipe (same for icing) modification I made: you have to USE pure vanilla extract: do NOT use imitation vanilla. I'm saying that here in case you didn't see since I know you plan on making the cake sometime this year.

Yes, we are in the middle of a few thousand edits: your Pull request will not be forgotten.

Now that you mention that I remember you saying that. Thanks for confirming. No worries on my behalf.

xexyl commented 2 years ago

For both jinfochk and jauthchk I've changed it so it reads in the full file and verifies that the first character is { and the last character is }. However this failed the test because a newline is written after the closing }. For now I removed the \n but I'm unclear on the requirements: you say the last character has to be } but should trailing whitespace be removed i.e. is }\n acceptable? I guess it is but I want to be absolutely sure. For now I updated the mkiocccentry functions to not write the trailing newline but if a trailing newline or spaces is/are acceptable I'll write the newline again and skip trailing spaces. Let me know please. Once that's done these two checks can be put into a single function. I could do that now of course but I'm taking a break so will get back to it later.

An interpretation could go as follows:
A top level; JSON object resides within a file called a JSON file.  A JSON file consists of a JSON object, followed by more whitespace characters.
This is probably another flaw in the JSON spec.

It seems there are a lot of those.

xexyl commented 2 years ago

For now I removed the \n but I'm unclear on the requirements: you say the last character has to be } but should trailing whitespace be removed i.e. is }\n acceptable? I guess it is but I want to be absolutely sure. For now I updated the mkiocccentry functions to not write the trailing newline but if a trailing newline or spaces is/are acceptable I'll write the newline again and skip trailing spaces.

How about this as a requirement:
A JSON file must stat with a (top level) JSON object and be followed by a newline.

Maybe it's clear to you but it's still not to me: is it oaky if one of the files has white space prior to the opening { (and the same with white space after the closing })? Given that a newline is written after the closing } I presume that it's okay but I want to be absolutely certain of it.

Thanks.

xexyl commented 2 years ago

seqcexit repo

Thanks. Would it be of help if I work it out and make use of it? Or will you still do that? Anyway like I said I'll continue with more tomorrow. Sorry for the delay!

xexyl commented 2 years ago

Thank you. Please disregard my other question. I didn’t see this.

On Feb 16, 2022, at 09:43, Landon Curt Noll @.***> wrote:

For now I removed the \n but I'm unclear on the requirements: you say the last character has to be } but should trailing whitespace be removed i.e. is }\n acceptable? I guess it is but I want to be absolutely sure. For now I updated the mkiocccentry functions to not write the trailing newline but if a trailing newline or spaces is/are acceptable I'll write the newline again and skip trailing spaces.

How about this as a requirement:

A JSON file must stat with a (top level) JSON object and be followed by a newline. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

lcn2 commented 2 years ago

For now I removed the \n but I'm unclear on the requirements: you say the last character has to be } but should trailing whitespace be removed i.e. is }\n acceptable? I guess it is but I want to be absolutely sure. For now I updated the mkiocccentry functions to not write the trailing newline but if a trailing newline or spaces is/are acceptable I'll write the newline again and skip trailing spaces.

How about this as a requirement:
A JSON file must stat with a (top level) JSON object and be followed by a newline.
Maybe it's clear to you but it's still not to me: is it oaky if one of the files has white space prior to the opening { (and the same with white space after the closing })? Given that a newline is written after the closing } I presume that it's okay but I want to be absolutely certain of it.

Thanks.

If one were being strict, then there would be no whitespace before the first {. If one was being permissive, then whitespace before the first { would NOT be disallowed (i.e. do not generate an error , nor a warning if leading whitespace as detected).

Perhaps a similar approach should be taken that was used with jstrdecode:

usage: ./jstrdecode [-h] [-v level] [-V] [-t] [-n] [-s] [string ...]
...
    -s      decode using strict mode (def: not strict)

as well as in json.c use of the strict boolean:

/*
 * malloc_json_decode - return a decoding of a block of JSON encoded memory
 *
 * given:
 *      ptr     start of memory block to decode
 *      len     length of block to decode in bytes
 *      retlen  address of where to store malloced length, if retlen != NULL
 *      strict  true ==> strict decoding based on how malloc_json_encode() encodes
 *              false ==> permit a wider use of \-escaping and un-encoded char
 *
 *                  NOTE: struct == false implies a strict reading of the JSON spec
 *
 * returns:
 *      malloced JSON decoding of a block, or NULL ==> error
 *      NOTE: retlen, if non-NULL, is set to 0 on error
 */

Good general protocol practice says that the default should be not strict. In particular:

Be strict on what we generate.  Be permissive on what we accept.

BTW: The above motto is from the IETF and is used in RFCs for various internet based protocols. Ob course, one should not be dogmatic about what is strict and what is permissive. :-)

So maybe jinfochk and jauthchk should have a -s flag, with the default (i.e., without -s) being not struct. So that:

struct == true:

    Disallow anything (including whitespace) before the first "{".
    Permit only a single trailing newline ("\n") after the last "}".

struct == false:

     Allow zero or more leading whitespace before the leading "{".
     Allow zero or more trailing whitespace after the last "}".

When mkiocccentry calls jinfochk and jauthchk to inspect the .info.json and .author.json files it just created it should invoke jinfochk and jauthchk with the -s (struct mode).

When the IOCCC judges, use jinfochk and jauthchk to look at an entry, we will NOT use -s (i.e., not struct mode).

BTW: Like jstrdecode, when jinfochk and jauthchk call malloc_json_decode_str(..., strict_mode) and/or malloc_json_decode(..., strict_mode),. then should pass the strict_mode boolean accordingly.

Does that seem reasonable to you, @xexyl?

xexyl commented 2 years ago

For now I removed the \n but I'm unclear on the requirements: you say the last character has to be } but should trailing whitespace be removed i.e. is }\n acceptable? I guess it is but I want to be absolutely sure. For now I updated the mkiocccentry functions to not write the trailing newline but if a trailing newline or spaces is/are acceptable I'll write the newline again and skip trailing spaces.

How about this as a requirement:
A JSON file must stat with a (top level) JSON object and be followed by a newline.
Maybe it's clear to you but it's still not to me: is it oaky if one of the files has white space prior to the opening { (and the same with white space after the closing })? Given that a newline is written after the closing } I presume that it's okay but I want to be absolutely certain of it. Thanks.
If one were being strict, then there would be no whitespace before the first {. If one was being permissive, then whitespace before the first { would NOT be disallowed (i.e. do not generate an error , nor a warning if leading whitespace as detected).

Actually I had the thought of adding a boolean to the function that tests for the { (and the one that tests for the }) so that if it's true it would skip spaces; otherwise it would not. There could even be an option in the tool that specifies this? That way if you wanted to you could change the call in mkiocccentry to be strict (perhaps -s?) and then you'd have in the guidelines this detail. Then when calling the tools directly you can also decide. In the end the valid submissions should be the ones that use the correct version of the tool.

And on that note maybe there could be a system for when to update the version so that mismatches do not happen? I guess that would happen when the contest opens: there would be no change in version then (a version change right before it opens).

Does a strict option sound like a good idea?

Perhaps a similar approach should be taken that was used with jstrdecode:

usage: ./jstrdecode [-h] [-v level] [-V] [-t] [-n] [-s] [string ...]
...
  -s      decode using strict mode (def: not strict)

as well as in json.c use of the strict boolean:

/*
 * malloc_json_decode - return a decoding of a block of JSON encoded memory
 *
 * given:
 *      ptr     start of memory block to decode
 *      len     length of block to decode in bytes
 *      retlen  address of where to store malloced length, if retlen != NULL
 *      strict  true ==> strict decoding based on how malloc_json_encode() encodes
 *              false ==> permit a wider use of \-escaping and un-encoded char
 *
 *                  NOTE: struct == false implies a strict reading of the JSON spec
 *
 * returns:
 *      malloced JSON decoding of a block, or NULL ==> error
 *      NOTE: retlen, if non-NULL, is set to 0 on error
 */

Well two minds think alike they say! :) I'll add a strict option then.

Good general protocol practice says that the default should be not strict. In particular:
Be strict on what we generate.  Be permissive on what we accept.

Sounds good.

BTW: The above motto is from the IETF and is used in RFCs for various internet based protocols. Ob course, one should not be dogmatic about what is strict and what is permissive. :-)

Do you have an example RFC that this is in?

So maybe jinfochk and jauthchk should have a -s flag, with the default (i.e., without -s) being not struct. So that:

I really did not read these ideas before I suggested a -s option! I'll add this.


struct == true:

    Disallow anything (including whitespace) before the first "{".
    Permit only a single trailing newline ("\n") after the last "}".

That's helpful, thanks.

struct == false:

 Allow zero or more leading whitespace before the leading "{".
 Allow zero or more trailing whitespace after the last "}".

Thanks.

When mkiocccentry calls jinfochk and jauthchk to inspect the .info.json and .author.json files it just created it should invoke jinfochk and jauthchk with the -s (struct mode).

Okay will do that.

When the IOCCC judges, use jinfochk and jauthchk to look at an entry, we will NOT use -s (i.e., not struct mode).

Sounds good.

BTW: Like jstrdecode, when jinfochk and jauthchk call malloc_json_decode_str(..., strict_mode) and/or malloc_json_decode(..., strict_mode),. then should pass the strict_mode boolean accordingly.

I'll do that: when I get to that point. I still have to work out what that function does (I have not looked at it yet other than a very brief look at the comments).

Does that seem reasonable to you, @xexyl?

Sounds good to me and as I said I actually thought of the same idea! :)

I'm going to merge the changes to my fork and then pull and then go from there. I expect to make changes today but it might be a bit of time before I do; I'm not sure.

xexyl commented 2 years ago

I think it would be of help if you added the strict boolean to the malloc_json_decode() function as it's a complicated function (at least I think - I haven't really looked at it) and you wrote it so you know it well.

However PLEASE don't do it just yet! I have some changes in the works that also have some changes in json.c.

When I make a pull request all will be good on that part. As it is I'm thinking I might remove that part from json.c so I can get the other commits in. I might do that in which case it might be good. I'll let you know later today - probably in the morning.

xexyl commented 2 years ago

Okay I've got that code working after a short break. I'm going to do some commits and then a pull request. It's several commits though so might take a little bit.

After that if you can add the strict boolean to the function you referred to that'd be helpful. I might do a bit more today but I'm not sure yet. If not there will definitely be more tomorrow. Either way I'll be probably taking a break after this.

xexyl commented 2 years ago

Update: Now that the first and last characters of the files are dealt with okay the next step would be to extract the fields and parse the values. I have an idea how to do this (the identification part I mean - the extraction is a given) but I'm going to start on it tomorrow.

I feel better today but I want to take it easy one more day. In the meantime if you could add the strict boolean to the decoding functions that would be great. I think tomorrow I can make good progress on it. There still are some things that I'm unsure of but I'll work those out as I'm going for it.

It's possible I do a bit more today but I'm not sure of that one way or the other.

lcn2 commented 2 years ago

BTW: The above motto is from the IETF and is used in RFCs for various internet based protocols. Ob course, one should not be dogmatic about what is strict and what is permissive. :-)

Do you have an example RFC that this is in?

See Robustness principle and one of the early statements in RFC 1122.

lcn2 commented 2 years ago

I think it would be of help if you added the strict boolean to the malloc_json_decode() function as it's a complicated function (at least I think - I haven't really looked at it) and you wrote it so you know it well.

The malloc_json_decode() function was written, originally, with a strict boolean argument:

char *
malloc_json_decode(char const *ptr, size_t len, size_t *retlen, bool strict)

After that if you can add the strict boolean to the function you referred to that'd be helpful. I might do a bit more today but I'm not sure yet. If not there will definitely be more tomorrow. Either way I'll be probably taking a break after this.

That function, along with:

char *
malloc_json_decode_str(char const *str, size_t *retlen, bool strict)

already have this mode. That is how jstrdecode uses the -s flag.

ioccc-src / mkiocccentry

Enhancement: create the jinfochk tool - check on the contents of an .info.json file #68