ioccc-src / mkiocccentry

Form an IOCCC submission as a compressed tarball file

Help wanted: compressed tarball filename sanity check tool: filenamechk #41

Closed lcn2 closed 2 years ago

lcn2 commented 2 years ago

Discussed in https://github.com/ioccc-src/mkiocccentry/discussions/39

Originally posted by **lcn2** January 29, 2022

# We need help with more tools in this repo

You can help. See below.

# We are working hard to be able to run a MOCK IOCCC in 2022

By MOCK IOCCC we mean a beta test run of an IOCCC where entries submitted would NOT be judged. Instead the IOCCC judges would go through the motions of running a mock contest in order to beta test the process. Nobody would win the MOCK IOCCC, although we would certainly thank those who helped test the process, with special thanks to those who found something in the MOCK process that we need to fix.

> Any important lessons learned during the MOCK IOCCC would result in changes to the real IOCCC28.

Our goal is to hold the MOCK IOCCC in 2022 and then start the real IOCCC28 in 2022.

There is A LOT of work that goes on behind the scenes by the IOCCC judges. You can help the IOCCC judges by helping us with the following tools related to mkiocccentry. See below.

# JSON sanity check tools

- Write a tool that attempts to validate (performs sanity checks on) the .info.json produced by mkiocccentry.
- Write a tool that attempts to validate (performs sanity checks on) the .author.json produced by mkiocccentry.

Those two tools could be written as pull requests to add them as separate programs to this repo. Moreover mkiocccentry could be modified to perform a function call into that code (in the same way the iocccsize code is executed directly from mkiocccentry while iocccsize is also compiled as a separate tool).

Even though mkiocccentry would run the core of those tools as function calls, we still need the tools as standalone programs for use elsewhere. So both mkiocccentry callable functions AND standalone tools are needed.

The standalone tools should take a file as an argument. If sane, these tools should exit 0, otherwise exit non-zero.

# compressed tarball filename sanity check tool

- Write a tool that validates the filename of a compressed tarball file.

The mkiocccentry tool forms a compressed tarball file with a very specific form of a filename (for internal IOCCC judging process reasons). We need a tool that performs a sanity check on the filename of a compressed tarball file.

That tool could be added to this repo as a pull request. Moreover mkiocccentry could be modified to perform a function call into that code (in the same way the iocccsize code is executed directly from mkiocccentry while iocccsize is also compiled as a separate tool) as a sanity check of what mkiocccentry produced.

Even though mkiocccentry would run the core of that tool as a function call, we still need the tool as a standalone program for use elsewhere. So both a mkiocccentry callable function AND a standalone tool are needed.

The standalone tool should take a file as an argument. If sane, the tool should exit 0, otherwise exit non-zero.

# compressed tarball sanity check tool

- Write a tool to validate a compressed tarball produced by mkiocccentry.

Such a compressed tarball tool would need to perform safety and sanity checks on the compressed tarball WITHOUT uncompressing it on disk. It would need to be paranoid, expecting the worst while attempting to validate for the best.

For example, it could run "tar -t" and look at the output to verify that the files un-tar into a sub-directory (i.e., not wandering off with a ../../../foo path, not attempting to write something under /foo).

It would also call the code of the "validate the filename of the compressed tarball file" tool (in the same way the iocccsize code is executed directly from mkiocccentry while iocccsize is also compiled as a separate tool). This tool would need to verify that the sub-directory that the compressed tarball would produce is related to the previously validated compressed tarball filename.

This compressed tarball tool would need to perform two checks on the tarball size: the size of the compressed tarball itself, and what tar reports as the overall size of the elements of the tarball (see limit_ioccc.h).

This sanity check tool should verify that the compressed tarball has the required files (prog.c, Makefile, remarks.md, .info.json, .author.json). The sanity check tool should be sure that no other files beginning with "." would be created. It should check that no sub-sub-directories would be created, just the entry directory from which the tarball was created in the first place. It should check that all files created are UNDER the entry directory only.

That tool could be added to this repo as a pull request. Moreover mkiocccentry could be modified to perform a function call into that code (in the same way the iocccsize code is executed directly from mkiocccentry while iocccsize is also compiled as a separate tool) as a sanity check of what mkiocccentry produced.

Even though mkiocccentry would run the core of that tool as a function call, we still need the tool as a standalone program for use elsewhere. So both a mkiocccentry callable function AND a standalone tool are needed.

The standalone compressed tarball sanity check tool should take a file as an argument. If sane, the tool should exit 0, otherwise exit non-zero.

# In general

The reason why mkiocccentry should be able to execute the sanity check code of the above mentioned tools is that we (the IOCCC judges) plan to use those standalone tools in our judging procedures. Nevertheless mkiocccentry should perform the same sanity checks, to help be sure that what it generates will pass when it is later checked by the IOCCC judges via the standalone tools.

We don't want a mismatch between what mkiocccentry produces and the result of the standalone sanity check tools. We don't want someone's IOCCC entry that they worked hard on to be invalidated because mkiocccentry produced something that the standalone tool later rejected. That is why mkiocccentry should call on the sanity check code to validate what it is doing / creating.

Instead of loading up mkiocccentry with fork/exec calls to standalone tools and parsing the results, also allow function calls from mkiocccentry to run the same sanity checking code.

These sanity check tools need to be written in a clean, highly readable, trivial to understand style. Such a request is ironic given that this is in support of Obfuscated C, yes, but IOCCC is about irony in a way.

We want it to be easy for us and others to inspect what this code is doing. We want it to be easy for this code to be modified and improved on in the future. We want this code to stand as a good example of how to write well written, easy to understand, easy to improve C code.

Use of the dbg facility, and the -v level debugging style used by mkiocccentry, will be important. A similar coding style, with ALL libc function calls checked for errors, is also important.

And as a C contest, we would highly prefer these tools be written in portable C and compiled with gcc or clang without warnings.

# In summary

Your help is wanted AND is appreciated!
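
The path rules described for the tarball listing (nothing absolute, no "..", no sub-sub-directories, no stray dot files) can be sketched as a check on one line of "tar -t" style output. This is only an illustration under assumptions: the function name `is_safe_member` and the `entry_dir` parameter are hypothetical, and how the entry directory name relates to the tarball filename is not spelled out in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * is_safe_member - check one path taken from a "tar -t" style listing
 *
 * Hypothetical helper for illustration only.
 *
 * entry_dir: the single directory the tarball is expected to unpack into
 * path:      one line of the listing
 *
 * Returns true if path is entry_dir itself or a file directly under it.
 */
bool
is_safe_member(char const *entry_dir, char const *path)
{
    size_t dir_len;
    char const *name;

    if (entry_dir == NULL || path == NULL || entry_dir[0] == '\0') {
        return false;
    }
    /* the entry directory must be a plain name, not a path, "." or ".." */
    if (strchr(entry_dir, '/') != NULL ||
        strcmp(entry_dir, ".") == 0 || strcmp(entry_dir, "..") == 0) {
        return false;
    }
    /* absolute paths such as /foo are never allowed */
    if (path[0] == '/') {
        return false;
    }
    /* the path must begin with the entry directory name */
    dir_len = strlen(entry_dir);
    if (strncmp(path, entry_dir, dir_len) != 0) {
        return false;
    }
    /* the entry directory itself, with or without a trailing slash */
    if (path[dir_len] == '\0' || strcmp(path + dir_len, "/") == 0) {
        return true;
    }
    if (path[dir_len] != '/') {
        return false;
    }
    name = path + dir_len + 1;
    /* no sub-sub-directories: nothing below the entry directory may contain a slash */
    if (strchr(name, '/') != NULL) {
        return false;
    }
    /* no "." or ".." components */
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
        return false;
    }
    /* the only dot files allowed are the two required JSON files */
    if (name[0] == '.' &&
        strcmp(name, ".info.json") != 0 && strcmp(name, ".author.json") != 0) {
        return false;
    }
    return true;
}
```

A real tool would also need the size checks and the required-file checks described above; this sketch covers only the "where would this file land" question.
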
lcn2 commented 2 years ago

Please also see these issues where help is wanted:

Thanks for your consideration and help!

lcn2 commented 2 years ago

The filename of the compressed tarball is of the form:

"entry." ("test" or a valid UUID) "." entry_number "." timestamp ".txz"

Where entry_number is an integer between 0 and MAX_ENTRY_NUM (see limit_ioccc.h).

Where timestamp is an integer number of seconds since the epoch. Assume 64-bit integers.

For sure there needs to be a bounds check other than to verify that it is only an integer > 0.

A case can be made for an integer > 1643466665 (it is now "Sat Jan 29 06:31:05 PST 2022", so that is the time of this posting 👍 ). The "> 1643466665" may be good enough. We don't want to have to edit the tool for every contest to update that value. Nevertheless, requiring that the timestamp not be before "Sat Jan 29 06:31:05 PST 2022" seems good enough for the future.
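
As a rough illustration of the filename form described above, the following sketch splits the name on its dots and checks each field. Several things here are assumptions rather than anything stated in this thread: the function names are made up, `SKETCH_MAX_ENTRY_NUM` is a placeholder for the real MAX_ENTRY_NUM in limit_ioccc.h, "valid UUID" is approximated by the 8-4-4-4-12 hex digit shape, and the timestamp is only checked for being all digits because its bounds are still under discussion.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* placeholder only: the real MAX_ENTRY_NUM is defined in limit_ioccc.h */
#define SKETCH_MAX_ENTRY_NUM 9

/* return true if the len bytes at s are a non-empty run of decimal digits */
static bool
all_digits(char const *s, size_t len)
{
    size_t i;

    if (len == 0) {
        return false;
    }
    for (i = 0; i < len; ++i) {
        if (!isdigit((unsigned char)s[i])) {
            return false;
        }
    }
    return true;
}

/* return true if the len bytes at s have the 8-4-4-4-12 hex digit UUID shape */
static bool
uuid_shape(char const *s, size_t len)
{
    size_t i;

    if (len != 36) {
        return false;
    }
    for (i = 0; i < len; ++i) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-') {
                return false;
            }
        } else if (!isxdigit((unsigned char)s[i])) {
            return false;
        }
    }
    return true;
}

/* return true if name has the form: entry.(test|UUID).entry_number.timestamp.txz */
bool
filename_form_ok(char const *name)
{
    char const *id;
    char const *num;
    char const *ts;
    char const *dot;
    size_t id_len;
    size_t i;
    long entry_num = 0;

    if (name == NULL || strncmp(name, "entry.", 6) != 0) {
        return false;
    }
    /* field 1: "test" or something shaped like a UUID */
    id = name + 6;
    dot = strchr(id, '.');
    if (dot == NULL) {
        return false;
    }
    id_len = (size_t)(dot - id);
    if (!(id_len == 4 && strncmp(id, "test", 4) == 0) && !uuid_shape(id, id_len)) {
        return false;
    }
    /* field 2: entry number in the range 0 to SKETCH_MAX_ENTRY_NUM */
    num = dot + 1;
    dot = strchr(num, '.');
    if (dot == NULL || !all_digits(num, (size_t)(dot - num))) {
        return false;
    }
    for (i = 0; num + i < dot; ++i) {
        entry_num = entry_num * 10 + (num[i] - '0');
        if (entry_num > SKETCH_MAX_ENTRY_NUM) {
            return false;	/* stop early, which also avoids overflow */
        }
    }
    /* field 3: timestamp, digits only in this sketch */
    ts = dot + 1;
    dot = strchr(ts, '.');
    if (dot == NULL || !all_digits(ts, (size_t)(dot - ts))) {
        return false;
    }
    /* the remainder must be exactly the .txz extension */
    return strcmp(dot, ".txz") == 0;
}
```

With the placeholder limit of 9, a name such as entry.test.0.1643466665.txz passes, while entry.test.10.1643466665.txz or a name ending in .tar does not.
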

ilyakurdyukov commented 2 years ago

Why is the file time so important? What if someone is using a local system where the wrong time is set?

You want to discard the entries just for this?

lcn2 commented 2 years ago

Fair point, @ilyakurdyukov.

Maybe checking that the timestamp is a 64-bit integer > 0 is all we need.

ilyakurdyukov commented 2 years ago

You already agreed, but for example: I have an aarch64 dev board with Linux that doesn't have its own battery to power the clock, so if I don't use it for a month, the clock on it is one month behind. I don't often connect it to the internet where it can connect to a time server.

xexyl commented 2 years ago

On Jan 29, 2022, at 06:32, Landon Curt Noll @.***> wrote:

> The filename of the compressed tarball is of the form:
>
> "entry." ("test" or a valid UUID) "." entry_number "." timestamp ".txz"

This is how it is currently but you want to change this, right ?

> Where timestamp is an integer number of seconds since the epoch. We are not sure there needs to be a bounds check other than to verify that it is only an integer > 0 and perhaps even > 1643466665 (it is now "Sat Jan 29 06:31:05 PST 2022" so the time of this posting 👍 ).

I am not sure if I follow this. All I can get is you want something done or checked wrt the timestamp.

> The "> 1643466665" is optional. We don't want to have to edit the tool for every contest. So perhaps a simple (it serially cannot be before "Sat Jan 29 06:31:05 PST 2022" is good enough?

I am not sure if this sentence was finished? Are you saying that you don’t want the timestamp in the file name? If not can you give an example new file name maybe for the next contest and the one after (since it seems to have to do with not having to update the tool each contest)?

The part where you say you don’t want to have to edit the tool each contest is what made me think that maybe you want to remove the timestamp but then above it looks like you still want it there (at least in my current state of alertness).

lcn2 commented 2 years ago

> On Jan 29, 2022, at 06:32, Landon Curt Noll @.***> wrote:
>
> > The filename of the compressed tarball is of the form: "entry." ("test" or a valid UUID) "." entry_number "." timestamp ".txz"
>
> This is how it is currently but you want to change this, right ?

This is how it is currently. Just documenting it.

> > Where timestamp is an integer number of seconds since the epoch. We are not sure there needs to be a bounds check other than to verify that it is only an integer > 0 and perhaps even > 1643466665 (it is now "Sat Jan 29 06:31:05 PST 2022" so the time of this posting 👍 ).
>
> I am not sure if I follow this. All I can get is you want something done or checked wrt the timestamp.
>
> > The "> 1643466665" is optional. We don't want to have to edit the tool for every contest. So perhaps a simple (it serially cannot be before "Sat Jan 29 06:31:05 PST 2022" is good enough?
>
> I am not sure if this sentence was finished? Are you saying that you don’t want the timestamp in the file name? If not can you give an example new file name maybe for the next contest and the one after (since it seems to have to do with not having to update the tool each contest)?
>
> The part where you say you don’t want to have to edit the tool each contest is what made me think that maybe you want to remove the timestamp but then above it looks like you still want it there (at least in my current state of alertness).

As @ilyakurdyukov pointed out, the only timestamp related check needed is to verify that the timestamp > 0.
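
The "timestamp > 0" check on a 64-bit integer can be sketched with `strtoll`, which reports overflow through `errno`. The function name `timestamp_ok` and its out-parameter are hypothetical, and the sketch assumes `long long` is 64 bits wide, as it is on common platforms.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * timestamp_ok - check that str is a decimal integer > 0 that fits in 64 bits
 *
 * Hypothetical helper for illustration only.
 * If value is non-NULL and the check passes, the parsed timestamp is stored there.
 */
bool
timestamp_ok(char const *str, int64_t *value)
{
    char *end = NULL;
    long long parsed;

    /* insist on a leading digit: no sign, no leading whitespace, not empty */
    if (str == NULL || !isdigit((unsigned char)str[0])) {
        return false;
    }
    errno = 0;
    parsed = strtoll(str, &end, 10);
    /* ERANGE means the value did not fit; trailing characters are an error too */
    if (errno != 0 || end == str || *end != '\0') {
        return false;
    }
    /* the only bound agreed on in this thread: the timestamp must be > 0 */
    if (parsed <= 0) {
        return false;
    }
    if (value != NULL) {
        *value = (int64_t)parsed;	/* assumes long long is 64 bits */
    }
    return true;
}
```

Because there is no lower bound tied to a particular date, a machine with a clock that is a month behind still produces an acceptable timestamp.
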

xexyl commented 2 years ago

On Jan 29, 2022, at 06:49, Landon Curt Noll @.***> wrote:

> > On Jan 29, 2022, at 06:32, Landon Curt Noll @.***> wrote:
> >
> > > The filename of the compressed tarball is of the form: "entry." ("test" or a valid UUID) "." entry_number "." timestamp ".txz"
> >
> > This is how it is currently but you want to change this, right ?
>
> This is how it is currently. Just documenting it;

Right. Just making sure.

> > > Where timestamp is an integer number of seconds since the epoch. We are not sure there needs to be a bounds check other than to verify that it is only an integer > 0 and perhaps even > 1643466665 (it is now "Sat Jan 29 06:31:05 PST 2022" so the time of this posting 👍 ).
> >
> > I am not sure if I follow this. All I can get is you want something done or checked wrt the timestamp.
> >
> > > The "> 1643466665" is optional. We don't want to have to edit the tool for every contest. So perhaps a simple (it serially cannot be before "Sat Jan 29 06:31:05 PST 2022" is good enough?
> >
> > I am not sure if this sentence was finished? Are you saying that you don’t want the timestamp in the file name? If not can you give an example new file name maybe for the next contest and the one after (since it seems to have to do with not having to update the tool each contest)?
> >
> > The part where you say you don’t want to have to edit the tool each contest is what made me think that maybe you want to remove the timestamp but then above it looks like you still want it there (at least in my current state of alertness).
>
> As @ilyakurdyukov pointed out, the only check we really need is to verify that the timestamp > 0.

Let’s say that for some reason it doesn’t report that. What should the outcome be?

And the reverse: do you still want it in the file name?

xexyl commented 2 years ago

That’s really interesting! I have never seen a system not have serious problems with time set incorrectly!

I know that’s OT but still very interesting.

On Jan 29, 2022, at 06:45, Ilya Kurdyukov @.***> wrote:

> You already agreed, but for example: I have an aarch64 dev board with Linux that doesn't have its own battery to power the clock, so if I don't use it for a month, the clock on it is one month behind. I don't often connect it to the internet where it can connect to a time server.

lcn2 commented 2 years ago

BTW: The compressed tarball filename sanity check tool in standalone form will also be used to help validate files uploaded to the submit server. But that is a much later detail that is well beyond the scope of this issue.

All this tool needs to do is run some basic sanity checks on the filename of the compressed tarball file.

The filename must:

That should be good enough for what we need.
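
The "callable function AND standalone tool" arrangement asked for in this issue can be sketched as one checking function plus a thin wrapper that turns its result into an exit status. All names here (`check_fn`, `ends_with_txz`, `run_standalone`) are hypothetical, the stand-in check only looks at the extension, and the specific non-zero values 1 and 2 are an arbitrary choice; the thread only asks for 0 when sane and non-zero otherwise.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* the shape of a check that both mkiocccentry and a standalone tool could call */
typedef bool (*check_fn)(char const *filename);

/* stand-in check for this sketch only: does the name end in ".txz"? */
bool
ends_with_txz(char const *filename)
{
    size_t len;

    if (filename == NULL) {
        return false;
    }
    len = strlen(filename);
    return len > 4 && strcmp(filename + len - 4, ".txz") == 0;
}

/*
 * run_standalone - body of a standalone tool that takes one file argument
 *
 * Returns the exit status: 0 if the check passes, non-zero otherwise.
 */
int
run_standalone(int argc, char *argv[], check_fn check)
{
    if (argc != 2 || argv == NULL || argv[1] == NULL || check == NULL) {
        fprintf(stderr, "usage: filenamechk filename\n");
        return 2;
    }
    if (!check(argv[1])) {
        fprintf(stderr, "filename failed sanity check: %s\n", argv[1]);
        return 1;
    }
    return 0;
}
```

A standalone program's main() would simply return the value of `run_standalone(argc, argv, ends_with_txz)`, while mkiocccentry would call the check function directly, so both paths run the same code.
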

lcn2 commented 2 years ago

> That’s really interesting! I have never seen a system not have serious problems with time set incorrectly! I know that’s OT but still very interesting.

OT reply from Landon (one of the IOCCC judges):

By profession, I am an astronomer. I work a lot with clocks. For example, there is one orbiting close to the Sun helping test general relativity. I have run an atomic clock at my lab that helps track continental drift and helps measure the distance to the moon down to the mm level.

I am a primary author of the POSIX standard as it relates to computers and clocks. I agree with your remark about the serious problems with computer clocks.

But all that is OT for another day. :-)

xexyl commented 2 years ago

So remove the timestamp is what I'm seeing. An example file name would be:

entry.test.0.2022.txz

I chose 2022 as the year but it could be any non-zero 64-bit integer. 0 would be the entry number, entry is the prefix and test is the id. Does the second number have to be different per entry? If not why not just make it the year as that can always be easily obtained (whereas you'd have to modify the code for the contest number for example).

If that's correct that should be easy enough to do but since you've asked for help something tells me there's more to it :)

xexyl commented 2 years ago

Thank you for this commentary! I've saved a screenshot of it and put it in my photos.

> OT reply from Landon (one of the IOCCC judges):
>
> By profession, I am an astronomer. I work a lot with clocks. For example, there is one orbiting close to the Sun helping test general relativity. I have run an atomic clock at my lab that helps track continental drift and helps measure the distance to the moon down to the mm level.

I knew you were an astronomer and I think that's really cool. I did not know these facts though! Thank you for sharing!

> I am a primary author of the POSIX standard as it relates to computers and clocks. I agree with your remark about the serious problems with computer clocks.

This does not surprise me somehow. I can't remember if it's you or the other but I know more than 0 judges love primes too. Well they are magical and the guidelines refer to 'Curio!' (wonderful resource that is).

> But all that is OT for another day. :-)

I suppose so. You should tell the tale though eventually!

..and as for time being wrong I remember clock skew problems and compiling the linux kernel. Easy fix was to touch the files but I know there have been system crashes over time changes (like at least one of the two leap seconds in 2012).

ilyakurdyukov commented 2 years ago

I'm assuming this number just needs to be increased so the server knows you're uploading a newer tar archive?

lcn2 commented 2 years ago

> I'm assuming this number just needs to be increased so the server knows you're uploading a newer tar archive?

The entry number needs to be consistent when one is submitting multiple programs.

If someone has two entries to the IOCCC, say entry 0 and entry 1, then they need to form two compressed tarballs and upload them to the submit server in two different slots. And if someone later re-uploads entry X, then we will assume that they wish to replace the earlier tarball of X with the new one.

For the IOCCC judges, the format of the filenames will help us line up which submission is which and for which IOCCC contestant.

There are other aspects of the IOCCC submission server that are beyond the scope of this request that relate to compressed tarball filenames. But that would be too much of a distraction from this request.

BTW: If this was not clear from what mkiocccentry printed, we might need to improve what it prints today.

We hope that clarifies.

lcn2 commented 2 years ago

> So remove the timestamp is what I'm seeing. An example file name would be:
>
> entry.test.0.2022.txz
>
> I chose 2022 as the year but it could be any non-zero 64-bit integer. 0 would be the entry number, entry is the prefix and test is the id. Does the second number have to be different per entry? If not why not just make it the year as that can always be easily obtained (whereas you'd have to modify the code for the contest number for example).
>
> If that's correct that should be easy enough to do but since you've asked for help something tells me there's more to it :)

We will rely on the hope that if someone re-submits a compressed tarball for the same entry number, the timestamp of the later replacement tarball will be greater than that of the previous upload that they are replacing.

Indeed, if someone were to re-upload to the submit server before we, the IOCCC judges, have a chance to fetch the compressed tarball file, we would never see the file they replaced. We know that this is information about the submit server that is beyond the scope of this request (it has to do with requirements imposed on us by those providing the server). We provide it as context only.

If we, the IOCCC judges, later fetch a 2nd compressed tarball for the same entry number, we will check that its timestamp is greater than that of the file we are replacing.

That is part of the reason why the compressed tarball filename has a timestamp inside it.
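
The replacement rule described here (same entry, strictly larger timestamp) might be checked along the following lines. This is a sketch under assumptions: the function names are hypothetical, both names are assumed to have already passed the filename sanity check, and the timestamp field is limited to 18 digits so that it always fits in an int64_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * timestamp_field - find and parse the digits between the last two dots
 *
 * On success, *start points at the first timestamp digit within name
 * and *ts holds the parsed value.
 */
static bool
timestamp_field(char const *name, char const **start, int64_t *ts)
{
    char const *ext;
    char const *p;
    int64_t value = 0;

    if (name == NULL || start == NULL || ts == NULL) {
        return false;
    }
    ext = strrchr(name, '.');		/* the dot before "txz" */
    if (ext == NULL || ext == name) {
        return false;
    }
    p = ext;
    while (p > name && p[-1] != '.') {	/* walk back to the previous dot */
        --p;
    }
    /* need a dot before the field and 1 to 18 digits in it */
    if (p == name || p == ext || ext - p > 18) {
        return false;
    }
    *start = p;
    for (; p < ext; ++p) {
        if (*p < '0' || *p > '9') {
            return false;
        }
        value = value * 10 + (*p - '0');
    }
    *ts = value;
    return true;
}

/* return true if new_name is for the same entry as old_name and has a later timestamp */
bool
replaces_earlier(char const *old_name, char const *new_name)
{
    char const *old_start = NULL;
    char const *new_start = NULL;
    int64_t old_ts = 0;
    int64_t new_ts = 0;
    size_t prefix_len;

    if (!timestamp_field(old_name, &old_start, &old_ts) ||
        !timestamp_field(new_name, &new_start, &new_ts)) {
        return false;
    }
    /* everything before the timestamp (prefix, id, entry number) must match */
    prefix_len = (size_t)(old_start - old_name);
    if ((size_t)(new_start - new_name) != prefix_len ||
        memcmp(old_name, new_name, prefix_len) != 0) {
        return false;
    }
    return new_ts > old_ts;
}
```

For example, entry.test.0.1643466700.txz would count as a replacement for entry.test.0.1643466665.txz, but not the other way around, and not for a tarball with a different entry number.
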

ilyakurdyukov commented 2 years ago

> > I'm assuming this number just needs to be increased so the server knows you're uploading a newer tar archive?
>
> The entry number needs to be consistent when one is submitting multiple programs.

Not the entry number, I meant the timestamp (because we started by discussing the timestamp).

lcn2 commented 2 years ago

> > > I'm assuming this number just needs to be increased so the server knows you're uploading a newer tar archive?
> >
> > The entry number needs to be consistent when one is submitting multiple programs.
>
> Not the entry number, I meant the timestamp (because we started by discussing the timestamp).

OK. You are mostly correct.

The timestamp needs to be different so the server knows you're uploading a different tar archive. We (the IOCCC judges) need to see the timestamp increase so we know you're uploading a newer tar archive.

Does that help?

xexyl commented 2 years ago

> For the IOCCC judges, the format of the filenames will help us line up which submission is which and for which IOCCC contestant.

So do you in the future want to know when it's the same author as another?

> BTW: If this was not clear from what mkiocccentry printed, we might need to improve what it prints today.

Maybe you should do that.

xexyl commented 2 years ago

> > So remove the timestamp is what I'm seeing. An example file name would be:
> >
> > entry.test.0.2022.txz
> >
> > I chose 2022 as the year but it could be any non-zero 64-bit integer. 0 would be the entry number, entry is the prefix and test is the id. Does the second number have to be different per entry? If not why not just make it the year as that can always be easily obtained (whereas you'd have to modify the code for the contest number for example). If that's correct that should be easy enough to do but since you've asked for help something tells me there's more to it :)
>
> We will rely on the hope that if someone re-submits a compressed tarball for the same entry number, the timestamp of the later replacement tarball will be greater than that of the previous upload that they are replacing.
>
> Indeed, if someone were to re-upload to the submit server before we, the IOCCC judges, have a chance to fetch the compressed tarball file, we would never see the file they replaced. We know that this is information about the submit server that is beyond the scope of this request (it has to do with requirements imposed on us by those providing the server). We provide it as context only.
>
> If we, the IOCCC judges, later fetch a 2nd compressed tarball for the same entry number, we will check that its timestamp is greater than that of the file we are replacing.
>
> That is part of the reason why the compressed tarball filename has a timestamp inside it.

Hmm... If updating the same entry number is to replace the entry why not just replace the tarball? I mean you won't be judging the same entry in more than one form, right?

Well as I said in the other thread I've been at this too long so I'll try coming back to this tomorrow. It might be helpful if you gave two example entries that each have the original tarball name and the updated tarball so that we can see exactly what you're getting at (since as you point out some of the information is due to information that we don't have).

Have a good day!

lcn2 commented 2 years ago

> > For the IOCCC judges, the format of the filenames will help us line up which submission is which and for which IOCCC contestant.
>
> So do you in the future want to know when it's the same author as another?
>
> > BTW: If this was not clear from what mkiocccentry printed, we might need to improve what it prints today.
>
> Maybe you should do that.

We have plans to update the author questions to ask authors who are previous winners to enter something called an "author handle". However this is a later change. We need to generate JSON files for all of the IOCCC winners, and upload those files to the web site, before we can add this feature.

The winner JSON files will be part of a tool set to rebuild the www.ioccc.org web site. That is another whole project that is underway.

So yes, we are planning that, however not right now as the web site re-design is still in the early stages. We do plan to add something for this before the MOCK IOCCC is started.

xexyl commented 2 years ago

> > > For the IOCCC judges, the format of the filenames will help us line up which submission is which and for which IOCCC contestant.
> >
> > So do you in the future want to know when it's the same author as another?
> >
> > > BTW: If this was not clear from what mkiocccentry printed, we might need to improve what it prints today.
> >
> > Maybe you should do that.
>
> We have plans to update the author questions to ask authors who are previous winners to enter something called an "author handle". However this is a later change. We need to generate JSON files for all of the IOCCC winners, and upload those files to the web site, before we can add this feature.

That sounds very interesting!

> The winner JSON files will be part of a tool set to rebuild the www.ioccc.org web site. That is another whole project that is underway.
>
> So yes, we are planning that, however not right now as the web site re-design is still in the early stages. We do plan to add something for this before the MOCK IOCCC is started.

Sounds exciting! I'm still not sure if I know exactly what you're after here with the file name but I'll have a reread another day (just replying to your message and haven't read the details again).

As you'll see (if you haven't already) I fixed the high bit warning in mkiocccentry.c and made a PR for it. That as well as some other changes I'm waiting on another PR to be accepted first (since it's not mine) is about all I'll do today with the code I think.

lcn2 commented 2 years ago

See https://github.com/ioccc-src/mkiocccentry/issues/42#issuecomment-1026444612 for how txzchk (the tool for #42) and filenamechk (the tool for this issue #41) and mkiocccentry interact.

lcn2 commented 2 years ago

Initial alpha test code added via 7e8d3aef5e36345fd1cebe7a46bbd34498ab8365

xexyl commented 2 years ago

Just pulled it. This might be a good motivation for me to work on it! I was having a difficult time getting myself started on it and it took some work to get myself to put the utility functions in separate files (that was apparently not necessary as you were working on it too).

Besides implementing the way the tools should interact what else with this tool has to be done? I'll take a look later on today or else tomorrow.