opencert / workshop-2019

Working repository of the "9th International Workshop on Open Community approaches to Education, Research and Technology" *** towards "Open community approaches" CERTification processes *** Porto, Portugal, 8 October 2019, Co-located with FM 2019

Submission 1 (Research/Regular) #1

opencert opened 5 years ago

opencert commented 5 years ago

Title: A Calculus of Chaos For Stochastic Program Compilation

Abstract: An unexpected result from an open project to develop a `chaotic' compiler for ANSI C is described here: an information entropy calculus for stochastically compiled programs. A stochastic compiler produces randomly different object codes every time it is applied to the same source code. The calculus quantifies the entropy introduced into the run-time program traces by the compiler, allowing the strategy for maximum entropy to be characterised and that the compiler implements it to be verified. The analysis aims to support the argument that, on a suitable run-time platform and via any polynomial time method, the programmer's intention is unreadable from the trace, as word length n tends to infinity.

OpenCERT_2019_paper_1.pdf

nevrenato commented 5 years ago

I would like to review this submission. Renato Neves

paddykrishnan commented 5 years ago

I can review this paper.

opencert commented 5 years ago

Assigned PC members:

pbreuer commented 5 years ago

I have other deadlines on 28 July and 1 Aug, but I am momentarily free, so I can resolve immediate doubts anyone is having, like "what is this really all about" and such, to save you from having to reverse-engineer from facts to motives, or just decode the logic. Conversation of any kind should make things more familiar.

paddykrishnan commented 5 years ago

I have only skim-read the paper. While section 2 is relevant to the open source community, the rest of the paper does not make any direct connection to open source development/certification. Any pointers to where in the paper this connection is made would be useful when I re-read it more carefully.

pbreuer commented 5 years ago

Generic observations about open source are not formal methods, so they would not quite fit in this forum, which means I do not quite understand the basis of your comment. I think my difficulty is that I do not understand the words "a direct connection to open source development/certification". What the paper describes IS an open source development, so what more direct a thing than that can there be? As to certification ...

The aim of formal methods is to certify something is true, with the force of logic. A paper that invents a high level specification for a compiler to satisfy (be chaotic) and shows that if it follows a construction rule (make every instruction that writes introduce maximum entropy) then it satisfies the specification (is chaotic) is providing a method of certifying any particular compiler as being satisfactorily chaotic. It is telling those who come after that if you follow this precept in every little detail of making your compiler do its work, then you will get what you want. You need to certify that each compiler construction satisfies the condition, and think carefully about subtle ways in which it may not be so.

Shall I give examples of the subtle ways in which one can fail to do that?

I think the abstract says best what you may be worried about but I actually do not get what you may be worried about, so if you could elaborate your thoughts I would be grateful:

The calculus quantifies the entropy introduced into the run-time program traces by the compiler, allowing the strategy for maximum entropy to be characterised and that the compiler implements it to be verified.

My guess, from the fact that I appear not to understand your words, is that you are expressing an `I don't know what this is all about and please give me a link to something I do know about and have experienced so I can orient myself'? I think you are looking for a pigeonhole to drop this in. If you find a pigeonhole by all means let me know and I will be glad to claim it.

Perhaps you could ask in what way an open source project X that produces a method of certifying Y of Z (the exemplar Z for which is a component of the open source project's goal) does not have a "direct connection to open source development/certification". Put like that, I still don't get it.

It may also be that you are looking for a fact about compilers: that to translate a source code construct foo[bar], they first translate bar, laying down code, then lay down some more that references labels in it. That is, trans(foo[bar]) = trans(bar)\x trans(foo,x) (in a kind of synthetic-grammar spec). So one can rely on the entropy in trans(bar) staying put, and all one has to worry about is that the extra bit of code trans(foo,x) also plays its part in adding as much entropy as possible. So ... look after the detail of each construction and one will get what one wants in the end.
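To sketch that compositional shape in code - a toy model only, in the Haskell the project is written in; the names here (Expr, Instr, trans, rollDice) are mine for illustration, not the project's actual API:

    import System.Random (StdGen, mkStdGen, randomR)
    import Control.Monad.State (State, state, evalState)

    data Expr  = Lit Int | Index Expr Expr  -- foo[bar], schematically
    data Instr = AddC Int deriving Show     -- an instruction with a constant field

    type Compile = State StdGen             -- compilation threads a dice supply

    rollDice :: Compile Int                 -- a fresh random input per emission
    rollDice = state (randomR (0, 2^32 - 1))

    -- trans(foo[bar]) = trans(bar)\x trans(foo,x): translate bar first, then
    -- lay down the referencing code.  The entropy already in trans(bar) stays
    -- put; the extra code has only to add its own maximal entropy.
    trans :: Expr -> Compile [Instr]
    trans (Lit n)     = do k <- rollDice
                           return [AddC (n + k)]       -- every write varied
    trans (Index f b) = do codeB <- trans b            -- trans(bar)
                           codeF <- trans f
                           k <- rollDice
                           return (codeB ++ codeF ++ [AddC k])  -- trans(foo,x)

    compile :: Int -> Expr -> [Instr]
    compile seed e = evalState (trans e) (mkStdGen seed)

Here compile 1 e and compile 2 e differ in every emitted constant, which is the stochastic behaviour at issue.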

But the paper's contribution is in opening up this sphere of intellectual activity (how to measure chaos) by naming it as a concept and showing how to think about it, as WELL AS providing an actual methodology for going about proving one has got what one wants of this new thing. I will happily take insights that express that better.

How would you solve the problem of specifying something like chaos and then proving it is produced? The paper quantifies the notion and provides the apparatus for a proof. It doesn't do the proof for the particular instance from which the apparatus (a calculus) was abstracted .. because that would be about something else. Here you have the torture apparatus, not the torturee.

nevrenato commented 5 years ago

Dear Peter,

thanks for your reply.

I read your paper and, like Padmanabhan, I'm not sure whether it fits in OpenCert's scope. I read what you wrote above but it didn't help me much.

In my view (and I ask the chairs to correct me on this), OpenCert calls for contributions towards new or better "Open Community approaches to Education, Research, Technology and Formal Methods", the Open Community part being the central theme here. The paper seems more about "Secure Computing" and, as Padmanabhan said, the theme of "Open Community" is only touched on in Section 2.

I understand that the paper is written in the context of an "open project": i.e. a project conducted in "public view" (with open development and weblogs). But I think that this alone doesn't suffice. Aside from being written in the context of an open project, why is the paper relevant to the "Open Community"?

Cheers

paddykrishnan commented 5 years ago

On re-reading the paper and the CFP, I think the authors need to explicitly state how their paper fits in one of the following categories as indicated in the CFP.

Research, by aiming at:

- establishing open research projects as "cauldrons" of open data, open knowledge and collaborative development, as well as devising methodologies and tools for the management and assessment of such projects;
- defining open peer-review methodologies for the assessment of research outputs, and appropriate bibliometrics based on open community feedback rather than on a questionable analysis of citations;
- defining, more specifically, quality metrics and a formal process to certify open source software (OSS) products and the outcomes of other peer-production efforts.
pbreuer commented 5 years ago

Folks - I do not understand your concerns. The blurb for OpenCert says

In the context of FM, OpenCERT 2019 will propose thematic aspects related to the area of Formal Methods ... 3. Technology: integrating Formal Methods technologies and tools within OSS projects as a means for Formal Methods acceptance and diffusion; but still accepting submissions outside these aspects.

In what way is this not exactly integrating a formal method technology and tools (indeed, inventing one, the trace calculus as given) within an OSS project (the platform and compiler toolchain for encrypted computing)? Since its invention, that calculus has become the driver for the compiler design. It was invented because nobody knew which way to go or why. It allowed the project to move forward, and told it which way to go, and what the goal should be, and the goal it pointed out, and the way to it, are completely remarkable, achieving fundamental breakthroughs.

Can you explain the logic behind your feelings please? I am interested in your self-analysis there. My feeling is that behind the words is really meant "err, this is all new and different stuff and we don't recognize it". YES. That is what I hope research is.

With respect to the bit of blurb I have quoted, one would need to say in what way a trace calculus for entropy is not a new idea, in what way it is not a formal method (looks formal to me .. formal methods means reasoning about programs as mathematical objects) and in what way it is not integrated within an OSS project.

That would be supporting logic. Absent that, I cannot comment.

If you think the conference message should be "how do we mature existing FM technologies and get them involved in existing big OSS projects", I disagree profoundly. That is reading what is not there, both in words and in spirit. What I think the conference is about is promoting open attitudes in FM particularly, and this shows that inventive research in FM can spring from boring old OSS projects about C compilers.

But I am more concerned as a general question about why you should not be immediately eager to embrace a new idea full stop. Never mind one developed within an open source compiler project. It should be a nice feature for the conference. Why not wildly enthusiastic? New stuff! Formal methods stuff! From an OSS project! For an OSS project! Making new compilation techniques POSSIBLE! Telling us that the technique is right! Inventing new concepts! Etc. If I ever get to review anything new anywhere*, I would give it five stars and tell the conference to accept it, no matter what it is about or what the conference is about!

New is really rare.

[A point of interest ... Did I include the proof that the compiled code is secure against polynomial time methods used by an attacker to try to figure out what the data in a trace is really intended to mean? I forget if I did or not. That is formal 'obfuscation' - whether it is practical obfuscation or not is another question. But it is always a crowd puller. I gave a talk on that at PriSC at POPL in January, showing this compiler, and the place was packed to the rafters for the talk, with lively questions throughout, and they loved the real time show and sparks from the compiler and runtime traces.]

[On another point of interest ... those who have never done it might not know that simply writing a compiler is tough! I have done it at least three times in my life now, and that is three times too many. Writing one that does a new thing is tougher still. Writing one that does something that was not known to be possible is harder still. Writing one that does something that has not yet been formally expressed is even harder, and now that it is expressed, it will be much easier for people who come after. On how hard, I estimated the other day that between November and February last, I put in 75 standard SE years of effort into writing >1M lines of working code in Haskell (counting finished lines only). Even stating what it is that the compiler is supposed to do in a formal way that can be understood is an achievement. Then managing to find a way to achieve that via low level compiler details - again in a way that can be understood and checked - is another. Then there is the actual work of doing it, correctly! And the discovery of the calculus itself overthrew the existing design of two years' provenance by proving it wrong ...]

[A thought ... what might be of interest to you is the story of which compiler design this calculus proved wrong, and why. And that it showed what should be right, but we had no way of achieving it, and I did not think it was even possible to achieve, and I had no clue about even a direction to go in to achieve it ... until it turned out that the calculus is able to evaluate the correctness of every single compiler construct and point to which one is righter than another. But I am usually not interested in writing stories, but in presenting research. That it comes from an open source project and is a new formal method should make it grade-A material, so I am really interested in WHY there should even be the glimmer of a fight about it. If you want more of the backstory and less of the actual research, perhaps you should say so ... but I know from experience that it is not worth making research seem anodyne just to get it published, and I would be most interested in your analysis of the roots of the disquiet!]

[I notice many of you are passing by my blog entries. You may have missed this note: Armstrong (1982; Barriers to scientific contributions: the author's formula. Behavioral and Brain Sciences, 5, 197-199) formulated what he called "the author's formula", a set of rules that authors should use to increase the likelihood and speed of acceptance of their manuscripts. "Authors should (1) not pick an important problem, (2) not challenge existing beliefs, (3) not obtain surprising results, (4) not use simple methods, (5) not provide full disclosure, and (6) not write clearly." Exactly. If the problem you perceive is really one or more of those, please reconsider.]

[* I forgot to give the reviewing story I meant to. Many years ago, when I was an engineering postdoc at Cambridge, I got tons of crank papers, new arithmetic, philosophy of faults, all that kind of stuff. One day I got a paper to review that said it was integrating functional programming and object oriented programming. I could not understand a word of it. It was something about f(x) being like x.f . I thought it was probably another crank paper, but I could not convince myself, because I could not understand it. OK, one can't understand crank papers either, but this time I did not know if I did not understand it and it was right and I was stupid, or I did not understand it because I was right and it was stupid. I could not decide whether it was over or under my head. Finally I decided to contact the author ... we had some interesting email interchanges, and I decided that likely the problem was me: I was too stupid to understand the paper. I okayed the paper, and forgot about it. I got some more incomprehensible papers sent to me by the author and that was that. A couple of years later I was invited to interview for a post at Oxford. I went along, and at the head of the table was the newly installed professor of computer science at Oxford, my erstwhile correspondent, Joseph Goguen. I got the job. I don't remember a lot about the interview, but I do remember at one point, either then or later, Joseph Goguen explaining to me "you know when you don't know, which is a very rare quality". I didn't really understand him, because of course I know when I don't know! How could I not! I know very few things, and those gained after much painful thought, and even then easily overturned by a new and overlooked line of reasoning. But I have since learned that an awful lot of reviewers, particularly engineers, especially particularly hardware engineers, super especially so security engineers, do not know that they do not know, and carry on speaking/writing rubbish in an authoritative, sneering tone when what they really mean is "I know nothing about this, so I will make up reasons why it should have nothing to do with me or a conference I am at and that will be the end of that". So I am very insistent that opinions should be backed by falsifiable logic, and not just be psychological reactions presented as fact. (JG also asked me in the interview "can you talk to engineers", and I did not know what he meant then either - my PhD in engineering is from Cambridge and everybody there was super flexible in their thinking!)]

pbreuer commented 5 years ago

Can we have some more debate on this, please? I was hoping the interchanges would be rapid-fire!

There needs to be comment on the technical content. The first thing a referee should do is decide "is this true or not" as that is the easy part. I'll explain further below.

[Yes, one cannot in general short-circuit and say "if it is true then would it be relevant", because that misses out the duty of digging out what is there that the author does not appreciate is important for this audience and encouraging them to emphasize and develop that perspective, but never mind that line.]

My guess from the interchange so far is that (1) the technical content may be beyond your comfort zone and therefore "I have read it" means "my eyes passed over it but I skipped understanding it because it all looks like math and I do not relate to that level of abstraction". If so, and I am only going by my own experience that "don't say" means "can't say", what would be valuable would be the first point at which "the brain lost contact with the page", and why. I have been through the text with a fine-tooth comb and I cannot imagine why, unless the reader is simply someone for whom all that kind of thing is alien. Like me reading a paper on quantum computing! (which I sometimes have the bad luck to get to referee).

My guess, on absolutely no evidence again apart from having refereed one paper here myself and picking up on a sentence or two, would be that the referees may be from the applied, practical software engineering end of the formal methods scale, not the fundamental research end.

So your idea may be (2) that certification is a kind of black-box business-related end-process, not an open-ended white-box process that convinces the author of the software first of what it does and that it really does do it, and has the same effect on other later kibitzers, which is my concern as a creator of open source software. Maybe in twenty to thirty years one may build this kind of reasoning into a tool or other application, but I am only concerned with inventing the reasoning system and using it in an ad-hoc manner to first of all specify what my software (compiler) should do, and to convince myself and others with access to the source (everyone!) that it does do it.

So what I am looking for and not getting so far is criticism of the reasoning, and if you OK it that will be "certification" and if you fail it it will be "not certified".

You will also OK or not OK the whole system of reasoning.

Remember that I have no clue when writing the paper what you will be looking for in it. I can only report what I have intended to do, have done, and that it is successful for my purposes.

I infer from your responses that you are situated in a perspective of existing software engineering certification processes, which would be about as alien as the planet Mars to me. So your question would be "can I (i.e., you) use what is described here in my practice". That is not a question I am even interested in, let alone ever wished to pose or answer (not!). I am only interested in my practice, which is creating open source software, and in solving the problem I had, which is

(A) I did not know what the goal was (solved by specifying it in terms of entropy)
(B) I did not know how to get there (solved by the calculus of trace entropy, which makes requirements of every compiler construction)
(C) I did not know if I had done it (solved by checking that every compiler construction produces maximum entropy per instruction-that-writes that it inserts into the generated code, and "using the theorem")

So I certify that I now know what my software is supposed to do and that my software does what it is supposed to. If you disagree, then you will poke at the reasoning and find a flaw in it.

To the larger criticism that this is not yet any kind of formal method that has been integrated into a tool that is applicable to software in general, yes, right, it is not meant to be. Maybe in another century. Nobody says the conference is about that. The conference is about getting confidence that software does what it is supposed to do, and convincing people of it, and making sure that that sort of thing is closely integrated into open software development.

Which is precisely what I expect to have done, and I would like comment on it!

The question a referee must ask is

  could I/we have done this

(after checking it is true, of course!). If the answer is no, then it is new and original and advances the state of the art. If the answer is yes, then one points at the required work AND carefully explains how it may be applied for the same result.

[In this as well as in other places I sometimes get the impression of having invented perpetual motion and turning up to an industrial fair on battery design and production, and being told to go away because that is not relevant! Everyone thinks their conference is about what they do, not about what the conference really means to be about!]

nevrenato commented 5 years ago

Dear Peter,

as you can see in my last reply, I shared my concerns with the PC chairs and asked them for feedback on my comments regarding OpenCert's scope (who more qualified than them to clarify this?).

If they say that I'm interpreting OpenCert's cfp incorrectly, and that it suffices that the paper is written in the context of an open project, then of course I'll happily say that your paper is in scope.

In the meanwhile:

In what way is this not exactly integrating a formal method technology and tools (indeed, inventing one, the trace calculus as given) within an OSS project (the platform and compiler toolchain for encrypted computing)?

I never said that this was not the case. What I said is that Open Community & Cert. should be the central theme of the paper, or at least a major part of it. OpenCert indeed proposes the thematic aspect of FM. But the latter should still be centred around Open Community & Cert, IMO.

My feeling is that behind the words is really meant "err, this is all new and different stuff and we don't recognize it"

No, what I meant was what I said above. Needless to say, if for any reason you think I'm not qualified for reviewing your paper, please let the PC chairs know.

Cheers

paddykrishnan commented 4 years ago

The first two sections are directly relevant to the conference. The authors explain their experience with the conflicting requirements of double-blind reviewing, open-sourced materials and potential plagiarism.

The rest of the paper is technical and describes the behaviour of a stochastic compiler. The aim of the compiler is to produce a program whose behaviour differs from the behaviour of the normally compiled program at various intermediate points. But the "big-step semantics" of the program is maintained. The use of such a compiler is explained in section 6 where an observer cannot learn potentially secret values as they have all been offset randomly.

In Section 4.3, on page 8, it will be good to show an example of why not using "restrict" will result in slow code. Intuitively, any address dereferencing will involve only one correction involving the \Delta. Perhaps rules similar to 13 and 14 on page 9 will help. Also it will be useful to have the rules for compilation of the branching statement expressed using the instructions in Table 1. This would be similar to rules 8a and 10a.

I suggest that the authors add a running example (the Ackermann function example will suffice) to the paper; that will make some of the technical material easier to follow. I did not have the time to understand all the propositions, lemmas and theorem along with their proofs; but, intuitively, they appear to be correct.

In Section 5, the url http://anonymised.url needs to have the actual URL.

In summary, the paper uses entropy to achieve security. It is the compiler that introduces the entropy. The paper definitely makes an interesting technical contribution. I leave the decision of whether such a paper is within the scope of the workshop to the chairs.

pbreuer commented 4 years ago

Hi Renato - yes, it is good to seek advice from the Chair rather than tell him/them what to do (which is the bizarre situation I seem to have been encountering in security conferences, where the Chair seems to be a kind of administrative stooge for greater powers on the committee :-). Antonio should know, because he at least would know what certification IS, whereas I know I don't know - I know what it is in the same sense I know what taxes are; that is, something other people know more about and which give me the heebie-jeebies. Yet I have had work published in this conference in the past, so perhaps Antonio's view is as uncertain and flexible as mine. The dictionary says "2: confirmation that some fact or statement is true through the use of documentary evidence" (I skipped 1, because that said "1. the act of certifying ...").

It may be silly solipsism to rely on dictionary semantics, but their definition puts documentation to the fore, which I hadn't really thought about.

But it is "evidence" that it says must be documented, and I suppose that's the source code.

Does it mean the reasoning based on the evidence must also be documented? I would say yes, but the dictionary is not so insistent. It says "4: validating the authenticity of something or someone" and "validation" can be done by physically checking the seal, which itself does not require documentation. I think only a repeatable and verifiable means of checking the chain of authentication is required. Documentation of the reasoning as such may be old-fashioned when we can recheck the facts anytime.

On that logic it seems indisputable to me that I am certifying that the compiler does produce code that varies to the maximal extent possible from one compilation to the next of the same source code. I won't belabour the why again.

Your point as to whether certification is the MAIN thing in the PAPER is a thorny one because I never intended it to be any part of anything in that document, and I cannot say I have written about it at all by name. If anything, I considered what I had done to self-evidently be certification in the OSS context.

As a pound of flesh, I could replace the page of "writing" that is there with a discourse as above, and maybe another page. Is it worth it? My intention was not to offer teaching but research and I really hate nebulous philosophical justifications about the nature of this or that.

And anyway, why aren't you complaining about the absence of open source software as a MAIN thing!

It's all open source software. Again, it's something I took for granted, never having known any other environment and assuming it. I assumed everyone would know that the difficulty is in imposing any sort of form or direction on an OSS project, since it kind of "just grows", so the best one can do is come up with principles and get people to stick to them.

That is what I have intentionally done. That's what I intended to be refereed.

As to whether you know enough to read the paper, telling me that there is something you don't know, don't understand, or have no idea about would convince me that you do know enough, because you know what you don't know and can say so. Do you have anything for me in that line?

The question I will put to you is:

1) How do you even express that a compiler should be as chaotic as possible?
2) How do you convince yourself as author, or anyone else, that it is so?

Can you answer those questions?

I do not believe you or anyone could, until I/we expressed it in a formal way, and produced the simple axiomatic principle that should be followed in authoring of the code (every machine code instruction that writes should vary to the maximal extent possible) and the reasoning apparatus that allows one to jump from verifying that at the level of individual compiler constructions to the conclusion.
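To spell out that jump in its simplest form (my paraphrase, and it ignores the constrained "trailer" instructions that come up later in this thread): if each of the m instructions-that-write in a trace carries an independent, uniformly distributed n-bit constant, the entropies simply add:

    H(trace) = H(k_1) + ... + H(k_m) = m*n bits

and m*n bits is the most that m writes of n-bit words can possibly contribute, so verifying the local rule construction by construction is verifying the global maximum.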

That is what I intended to have refereed, not questions about whether this is certification or not, which it self-evidently is in my logic, or whether I have written about certification, which I self-evidently have not. Describing a trip in a car is not writing about cars. The isolation, clear enunciation and documentation of basic principles are in my view the only way of proceeding to obtain correctly working code with open source software, which - recall - changes all the time in a really vital way and at a great pace and will be corrected later, and that is what I/we have done.

I pointed out that I alone wrote >1M lines of finalized Haskell code in December and January. Principles are all that one can refer to as constants during the writing (Haskell code is probably 10x as dense as C, so that is 10M LOC in C, but people tend to write at a constant number of LOC no matter the language; anyway, there were at least 4 snapshots in that measure, so presumably the real total is more like 250K LOC).

Now I can at least certify that I know what the code is supposed to do, and that there is a way of checking that it does it, if anyone bothers to look.

Are you, or anyone, going to look? It's not pleasant trying to figure out compiler constructs. Here is a fragment from the compiler code for assignments of atomic types (ints, chars, etc)

    ks <- rollIntDice sze
    mc <- sequence [
           do
           let k = ks !! i
               o = os !! i
           -- set offset of r for moveto to read updatedb r (Offset k)
           mc1 <- addc' (t0+++i) (t0+++i) zer (e(k - o))
           mc2 <- comment' $ "just set val  of " ++ show i
                           ++ "th entry for " ++ show x ++ " in "
                           ++ show(t0+++i) ++ " offset " ++ show k

The "addc'" is a call to emit an ADDC machine code instruction (through various layers of register allocation abstraction). The "k" is a random input to the constant in the instruction (the "e(k-o)"). It comes from the "ks" which are a bunch of random numbers all generated at once here (the "rollIntDice").

All anyone has to do is check all the compiler code to confirm that every instruction-that-writes that is emitted is varied in every constant in the instruction by a random input, like that.

More or less (the provisos are in the paper, and figuring those out is another mountain that was climbed successfully, leading to the obfuscation theory result).
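A toy version of that audit, just to fix its shape (illustrative only: compileWithSeed and writeConstants in the usage comment are assumed helpers, not the project's API; and note that this detects only a constant that never varies, a much weaker property than the maximal entropy the calculus certifies):

    import Data.List (nub, transpose)

    -- Given a function from RNG seed to the constants appearing in the
    -- emitted instructions-that-write, compile the same source under many
    -- seeds and check that every constant position actually varies.
    varies :: (Int -> [Int]) -> Int -> Bool
    varies constantsAtSeed trials =
        all multivalued (transpose [constantsAtSeed s | s <- [1 .. trials]])
      where
        multivalued xs = length (nub xs) > 1

    -- usage sketch:
    --   varies (\seed -> writeConstants (compileWithSeed seed source)) 100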

So ... do you really CARE if the word certification is mentioned? (it is not). If so, why? If you'd like it put in, you can ask and it may happen! I just happen to think it is kind of silly, because this is open source software, and all I can do is certify that we know (now) what it should do, and we kind of could certify it of a snapshot, but won't, because it changes all the time and we may have made mistakes, but that doesn't matter because the idea is that the Universe of Kibitzers will fix them later (and put in more mistakes).

I think the essential question for a referee is "have we got the reasoning right", on which I am hoping for comment. It has taken 9 years to get this far (hey, I had to write the processor too, and the operating system - compilers are just the third component of a vital triad of processor, instruction architecture, and compiler that combine to produce the required security properties). Do you understand the scale of the undertaking?

Technical discussion is what I am looking for.

If you want to say, "oh well, we suppose you have got that right", please go ahead and say it! If not, not.

For example, I would be worried about if-then-else constructs. One can see which way they branch (instruction-that-writes means writes a visible thing). So the compiler should vary the way they branch. Does it? I'm not sure. The result of the boolean test is varied (by adding 1 mod 2 or 0 mod 2, randomly, and compensating at a higher level). But the kind of test (the opcode) is visible. If I could I would make the opcode invisible, but it seems to me it partially undoes the variation. I was worried enough to additionally interchange then and else cases randomly, as well as vary the test between < and >=, and swap sides on the comparands.
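A toy model of that interchange, to make it concrete (the types are mine; the real thing happens at machine-code level):

    import System.Random (StdGen, randomR)
    import Control.Monad.State (State, state)

    data Cmp    = Lt | Ge deriving Show    -- the test opcode, still visible
    data Branch = Branch Cmp (Int, Int) String String deriving Show
                  -- test, comparands, taken-label, fallthrough-label

    type Compile = State StdGen

    rollBit :: Compile Bool
    rollBit = state (randomR (False, True))

    -- "if a < b then T else E" emitted in one of two equivalent forms,
    -- chosen at random per site, so the branch direction does not betray
    -- how the source was written.  Swapping sides of the comparands gives
    -- further variants in the same spirit.
    compileIf :: (Int, Int) -> String -> String -> Compile Branch
    compileIf (a, b) thenL elseL = do
        swap <- rollBit
        return $ if swap
                 then Branch Ge (a, b) elseL thenL    -- arms interchanged
                 else Branch Lt (a, b) thenL elseL    -- as written

The residual worry in the text survives the sketch: which arm is which is hidden, but Lt and Ge remain distinguishable opcodes.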

Then there are questions about what information one can get by replaying functions with different arguments. In the final application of this compiler as an instrument of theory, that doesn't matter because everything is unrolled to an exponential depth (leaving polynomial methods of querying it in the dust) so there are no functions or loops or joins of if-then-else branches that can be seen, but as a practical matter it may make a difference on a given processor (i.e., not one whose word length n is variable and can be taken towards infinity for the purposes of the argument).

There are lots of questions like that. I would naturally ask "can't all this variation you have introduced be undone by some thinking about it? After all, these are only affine transforms, and those are invertible (if not degenerate)".

That's the kind of thought-provoking argument I was hoping for, but it's not really possible, because /I/ am only learning about the situation, so why would you have more insight than me who has thought about it a lot and still only knows a few of the answers?

[The answer is that the logic in the paper shows that user input to the trace is masked completely by the compiler input, as a matter of statistics. Whether that answers the question above is another question of a philosophical nature. I only put up facts that can be decided, I hope. What they mean is something else and I generally do not wish to opine. An intuitive answer is that there are more degrees of freedom than you naively think, and the formal logic proves it.]
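[For what it is worth, the statistical masking just mentioned has a one-time-pad shape (my gloss, assuming the compiler's offsets are uniform and independent of the data): if the trace exposes y = x + k mod 2^n, with x the programmer's value and k a compiler-chosen offset uniform on 0..2^n - 1, then for every c

    Pr[y = c] = sum over v of Pr[x = v] * Pr[k = c - v mod 2^n] = 2^-n

so y is uniform whatever the distribution of x: one observed value carries no information about x. The affine transform is indeed invertible, but only by someone who holds k; the degrees of freedom belong to the compiler, not the programmer.]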

luis-sb commented 4 years ago

In principle, this paper could be framed within the topic "open community research in formal methods and software engineering" (as enumerated in the workshop website). A paper in such a category is expected to focus on the process (i.e. on how open research in an FM field was conducted, its difficulties, challenges, ways-through) rather than on the product of the research itself. Unfortunately, this does not seem to be the case here: most of the paper is concentrated on the product side; only section 2 somehow relates to the open research process. But even section 2 (one page long) is mainly concerned with the difficulties the authors experienced in having this kind of research recognised in some venues, rather than with explaining how it was actually conducted. The remainder of the paper is basically on Formal Methods "tout court" and therefore a bit outside the scope of the workshop. As an FM piece of research, it certainly has merit, which however I refrain from assessing at the present stage (just a note for the authors: the statement of formal results, as lemmas or theorems, must be accompanied by the corresponding proofs, or at least by a pointer to another document where they can be found).

My suggestion to the authors is to re-structure their submission, reducing the focus on the calculus (the research product) and giving more insight, data, assessment, lessons learned, etc. on the open research process.

luis

pbreuer commented 4 years ago

Hello Paddy ... thank you for going through the paper. I will believe you are right in what you say even without reading it, but let me just do that:

in Section 4.3, on page 8, it will be good to show an example of why not using "restrict" will result in slow code.

Well, it will (result in slow code) and the paper should say that (but I don't see it, so maybe the stuff about accepting an "engineering compromise" got cut), just not as slow as if restrict were not there. The compiler construction is a piece of code that at runtime takes log2 N steps to access an array element, the array being size N.

If restrict is not used, then the "array" of possible addresses is all memory, which is N=2^32, so the code will take 32 steps to access an array element.

Actually, the problem is more the code size, but never mind.
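To put hypothetical numbers on it (the array size here is mine, for illustration): with restrict bounding the candidate addresses to, say, N = 1024, an access costs log2 1024 = 10 steps; without it the candidate set is all of memory, N = 2^32, and the same construction costs log2 2^32 = 32 steps, with correspondingly more emitted code.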

Your belief is understandable, because ... it looks like I dropped treatment of addressing from the paper for space considerations. I support you in your natural belief ...

Intuitively, any address dereferencing will involve only one correction involving the \Delta.

... but you are forgetting that the compiler does not know WHICH element of an array (or all of memory) is going to be accessed at runtime, so does not know WHICH correction to apply for the content of that entry. One has to wait to runtime to discover it, but one only has the correction scheme available at compile time.

The solution to this problem is a code construction at compile time that caters for all possibilities at runtime, with the slowdown as explained above.

The different correction per array element is mandated by logic about "every instruction that writes must vary to the maximal extent possible", but I don't know if I put it in ... I did, when discussing how the principles enunciated drive compiler design at the end of section 4:

to the question of whether an array should have (a) one delta common to the whole array or (b) one per entry, the answer is (b) one per entry. One per array would mean each write to an entry must be followed by a `write storm' to all other entries too, to realign their deltas to the newly written entry's (which is changed because the write instruction must add entropy to the trace). But the write storm's write instructions import no entropy as their deltas are all the same as the first, contradicting the characterization [in thm 1 of what a chaotic compiler must do].

Do you prefer your "correction" to my "delta"? If so, please say.

Perhaps rules similar to 13 and 14 on page 9 will help.

13 and 14 are rules about what the corrections are ... they must be the same at the end of the two branches of a conditional just before they join again, and the same at beginning and end of a loop. Otherwise things will go crazy at runtime.

Are you asking for rules about address dereferencing? You have a point(er) (:-) but I carefully went through the paper removing rules, explanations and discussion about addressing in order to make the paper fit in the space available. Obviously you have noticed the lack and are filling the space with your own guesses and so some words must go back in. I can promise to do that.

It could be tricky to explain however, because the address of each array element also must be "corrected" individually, as it is a visible outcome in a write instruction. We probably did not want to get into that. The arrays are not contiguous, and the places their elements are vary during the program runtime. There is a separate Delta vector for addresses that changes during program execution. The nominal values of the addresses all fall in zones nominated by "restrict", so there is another reason for having that word around - the compiler would otherwise have to use a vector 2^32 entries long while calculating. That's not a big deal in reality, but I don't have an extra GB on my laptop to be able to develop for it.

Our main aim was to show how the calculus is derived, not what its details are - that's my excuse and I am sticking to it.

Also it will be useful to have the rules for compilation of the branching statement expressed using the instructions in Table 1. This would be similar to rules 8a and 10a.

Yes, that is no problem to write out (I have it somewhere), but it is really boring. The essential thing in these constructions is to make sure that the compiler issues no more than one additional "instruction-that-writes" with respect to the component parts of the construction, and varies it independently of them. Or you can have two, and then have to vary both independently, etc.

The if-then-else is not good as an example because it introduces "trailer" instructions at the end of the two branches that even out the corrections between them. Those trailers do not as a group carry the full entropy that would be expected for their total as individuals, because they are constrained by "evening out". That is part of what the later theory quantifies.

But we can do this. Do you still think it is a good idea given the explanation above?

I suggest that the authors add a running example (the Ackermann function example will suffice) to the paper; that will make some of the technical material easier to follow.

You want everything to be accompanied by a low-level example .. I don't know that there is space, but it can be tried. The Ackermann function is neither good nor bad here. The problem is that assembler code is simply incomprehensible to look at, as well as lengthy, so there is really little chance of locating an example that can be related to a comprehensible part of the source code and doesn't take up a page. But I'll have a think.

I did not have the time to understand all the propositions, lemmas and theorem along with their proofs; but, intuitively, they appear to be correct.

Intuition is not always a friend.

In Section 5, the url http://anonymised.url needs to have the actual URL.

Thank you.

In summary, the paper uses entropy to achieve security. It is the compiler that introduces the entropy. The paper definitely makes an interesting technical contribution. I leave the decision of whether such a paper is within the scope of the workshop to the chairs.

Presumably you are all pushing to include the justifications I have given in the conversation so far about certification in an open source environment being all about the authors being able to justify that the code (which changes constantly) does what it is supposed to do, and being able to express what that goal is. I have no objection to doing that, but writing two pages of different words over an hour or so to replace what is there does not change what the thing is, which took years to accomplish, so I do not see that there is more than a political objective in that.

Thank you again.

[The paper if accepted is a citable reference only to the extent that its technical contribution has been "certified" by appropriate refereeing, so that is why I am pushing for stricter technical examination and never mind the words.]

[I think your remarks above wrt addressing indicate that you did not at that point realise the magnitude of the problem, and how difficult it is. Yes, pointers work, even though they need "correction", and pointers to pointers, etc, and they mesh with array-based access if you choose to access your arrays via pointers and vice versa. It is in the first place extremely difficult to get a compiler to work correctly, for some notion of correct, as various FM correct-compiler projects have demonstrated by their input of effort. It is even harder to make a compiler work incorrectly, correctly, and specifying what it should do - the nature of the incorrectness - is in itself a major step (you rightly called the top-level explanation I gave a "big step" operational semantics). But it is essential that the compiler works "incorrectly correctly", and a mathematical approach is essential to that. Inside an open source project that is a major "innovation". The math has to be there for people to follow in order that the thing works. A compiler is not like an ordinary piece of software in which you can get things almost right. It is right and everything will work, or it is wrong and nothing will work. It is "critical" code in that way. One cannot guess and be right. Your "intuition" is a guess and one cannot afford that (that that intuition was not on target is not the issue). The point is that there must be no guesses. I really think that "compiler for all of ANSI C" (aside from longjmp/setjmp, which I have never had time to think about) is going to be beyond all your experiences unless you have tried it, and doing it wrong, right, has not a chance of working without a mathematical plan. I keep on repeating the numbers, but they never seem to make an impact - I personally wrote >1M lines of code in two months to get it right, having developed the theory that showed what to do in October. That followed two years of investigative compiler development that resulted in a compiler that works, but does inadequate things according to the theory. Still, that compiler showed that integer programming was possible, and 1st order arrays. In essence, it followed more along the lines of the intuition you expressed above ... which cannot be.]

pbreuer commented 4 years ago

Hello Luis -

A paper in such a category is expected to focus on the process (i.e. on how open research in a FM field was conducted, its difficulties, challenges, ways-through) rather than in the product of the research itself.

That's a nice grammatical construction, but I am not going to fall for it. Expected by WHOM? Not by me, and it does not say so in the conference blurb, which I have quoted from at some point above.

Perhaps you could point to where it says that! Perhaps it does, perhaps it doesn't. I contented myself with finding something that covered what I wanted to write and looked no further.

But if you do want me to write a nice, anodyne but ultimately meaningless lot of platitudinous gibberish about an engineering process, I'll probably refuse. I like meaningful papers. I skip the words and look at the math.

If you mean "expected by you", please say so. That is understandable. But with respect to the implication that there is a stiff, silent community looking on from above whose preference must be obeyed when it comes to the "style and drift and content" of papers, even though it is unspoken, in the first place I know perfectly well that is untrue, and secondly that is deadly to science.

It may well be that you think there is a market for this conference among software engineers who expect yattering about process, and that would be a valid point.

But absent such points of real-politik, conferences should not be about what paper can best conform to what is silently expected of them by the Chair or others. It is a simple question of specifying what is wanted in the conference description and sticking to it. It says in headline mode:

The new scope of the workshop aims to promote the use of Open Community approaches in Education and Research while also exploiting them to achieve wide diffusion and proper assessment of new, innovative Technology.

And here you have the chance to fulfil that aim. One "open community approach .. in Research" coming in to be promoted! It is to find ways of making a compiler work incorrectly, correctly, in an OSS project, by generating underlying principles and theory that guarantees it, never mind code details.

While the blurb says "The workshop general focus is on (blah blah)" it also says quite specifically:

In the context of FM, OpenCERT 2019 will propose thematic aspects related to the area of Formal Methods ... Research: open community research in Formal Methods, ... Technology integrating Formal Methods technologies and tools within OSS projects...;

And

but still accepting submissions outside these aspects.

If your claim is that "thematic aspects" means "the software engineering process" rather than the product, I must confess that it means nothing at all to me in English. Is that a translation of something? It conveys nothing to an English speaker, I think, unless another one would care to correct me on that. Could it be intended as "related themes"?

At any rate, I see nothing about process versus product, or equivalent. If it was intended then it has not come through well.

I also think you do not appreciate the scale and difficulty of this work. I repeat my question:

  1. How do you specify what a chaotic compiler should do?
  2. How do you certify that it does it?
  3. How do you do that in the context of an open source project, where code production is not governed by strict rules and changes constantly?

The answer is to provide principles and theory that says that if those principles are followed at the level of individual compiler constructions, then whatever it produces will, over a large number of such compilations, have the statistical properties required. That devolves the work to individuals in an OSS project who can follow those precepts and guarantees that the whole will work when put together.

I would gladly take constructive insights and advice on how to phrase that.

Perhaps you would offer for that:

"Oh, I see! You have a limit of object codes over a directed set consisting of a source program structure. The object codes have a certain property that transfers to the limit. The property is that every machine code instruction that writes must vary uniformly and independently to the maximum extent possible. Or possibly that the traces produced by it at runtime must so vary. Or both. If you delegate the design of each code to a different individual in an OSS project and they manage their piece of that to have the property, then the finished whole has it, and your certification that it does is valid, although you have undertaken no holistic review of the finished code. Moreover the property persists despite changes in the code over time, so long as those principles continue to be observed".

It's probably a Kan-limit, since the partial compilations would also have the property.

That would be a useful, constructive insight and might work well as a suggestion that I could put in as a useful perspective for others (but I am just guessing at it - there is a priori no infinite process here to make an infinite limit out of, but I suppose one could throw loop unrolling into the mix and obtain one; yes, that works).

To the idea that I should state "there is a theory", "there is a specification", "there is a way of deriving the semantics and logic of the specification methodology", without explaining what those things are but merely abstracting some insights about it having been done in a OSS context and how this works as a modus operandi, etc., I can only say that it sounds ridiculous.

I'm actually also fine with the paper being about "the product not the process", because the product IS the process in OSS projects.

It seems to me you may not be as deeply familiar with OSS projects as I am or you would appreciate that more!

But your point is that I have not written about those process-oriented abstractions that I have just made, but why should I have? I cannot guess what you may want, and if you had said you wanted that, I would not have submitted a paper that contained technicalities (and not submitted one at all ... except that I would have written one to keep Antonio happy!). But the time to tell me you wanted something else was earlier in the process, not NOW. You have had the paper since the first deadline. Time enough and more to comment.

What is the politics here? Is there a worry that you may be stepping on the toes of FMWC? If so, that would be a legitimate concern (where the grandstanding above is not, both yours and mine). I would happily make up some two or three pages of writings of the "nonsense, gibberish software engineering" kind to make that happen if that is the concern, as that would be a real problem that needs circumventing. It is no trouble for me. I can hide the technicalities among words so it does not look like math, using small side-boxes to take occasional formulae. The question is if you really want a conference full of platitudinous rubbish with occasional soft and fuzzy points of mild interest embedded within.

sksowe commented 4 years ago

Hi guys, I guess most of what I wanted to say has already been discussed here. In a nutshell, I really find it difficult to discern the contribution of the paper to the workshop. I think some parts of section 2 (as suggested by Paddy and Luis) would have been useful for the workshop audience, and in that regard I would suggest that the authors trim the paper and just focus on discussing their experience in the research project (http://sf.net/p/obfusc?) that "has been conducted in public view since inception (about 2012)". I could not read the criticism and analysis in the weblog because the link is broken (https://ptbcs.blogspot.com/p/)

SKS

pbreuer commented 4 years ago

The links in the paper are https://ptbcs.blogspot.com/p/wall-of-shame-0-eurosys.html and https://ptbcs.blogspot.com/p/wall-of-shame-vi-dsd-2017.html (apologies if I am guessing incorrectly what is meant by "link").

Folks - I do not understand why you think there is no contribution to the workshop, and you will NOT SAY, so I have no chance of understanding. I have said what it is:

a report of the formal method created and adopted in this open source project in order to be able to make the code being produced work as it is intended to and express what that intention is - which means I have certified that it does do so, in my understanding.

How it does so is what should be OF INTEREST to you.

"Why not" has met with no explanation. There are only bland assertions without rationale.

Though certification is none of my interest, it appears clear to me that the methodology should be of interest to you as the only way that I can see to build assurances that the ("living") code does what it is supposed to do into open source software projects, which I like to think I am an expert in. This is why:

The problem is that people in an OSS project do it because they like doing it. They will not do something they do not like doing, so if you are going to get some sort of certification into OSS you have to make it wanted and likable as well as just able to survive the constantly changing code environment.

The solution (HOW) is:

1) Make correct working the objective by definition. This is a compiler. There is no alternative to "correct". Here there is no partial success - it either all works or it all fails. The precise semantics are required for it to be valuable or else a menace in terms of security properties.

2) Remove the big-step burden. I did that by developing the mathematical theory that says if you all get THIS right in your own little pieces of code, then the whole thing will be right.

3) Make the mathematics helpful to little-step coding. The mathematics says the design is right or wrong, deciding such things as I talked about to Paddy, whether the entries in an array should lie at predictable addresses or not (not). That forces what is coded, and then the "little step math" states that the code written has the right stochastic property. It makes it easy not hard to get right. Every instruction emitted that writes something that can be observed must vary to the maximal extent possible. Nothing else hides the original programmer's intent, statistically. That's easy to check by eye and the math says it is all that is required. I did it for one example for Paddy. (If people can suggest how to make it type-theoretically impossible to avoid getting that right, that would be interesting .. something like declaring the constants in instructions to be of a type RAND for which only certain constructions exist).
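For what the parenthetical is worth, here is one shape the RAND idea could take - a sketch under my own assumptions, not something the project has:

    module Rand (RAND, rollRAND, offsetBy) where

    import System.Random (randomRIO)

    newtype RAND = RAND Int         -- data constructor deliberately not exported

    rollRAND :: IO RAND             -- the only way to mint a RAND is to roll dice
    rollRAND = RAND <$> randomRIO (0, 2^32 - 1)

    offsetBy :: RAND -> Int -> RAND -- affine adjustment preserves uniformity
    offsetBy (RAND k) o = RAND (k + o)

    -- An emitter typed as, say,
    --     addc :: Reg -> Reg -> Reg -> RAND -> Compile ()
    -- cannot then be called with a bare, unvaried Int: forgetting to vary an
    -- emitted constant becomes a type error instead of a review finding.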

That is what I set out and it seems to me absolutely clearly an example of how to get FM into a OSS project and get an assurance that the code is correct. Yes, a mathematical tool is a tool.

I have not written one word about certification because I do not want to write about certification. I decided that I did not. I decided it should be up to you to draw lessons from the story I have set out, and write about it in those terms if you wish to.

But this most certainly is about getting FM into an OSS project, which the workshop says it is about.

So kindly answer with argument, not assertion. I can sympathise with people who think that the workshop is about certification, or about such and such, but please state that in your argument ("I think the workshop is about blah") and do not assert or assume it as a fact.

Suggestions like "just focus on discussing .. experience in the research project" sound to me like "write meaningless nothings about nothing". I have stated above what insights in my opinion you might take away from examining the report of my research - just on the spur of the moment - but to me those are "meaningless nothings" that I will not write. I am proud of the research done and will not write about a tiny aspect of it as though that were what was of importance. It is not.

pbreuer commented 4 years ago

I meant to summarise the posts above, with a line or two for each, but I have not had time even to COUNT or READ some of them, I am so swamped with email from security cameras, other projects' emails, other deadlines, error messages from remote batch compilation and synthesis jobs and so on, to the tune of hundreds, 24h a day. Apologies (and for anything else too!).

Failing that I'll just add a couple of observations about this reviewing process, also relying on my experience on other PCs. Going through post by post would be more effective, but ...

  1. Different cliques have different notions of what is research. I knew it about hardware engineers, who seem to have no concept of abstraction, but I now see that software engineers have different understandings of the word "research" from computer scientists and mathematicians. I should have known, or known that I did know that but had forgotten. To an SE there is no such thing as something that cannot be programmed, and the question is HOW. To a CS it must be proved that it can (or cannot) be programmed. To a mathematician it must be proved that something with that specification exists (or does not). An SE does not seem to understand that programming is research to prove that what is programmed (or abandoned) is possible (or impossible); it was not known beforehand, or it would not have been worth trying. Similarly, a CS can see no worth in mere development, which is what SE research looks like to him/her. And a mathematician can see no point in mere "examples", which is what CS research looks like to him/her. Of course, an SE can see nothing in the theorems of a mathematician, which are intended to give insight, because insight is the last thing he/she wants: he/she wants to do it without understanding anything (yes, I have taught engineering classes, and I have a Cambridge PhD in engineering, so it is alright for me to say these things).

  2. It is a disaster having those different cliques on the same programme committee, because they simply do not understand what the others (and the papers) are talking about, or why. The end product is winning by number of votes, and one cannot vote for truth. You just get mainstream, incremental, safe stuff that everybody can agree looks the right sort of thing. I would submit, from experience of committees, that the paper that receives just one vote is likely the world-beater.

  3. If you have just one clique on the committee, you get a disaster of a different sort: the standard academic cliché of inbred nonsense. (I once published a good journal paper on a process model of software maintenance that modelled it as a pinball machine! It was meant as a joke! Apparently the paper having no sense in it was not an impediment to referees who also had no sense of the ridiculous - or they secretly shared the joke that one can write any old rubbish and have it accepted if it looks the right shape, because nobody understands this rubbish anyway, and nobody reads it.)

  4. On a committee with different talents, if none of them has multiple talents, you get the three blind men parable I alluded to in 2. (One feels the leg and says the elephant is like a tree, another feels the tail and says the elephant is like a hairy snake, ... each judges the whole that they cannot see by the fragment they notice, unaware that a bigger perspective exists.) Without the experience of having been totally and completely wrong about everything in a different area over and over again, some people grow overconfident and extend their estimate of their own worth to areas in which they are completely wrong. It is called the Dunning-Kruger effect (google it!). Others, who are more circumspect, instead decide to contribute by acting as gatekeepers, checking papers for form, spelling, overall goals and whether they are compatible with a "mission" for the venue [that is a kind of semantics-free bureaucracy which should be completely out of place in any scientific forum - basically, if you can't understand it you do not get to judge it. Unfortunately many people like to kid themselves that they do, when they don't! Cf. the D-K effect.] In many ways this parallels open source projects, where people who can code, code, and others supply documentation or bug reports, in order to feel they are doing their bit according to their ability. The result is a committee outcome, in which seemingly everyone assumes that what they themselves don't know is matched by expert knowledge in somebody else, so they ought to agree with the group outcome. But "nobody yet knows about it" is the standard for real, non-incremental research, and the group outcome then is as likely as not a group fantasy based on hocus pocus, because nobody knows and nobody will say "I don't know".

  5. So what is the answer? Here we have had the opportunity to remove misunderstandings, but the interface doesn't allow me to "talk" properly. I want email. Yes, I am getting email, but I do not know how to reply to it so it ends up here, and right now I am being swamped by other emails. Even figuring out how to get emailed when there is a forum posting was non-obvious to me, and I do not know how to refine the selectivity down to just this one "thread". Also the "many people speak at once" nature of the interaction is a problem. I want somebody to post a general point and then be able to open a one-on-one conversation with them about it, not be interrupted by 10 other things coming in all at once after days of silence (no, I don't use github ordinarily; I avoid it, and git, like the plague). People just posting their opinion is also a problem! I want them to ASK for other people's ideas on this or that, and work out a joint position. Instead we get unchanging and unchanged points. Also, the fact that some seem unaware that OPINION is a no-no in refereeing is a meta-problem. One must justify everything with reasoning. "I don't see how [this relates]" is a personal problem, like a "bug report" (not) that starts "I can't ...". "I think that the requirement is for a paper that attacks problems that software engineers experience in a formal methods environment, especially in finding ways to stamp things as passed/not passed" is a reasoned way of putting it, and one can tell the difference because it is FALSIFIABLE. Also, I think the Chair(s) should take the leading role in each discussion, directing and putting a foot down where required, and sparking discussion where it is needed. That is analogous to the project leader's role in an open source project, and it has been notably absent here. We the PC are the Chair's advisors, and it is the Chair's job to get good advice out of the PC, which means leading and directing the discussion, not standing off.

  6. WRT this paper, I noticed one review with technical content (Paddy's) before I got swamped with other things. I presume (lacking further devotion to the task of reading through on this interface or trawling back thousands of messages in my mail spool) that it was the only one? If so, then everybody else cannot deal with the computer science and information theory in it, and their judgement must be of the "gatekeeper" kind, in my opinion. [This is based on my long-time observation that "doesn't = can't", and on knowing that if I were capable of commenting technically on something I would, being even eager to show I could.] Luis is excepted in that, because he probably can cope with the innards, and probably is exercising a chair function in deliberately staying out of the details. I also guess he has a market for the conference in mind that consists of the software engineers, and he is trying to "sell" FM to them as a certification method that they will like, because certificates are good for business. So there is politics in deciding what papers should be in. I also guess more politics accrues from trying to avoid stepping on the toes of the FMWC organisers, and from keeping them happy by selling more positive images to SEs. But those are just my guesses, and I did not take kindly to the inference that there was a "community" that he was mystically in contact with who had pronounced a verdict - as Chair he has the right to put up an opinion, and that is that, no appeal. Still, I also guess he doesn't realise the paper is about research programming, not software engineering: about doing something that should be impossible, because it implies (if it exists) that "indistinguishability obfuscation" is doable in certain senses, perhaps all senses, and that is a Holy Grail in security. That security people can't understand what this is all about is because they can't do computer science (or hardware engineering, or abstract math). That FM people can't understand it is probably because they are aiming to satisfy SEs, not mathematical existence proofs. SEs are completely mystified because they have no idea that there is a difficulty, even an impossibility. And I wrote imagining that the audience were computer scientists. All in all, three blind men syndrome is alive and well.

opencert commented 4 years ago

Camera-ready version for the post-proceedings. paper_1.pdf

paddykrishnan commented 4 years ago

All good.