opencert / workshop-2019

Working repository of the "9th International Workshop on Open Community approaches to Education, Research and Technology" *** towards "Open community approaches" CERTification processes *** Porto, Portugal, 8 October 2019, Co-located with FM 2019

Submission 5 (Position/Short) #5

Open opencert opened 5 years ago

opencert commented 5 years ago

Title: Runtime Verification of Linux Kernel Security Module

Abstract: Linux kernel is one of the most used Free/Libre Open Source Software (FLOSS) projects. It is installed on billions of devices all over the world, which process various sensitive, confidential or simply private data. It is crucial to establish and prove its security properties. The paper presents a method to verify the Linux kernel for conformance with an abstract security policy model written in the Event-B specification language. The method is based on the system call tracing and aims at checking that the results of the system call execution do not lead to accesses that violate security policy requirements. As a basis for it, we use an additional Event-B specification of the Linux system call interface that is formally proved to satisfy all the requirements of the security policy model. In order to perform the conformance checks we use it to reproduce intercepted system calls and verify accesses. OpenCERT_2019_paper_5.pdf

RobertoBagnara commented 5 years ago

I will review this submission. Roberto Bagnara

pbreuer commented 5 years ago

I will also look at this submission. Peter Breuer

jff commented 5 years ago

I will review this submission. Joao Ferreira

heron-carvalho commented 5 years ago

I will review this submission. Francisco Heron de Carvalho Junior

opencert commented 5 years ago

Assigned PC members:

jff commented 5 years ago

Dear authors,

many thanks for your submission. Before I go into more details and further questions, I would like to ask if the Event-B specifications of the HIMACF model and the System Call Interface are available. I think it would be important for reviewers to have a look at those, so that we can understand in more detail what properties were specified and proved.

Thanks, João

17451k commented 5 years ago

The HIMACF model is confidential, and only its first level that describes RBAC is publicly available. Event-B specification of the first level can be found here: https://github.com/17451k/base-model

At the moment, there is little information available about the HIMACF model in English. We wrote a paper about it a few months ago, but its reviewer has not yet submitted his comments. But some additional info about the model and its Event-B specification can be found here: http://research.nii.ac.jp/eventb2018/slides/ilya-shchepetkov.pdf

Regarding the Event-B specification of the System Call interface, we have ported a small part of it (that contains only one system call, open()) to the publicly available first part of the model. It's not complete since it describes only one system call, but that should be enough to demonstrate our approach. You can find it in the attached files, along with a picture that illustrates the graph of events of the system call open().

~ Ilya

base-model-with-open.txt base-model-with-open.zip open

pbreuer commented 5 years ago

Yes, I have also read the paper now, and I also would like to see snippets of code, diagrams, and so on in the paper. The claim at the end that a method has been "presented" is otherwise a little on the optimistic side, but it certainly has been mentioned! I would particularly have welcomed some high level spec of what you do. I am a functional programmer, so I would have suggested something like:

testBench syscall = checkTrace syscall
       where checkTrace syscall = [eIO == tIO | (eIO,tIO) <- zip expectedIO testIO]
             expectedIO = repeat True
             testIO     = replay spec trace
                    where trace = (compose sec_mod kernel) syscall

     compose m1 m2 = zip is os    --- stream of i/o pairs (trace)
                where os = m1 is
                      is = m2 os

or something like that, plus a diagram showing the composition (nose to tail) of kernel and security modules.

That specification also makes it clear that there is a third man missing.

It is a driver that drops a new syscall into the kernel when the previous syscall is acked. I think you have it as part of your replay apparatus, but I do not see it as belonging there.

Your description of the replay apparatus (which checks the trace) is really that of a Mealy machine constructed from the transition function of the machine, which you describe informally. I think you really need to describe it more formally so people can be sure what you are talking about. My guess is

replay state Final = True
replay state event = if all ($ state) event.guards
                       then replay state' event.next
                       else False
       where state' = event.transition_fn state
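In runnable form, that guess might look like the following Python sketch (Event, guards, transition_fn and the FINAL marker are my inventions for illustration, not anything from the paper):

```python
# Hypothetical sketch of the guessed replay loop: each event carries
# guards (predicates on the state), a transition function, and a pointer
# to the next event; replay succeeds iff every guard holds along the way.
FINAL = None  # marker for the end of a trace

class Event:
    def __init__(self, guards, transition_fn, next_event=FINAL):
        self.guards = guards
        self.transition_fn = transition_fn
        self.next = next_event

def replay(state, event):
    if event is FINAL:
        return True                      # final event: trace accepted
    if not all(g(state) for g in event.guards):
        return False                     # a guard failed: trace rejected
    return replay(event.transition_fn(state), event.next)
```

For example, two chained events that each require a non-negative counter and increment it replay successfully from state 0 but fail from -1.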

But really I think you could manage a spec in one of the specification languages you are using (which I do not know at all). The aim is to give the reader better purchase on what you have done and how you are doing it, through the medium of fewer words and more math and diagrams. The diagram above is great ... not that I know what it means as I prefer code/math myself and diagrams are pretty meaningless to me.

Those are just overall impressions, composed on the back of my having finished reading an hour or so ago. That the alphabet soup is partly your stuff and partly other people's was unclear to me until I read the comments above. Is HIMACF yours, then? I thought there was some 300p of something to be transcribed, so I presumed that whatever you were translating was not yours. How have you access to it if it is not yours and it is proprietary? It is a general criticism that I have not much clue who the actors are in this story from reading the text, and the names bandied about of models and tools mean nothing to me. It would be helpful if you said WHO these groups are who are responsible for what, and what credibility they have and what their interests and motives are. How come there is a proprietary SOMETHING associated with an open source project like Linux? How did that happen, and is it legal?

17451k commented 5 years ago

It would be helpful if you said WHO these groups are who are responsible for what, and what credibility they have and what their interests and motives are.

There are two actors in the story. One of them is the “RusBITech” company, which develops a Linux distribution called “Astra Linux”. This distribution is intended for Russian government structures and has advanced security features. Several security experts work at RusBITech, and one of them is an author of the HIMACF model. RusBITech has implemented the HIMACF model inside its distribution as a proprietary kernel security module. This access control system is similar to SELinux and AppArmor and uses the common security interface (LSM).

The government requires all operating systems that are meant to be used for working with classified documents to be certified. The certification process is based on the Common Criteria standard.

The authors of the paper work at ISP RAS, the other actor in the story. We are experts in formal verification, and we have worked with RusBITech to help them pass the certification. We have created a formal Event-B specification of the HIMACF model and proved its correctness (as required by the Common Criteria), formally verified the correctness of their security module using Frama-C and ACSL (out of the scope of the paper), and tried to demonstrate that their distribution correctly implements the security model.

The latter improves the level of assurance in the implementation and is the topic of the paper. It is not trivial at all, and we think that the method by which it can be achieved may be useful for other people trying to pass the certification, and not only in Russia, since the Common Criteria is an international standard.

We will include some of these explanations in the paper, thanks.

How come there is a proprietary SOMETHING associated with an open source project like Linux? How did that happen, and is it legal?

Of course it is legal: third parties may distribute proprietary drivers in the form of loadable kernel modules.

evdenis commented 5 years ago

Yes, I have also read the paper now, and I also would like to see snippets of code, diagrams, and so on in the paper.

We will add listings with pseudo-code describing the replay algorithm.

plus a diagram showing the composition (nose to tail) of kernel and security modules.

We will try to add such a diagram if we have enough space. Here is a draft image showing the relation between a security module and the kernel, with a quotation from the paper to describe it:

In Linux, userspace programs work with external resources via the kernel, and make requests for accesses through system calls. When a program executes a system call to, for example, open a file, the kernel performs a number of checks. It verifies the correctness of the passed arguments, checks the possibility of their allocation, and also checks that the program has the permission to obtain the requested resource by discretionary access control. If more advanced access control mechanisms are implemented, then the kernel should check that they also grant access. Such advanced mechanisms are implemented using the Linux Security Modules (LSM) framework as a security module.
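The quoted pipeline can be caricatured as a few layered checks. This is a toy Python model with invented names, nothing like the real kernel code:

```python
# Toy model of the kernel's layered checks on an open() request:
# argument validation, discretionary access control (DAC), and finally
# the LSM hook supplied by the loaded security module, if any.
def dac_allows(process, path):
    # Stand-in for the usual owner/group/other permission-bits check.
    return path in process.get("readable", set())

def kernel_open(process, path, lsm_hook=None, valid_paths=()):
    if path not in valid_paths:                    # argument validity
        return "EINVAL"
    if not dac_allows(process, path):              # discretionary access control
        return "EACCES"
    if lsm_hook and not lsm_hook(process, path):   # security-module veto
        return "EACCES"
    return "OK"
```

Note the ordering: the security module is only consulted after the ordinary checks pass, so it can only further restrict access, never grant it.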

That specification also makes it clear that there is a third man missing. It is a driver that drops a new syscall into the kernel when the previous syscall is acked. I think you have it as part of your replay apparatus, but I do not see it as belonging there.

First, we collect the traces from the real OS kernel under test or fuzzing. Next, we replay the collected traces on the model. The replayer sequentially checks the performed system calls on the model. Yes, the replay apparatus drops a new syscall into the model when the previous syscall is acked. We will add diagrams and pseudo-code to make it clearer.

But really I think you could manage a spec in one of the specification languages you are using (which I do not know at all).

The Event-B language is not very suitable for describing sequential algorithms. We will address it with pseudocode.

Thank you for the feedback!

pbreuer commented 5 years ago

There are two actors in the story. One of them is the “RusBITech” company, which develops a Linux distribution called “Astra Linux”. This distribution is intended for Russian government structures and has advanced security features. Several security experts work at RusBITech, and one of them is an author of the HIMACF model. RusBITech has implemented the HIMACF model inside its distribution as a proprietary kernel security module. This access control system is similar to SELinux and AppArmor and uses the common security interface (LSM).

That seems a great explanation that should go in to replace the current stream of acronym-oriented facts! That was a very comprehensible narrative, and narrative is what a paper "wants".

The HIMACF was the 300p of tex/math? It would be useful to me if you placed reminders in the text so that I did not have to remember the acronyms ... Instead of "HIMACF" model, say "the 300p of the HIMACF model" a few times, and it might stick with me.

What does HIMACF stand for, by the way? Was it explained at first encounter, as per standard practice? Unless that is HI-MAC-F it has no deconstruction in English, and is not memorable.

RusBITech is a business, in it for profit? But their market is the Russian government? Is it a government spinoff, or a startup company created as a result of a bid process? It seems very strange to me to have the Russian government as a security customer unless you already are or have been somehow associated with security in the Russian government. Can you explain the circumstances further? What is the specific purpose of their security module, and why isn't something that already exists good enough? What are the (so-called?) "advanced" features that the existing framework lacks hooks for? Is it just a case of "not invented here"-ism?

I think you might want to tell the readers at least something about the current Linux kernel security framework. Is there some existing abstract description of it that you could use in the paper to set the scene? I don't have a clue what it is like myself!

I am afraid a diagram such as you propose is also fairly meaningless to me (but that is my general reaction to SE diagrams). What do the arrows mean? It looks to me like something on the kernel side of the kernel/user interface, but I do not recognize much ... oh, I think it is showing what happens to system calls? So that "process" is user-side. The process CALLs a syscall, which drops into the kernel side, and is then checked for validity in a couple of ways. Those arrows might be calls and the squares might be layers, or objects. What does the validating? All I vaguely recall from writing syscalls myself is putting something in my module to handle it. Do modules have to register their capability to handle a syscall? If so, what is that wrapper? Is that where the form of the syscall arguments is declared, usually, in order to set up the correct validity checking?

So if you want to use that diagram you will have to FORMALLY define what the arrows and boxes mean. And any circles, if you have those!

What is "fuzzing", BTW? ("futzing", I can imagine - that would be a process of fitting, such as adjusting timings).

The Event-B language is not very suitable for describing sequential algorithms. We will address it with pseudocode.

Please DO NOT give sequential algorithms or pseudocode, use abstract, declarative specifications, with no imperative aspect to them, because of the FM context. It is absolutely essential in this context. The ones I gave are declarative but a little too far refined for the context and probably CSP is the appropriate level, but anything that is declarative and high level would do. You want to give specifications like a security module is something that checks a syscall given a certain kernel state (please specify what it does look at):

sec_mod :: syscall -> kstate -> Bool

or perhaps it is something that you have running continuously, and the module is the transition function that modifies the kernel state or local state (remembers something)

sec_mod :: syscall -> kstate -> (Bool,kstate)

in that case as a continuously running mealy machine or service

sec_mod_machine :: Stream of syscall -> Stream of Bool
sec_mod_machine = mealy sec_mod init_state

and the mealy construction of the machine from the transition function is standard.

Then the compose function I wrote earlier would just be a kind of parallelization with the sec_mod machine, and the user process would run in parallel with those, communicating using an alphabet of syscalls, receiving back an Ack or Err for each.
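For concreteness, the Mealy construction called standard above can be sketched in Python as follows (the toy sec_mod transition function and its counting policy are purely illustrative, not anything from the paper):

```python
# Build a stream transformer (Mealy machine) from a transition function
# step :: (input, state) -> (output, state') and an initial state.
def mealy(step, init_state):
    def machine(inputs):
        state = init_state          # fresh state per run of the machine
        for i in inputs:
            out, state = step(i, state)
            yield out
    return machine

# A toy sec_mod: allow a syscall only while fewer than 2 have been seen.
def sec_mod(syscall, count):
    return (count < 2, count + 1)

sec_mod_machine = mealy(sec_mod, 0)
```

Running the machine over a stream of syscalls yields the stream of Bool verdicts, which is exactly the `Stream of syscall -> Stream of Bool` shape suggested above.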

Finally .... can you please tell me what the notions of correctness are that you have mentioned. I think that is also essential. I imagine, or perhaps I understood from the text, that you have built a model of that 300p spec, which perhaps one should call M. Then you have another model of how an implementation in C of a security module works, perhaps call it m. And you claim m ⊑ M (m refines M). You need to state what your notion of refinement is.

(apologies if I misremember today what the correctness claim in the paper was).

At any rate, both in stating what the components you have built are and how they relate, you need more FORMAL statements, containing at least type signatures and preferably a little semantic information too, in order to attract an audience at FMWC. But perhaps people will put up with a little imperativism in order to get some insight as to how one works with Event-B.

pbreuer commented 5 years ago

I think one should ACCEPT but I have some reservations because of the lack of technical information. If I may so request, I would like you to either

(1) extend it to full length and add much more technical detail
    in the form of abstract specifications (type signatures and data
    type definitions, minimal, NOT imperative) that replace the
    pseudocode given, as well as background information on who is
    doing what and why in this scenario (as you already have
    promised to do in the github interchange); or

(2) keep the short length and STILL do your best as above,
    but just apologise for the lack of technical detail and
    concentrate on the engineering process itself. I'll comment
    below that it looks as though it may be middle-out.

My REASONING is that the subject matter, linux plus security plus formal verification, is a certain draw for the projected audience at FMWC but they will be disappointed by the lack of detail because it is an impediment to understanding. My EVIDENCE for that is I am having to guess what goes on pretty well all the way through. That is not surprising, since apart from the two snippets of pseudocode (which I have not examined, so one can bet there are mistakes in it!), there are words only.

Perhaps I can guess well, perhaps I can't, but I think I am more than capable in the area of the linux kernel in general, although not the security module, and with at least the mathematical end of formal methods. It may be that the paper is aimed more at HOW than WHAT (the informed reader should from WHAT be able to guess how without being told) and engineers rather than theorists? Perhaps you are also trying to stay vague in order to satisfy business and security concerns?

I can understand that reticence, if so, but at least type definitions and signatures and data definitions should be publishable. There is no secret about the Linux kernel security module and explaining THAT in terms that others can build on would be a great attraction -- much more than what you have done with the other "secret" module, I guess. That would be of interest only to security people inside the Russian government ... as well as outside! "Security people" then!

So (3) give an abstract definition of SOMETHING, preferably the linux kernel fragment you need, including the standard linux security module, so far as you can. That should be black box, of the kind

stream_of_syscalls -> stream_of_acks_or_errs [side-effecting on WHAT state?]

If you can manage abstract descriptions like that of everything, great. I note that you would ordinarily throw in the state by describing the step function:

(syscall,state) -> (ack_or_err,state')

and one of the interesting things to know is if the state' is rolled back to the original state on error from the syscall? Is a syscall atomic, in other words? Is it supposed to be? Does your check make it so?

You might wish to describe your check nondeterministically as

check :: (syscall,local_state) -> (Err,local_state) | (Ack,local_state')

and show how it modifies the kernel transition function that usually processes syscalls.
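A sketch of that step-function view, including the roll-back-on-error (atomicity) question raised above, with entirely hypothetical names:

```python
# Sketch of the step-function view (syscall, state) -> (ack_or_err, state'),
# with an explicit roll-back on error so the syscall appears atomic.
# checked_step wraps a kernel transition function with a security check.
def checked_step(kernel_step, check):
    def step(syscall, state):
        if not check(syscall, state):    # security-module veto
            return ("Err", state)        # state unchanged: atomic failure
        result, new_state = kernel_step(syscall, state)
        if result == "Err":
            return ("Err", state)        # roll back on kernel error too
        return (result, new_state)
    return step
```

The interesting property is visible in the sketch: on either kind of error the returned state is the original one, so the wrapped step is atomic even if the raw kernel step is not.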

I do also find it difficult to understand how you, the authors, can be sure that you are verifying something clear and well defined without having defined the targets and the process clearly and well for yourselves, if that is indeed the case. Surely what you are not saying in the paper must be written down somewhere? And it must be simple at top level, otherwise human beings cannot be sure that what they want is being done.

If the answer is NO, it is NOT written down, then that is also of interest to readers because it indicates that you are performing a sort of middle-out verification process, in which you figure out only late on what precisely you have verified.

On that subject, you quite correctly point out that you are verifying only models, not real code, but the expression of that idea was not very clear to me, because I expect to see things like "we define an abstract model M" and "we define a concrete model m" and then define some relation between them. Are those Mealy machines? Minkowski machines?

Perhaps you do know and are keeping it back or it may be that even you do not know those details, because it is hidden in the Event-B code/runtime as to exactly what kind of finite state machine is being constructed for verification, or what two are being constructed, and what the refinement relation is that is being verified. If so, you should SAY.

There is no problem in some things being unknown to you and certainly I will admit that I don't know what Event-B does or how it works (is it derived from the B-tool? In that case I would expect some set theory. Or is it derived from Boerger's abstract state machines? In that case surely you have to define a bisimulation relation, as well as the machines?) and I would like to see some explanation by way of example snippets of Event-B code and spec.

All of that really boils down to a single observation: you say in the paper that you have PRESENTED a method, but really you have MENTIONED it. That should be remedied, one way or another. My preference is that (abstract-level) detail be given. The alternative is to concentrate on engineering process.

Abstract.

Missing the first word ("THE Linux kernel") is never a good start, but never mind!

It is the PRODUCT of the project that is USED, not the PROJECT.

The paper -> This paper

In what sense is a security policy written in Event-B "abstract"? Perhaps you mean "realized as", not "written in".

the system call tracing -> system call tracing
the system call execution -> system call execution

(generic, hence no article in English)

results ... lead to ... accesses? You are probably trying to say too many things at once in this sentence. Please change. Are you checking accesses, or capabilities (which govern accesses)? I think you want to stop the sysadmin opening up (via a system call) for access something that they should not.

If so, this is perhaps excessively grand rhetoric.

It is not clear at this point what exactly you are verifying. I thought at this point you meant you modelled lots of syscall stuff in Event-B, which is great, and you believe it is right, and "therefore" what you prove about it you believe is being proved about Linux too.

It turns out, I think, that instead you have a very abstract model of the syscall interface, along the lines I suggested. Please give it as exactly as you can.

Introduction.

P1.

Why "security policy MODEL"? Don't you just mean "security policy"?

I don't know what either is conventionally, or if there are conventional formal and semi-formal meanings, so perhaps your best bet is to DEFINE the term, and say you are defining it.

Care to give me some background on the "Common Criteria standard"? (I don't know what it is!) What is it about, who is it by, and what credence or otherwise does it have?

Additional ... level of assurance? You haven't mentioned ONE level of assurance yet! So there cannot be an additional one. You must logically describe the ordinary level of assurance first.

Additional -> An additional

(I will not comment often on mistakes of English involving articles, since I understand the language problem involved, and you have clearly made an effort to avoid them).

Yes, just so, as you say, showing that the code satisfies your model is what would make your verification be of linux rather than of your model of linux, so we need to know what your model IS in order to be able to judge anything.

That would seem to be a weakness that is easily fastened onto by critics, and you should forestall it by at least saying what steps you take to try and check that you have the model right. Presumably you could do runtime checking against your model? I would rather you just extended the paper and gave the model.

"We propose" is deadly in an article. It tells the reader you haven't done this, so are we evaluating science or dreams? Have you done it or haven't you?

We propose to -> We

We suggest to -> We (again, wrong tense for a REPORT!)

So you will be writing/have written some stuff in Event-B and compiling it to C that you can insert into real kernel system calls to get a runtime check of what you say is going on?

Hopefully all will become clear soon, because we need an example now :-). I do not know enough at this point to be able to describe to somebody else what you have really been doing, or even if you have done it yet!

P2

Section 2.

It is no good telling me about examples by reference! (Bell-LaPadula, Biba). I appreciate your space limitation, but you need a word or two on what they are, or you think they are, or what you think I should think they are.

More or less ditto HIMACF, MROSL DP. What ARE these things? Those sound like meta-models, not models, from your sentence.

What is a "hierarchical" model? Do you mean something like "deeply structured"? Is that a jargon word in your context, perhaps, that you want to trigger a reaction from the reader with? It doesn't work for me.

There's a lot of alphabet soup here. I don't know what RBAC is, or MIC, or MLS, MAC. You need a higher level summary for me! (You have promised on github to name the actors and build more of a story here).

The bit I CAN understand about HIMACF is "the model is written in ... plain text with extensive use of math". OK. That works for me!

But are you sure it is math and not something slightly more formal, like Z, or B, or something close?

Is reference [10] (one of) you? Say so if so. How long does 300 pages take to formalize correctly? (What tests are possible? Do the authors of the spec give any?) I would guess a page every day or two, so maybe one or two person-years of effort? Say how long it took.

Section 3.

The writing is much better here, with a natural flow.

P3.

A missing apostrophe (variables').

What is the Rodin platform? Is that Event-B's "tool", or separate, or a subpart of something? Please make it clear.

Telling me that the HIMACF model is like an Event-B specification doesn't help the reader who doesn't know either, but yes, it helps me understand the nature of the work you did.

Section 4.

The normal approach to not having an exact model of state is to produce an abstract machine that checks pairs of inputs and outputs for plausibility in terms of what you think the complicated state model should do. You may not be able to model the output given the input, but you can check if the output(s) makes sense when paired with the input(s). You may not know why the syscall errored on that input, but you can accept that it did. The exact behaviour (that you cannot model) should be a relational subset of the abstract relation (that you can model, and have modelled). Please give some kind of specification of your models.
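One way to render that plausibility check concretely (illustrative Python, not the authors' apparatus; the relation here is an invented example):

```python
# Abstract I/O conformance check: we cannot predict the exact output for
# each input, but we can check each observed (input, output) pair against
# an abstract relation of plausible behaviours.
def conforms(trace, plausible):
    """trace: iterable of (input, output) pairs.
    plausible: a set of allowed pairs, or a predicate on (input, output).
    Returns True iff every observed pair is plausible."""
    ok = plausible if callable(plausible) else (lambda i, o: (i, o) in plausible)
    return all(ok(i, o) for i, o in trace)
```

The exact behaviour being a relational subset of the abstract relation is then just the statement that every trace of the implementation passes `conforms` against the model's relation.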

(a very few A/THE mistakes and verb conjugation mistakes are still there, here).

What is a "graph of events"? Are events the nodes or the edges of the graph? Do you mean something other than a graph of mathematical graph theory? Please specify. How can a variable connect things? Especially things in a graph? (It turns out that you mean a labelled, directed graph of mathematical graph theory, I think).

It seems that your Next variable is a program counter state variable? So that is not a graph but an automaton? You may wish to call it GoTo, alternatively. I would just call it a program counter (PC).

What does "depending on other guards" mean? (answer, nothing, as is!)

Well, I think you have a load of labelled gotos there, plus an (abstract) program counter. What do these gotos do as a side effect? What are the guards like? It seems possible to me that they will be in a decidable fraction of arithmetic (less than comparisons, addition, program variables and metavariables). Is that so?

Anyway, I see what you mean by graph. Each "event" is an EDGE from one labelled node (what the program counter Next has as value) to another. Please remember to specify that. "graph" is not sufficient.

That is a deterministic automaton.

Why don't you just put a red counter on the graph to show the value of Next? That would complete the picture for me, and you could show the picture! A picture really would replace a lot of words here.
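For illustration, the event graph read as a deterministic automaton with a Next program counter could be sketched as follows (all names hypothetical, and the single-edge-per-node shape is a simplification):

```python
# The event graph as a deterministic automaton: nodes are values of the
# Next "program counter"; each event is a guarded edge carrying a
# transition on the rest of the state.
def run(edges, state, pc="start", final="final"):
    """edges: {pc: (guard, transition, next_pc)}. Returns True iff the
    run reaches the final node with every guard satisfied on the way."""
    while pc != final:
        if pc not in edges:
            return False                 # stuck: no event enabled here
        guard, trans, next_pc = edges[pc]
        if not guard(state):
            return False                 # guard violated
        state, pc = trans(state), next_pc
    return True
```

Putting the "red counter" on the graph corresponds exactly to tracking `pc` through the loop.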

P4.

Section 5.

Since when does the kernel verify "correctness" of syscall arguments? I can believe that one can give a type signature to some macro that is used for defining syscalls and get some kind of check for free out of that, but some syscalls have different numbers of different length arguments depending on another argument (AFAIR), so it can't be set up easily, can it. And what does "correctness" mean? You MUST specify that at least!

What is the permissions check apparatus called? I am not really aware of one ... oh, maybe it's all within the LSM framework, which I happily am completely ignorant of.

You really do need to explain what a "security module" is, preferably as an abstract type or similar.

"kernel with the security module"? WHAT security module? Is this a THE/A mistake?

You really want to list the three possible failures clearly. 1. no call to LSM. 2. no LSM call that checks what you need. 3. Badly implemented sec. mod.

Is that what you manage to discover by your method?

Your Event-B spec is more abstract than the kernel, which is correct.

Section 6.

P5. Line coverage? What is that? Do you mean lines of code exercised by some test? What test? And which security module (yes?) are you talking about? I wasn't aware till now from the text that it was the security module that was being checked! This is very late to tell me!

So you get some traces and then check them against your spec, do you? Please write all this down formally, even if it is a type signature only. Otherwise the reader simply does not know what is going on.

That is normal setup. Usually one has to hook a tested module to a driver to get some real I/O trace from an initial impetus, and then one sends the trace to a tester for checking against the spec. Those 6 things ... tested module, driver, set of impetii, trace, tester, spec ... comprise a test rig.

What is the mapping to the abstract spec state? (the state is the Next variable?).

Your description is too imperative surely? Try something like:

 testBench = compare     --- stream of True, hopefully
   where compare    = [eIO == tIO | (eIO,tIO) <- zip expectedIO testIO]
         expectedIO = repeat True
         testIO     = replay (initial_state,trace)
                         where trace = (compose sec_mod kernel) syscall

 compose m1 m2 = zip is os    --- stream of i/o pairs (trace)
            where os = m1 is
                  is = m2 os

I think you are saying

 replay (state,Final) = True

why should a final event have to "change the state"? Nothing happens after!

 replay (state,event) = all (event.guards state) -> replay (state',event.next)
                      | False

but I do not know what state' is. Apparently event has a transition function and

             state' = event.trans(state)

Yes, "check" needs explaining! My model above is that a trace is either the Final event or an event with a next field pointing to more trace.

You seem to be saying that you have a list of syscalls queued up and when one finishes OK you move on to the next, and that does not figure in the model I guessed above.

That would ordinarily require an extra component in the composition of sec_mod and kernel to dribble a new syscall to the kernel when the kernel acks the previous one. Please specify it. You can ask me for my guesses if you like!
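My guess at that missing component, sketched in Python (hypothetical names; the authors may structure it quite differently): a driver that releases the next queued syscall only when the previous one is acked.

```python
# Hypothetical "third man": dribbles the next queued syscall into the
# kernel only after the previous one has been acknowledged.
def driver(syscalls, kernel_step, state):
    """Run queued syscalls in order; stop at the first non-Ack result.
    Returns the list of (syscall, result) pairs observed."""
    observed = []
    for sc in syscalls:
        result, state = kernel_step(sc, state)
        observed.append((sc, result))
        if result != "Ack":
            break    # no further syscalls are fed in after an error
    return observed
```

Composing this driver with the sec_mod/kernel pair would complete the nose-to-tail picture: the driver supplies inputs, the composition produces the trace, and the replayer checks it.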

Yes, failures need to be investigated, but I think you should be talking about debugging not testing here. It is not clear if the description is sited still within the context of your test apparatus, or whether the context has shifted to the kernel, or something else.

As well as some formal spec here of what is going on, you need diagrams of the setups (I described them above ... "compose" is the nose-to-tail join of two I/O machines). For ME, diagrams are NOT wanted; I want "the code", i.e. the abstract mathematical specifications of what is going on, but if you believe diagrams are helpful to practicing engineers who want a HOW-TO, not insight, I will defer to you!

P6.

Section 7

Do you mean the PRESENT authors [27] (of this paper, i.e. you)? If not, you should not refer to "the authors" but name them (Zanin and Mancini), to avoid the confusion.

Please don't say "the paper" [15], but name the authors (Guttman et al) to make it clear they are not you or just say [15].

Section 8.

P7.

I suppose you have kind of presented a "method", but with such a lack of detail that I can't really say it is anything more than what I am guessing.

You do need some semi-formal "mathematical" descriptions like I gave above of your test rig and what it and the kernel does.

It would be nice to see such real bits of your spec and bits of target code (syscall, syscall wrappers) as you can manage to sneak in.

17451k commented 5 years ago

Peter, thank you for such an extensive list of suggestions and comments, they are really valuable to us. We will try to encompass as much of them into the paper as we can before the deadline (August 4).

~ Ilya

heron-carvalho commented 5 years ago

This paper reports an experience of the authors in proving that the implementation of the access control mechanism of a Linux distribution (Astra Linux Special Edition) complies with its formal specification (HIMACF), using refinement techniques associated with the formal Event-B method.

The paper does not contribute new techniques or methods for using formal methods to prove the conformity of code with a formal specification from which it was derived, nor does it seem to have that purpose. It is rather an experience report. Even so, I think the article may be properly classified in the “Formal Methods” topic of OpenCERT'2019.

Perhaps the lack of relevant scientific contributions in the article is the reason why it was presented as a “short paper” by the authors. However, I think that presenting more details about the verification process, as pointed out by the other reviewers, could give more relevance to the article as an experience report. Therefore, I suggest changing the status of the article from “short paper” to “full paper”, providing additional space for authors to include the additional information required by reviewers. Among the additional information required, we highlight more detailed information on the HIMACF Event-B specification and the System Call Interface, as well as a more detailed description of the differences between HIMACF and the additional Event-B specification obtained through refinement.

In Section 6, the last paragraph makes it unclear whether the analysis of Astra Linux access control mechanisms was completed without errors or divergences, or even completed at all. Can a summary of the results of this analysis be included in the article? Or is that information confidential, too? If it is confidential, I find it frustrating that much of the material, and even important information about the results obtained in the processes reported in the article, is confidential, particularly for a conference in which the concept of the Open Community is the main theme. If so, I would like to know the opinion of the authors.

jff commented 5 years ago

Having now read the paper and all the comments above, I would like to add the following:

  1. I agree with the other reviewers' comments on describing vs mentioning the method. From what is presented (and further explained in the comments above), I don't think that the work being presented is fully reproducible. It gives an indication of what you have done, but the reader interested in reproducing this work would have to fill in the gaps by guessing what should be done (e.g. how are the test suites that are used to generate traces created? The authors mention the tools used, but there is no information about the test generation process --- and this seems crucial to the quality and accuracy of the proposed method)

  2. Another aspect is related to the evaluation of this work: how good is this method? I couldn't find any details about how the authors evaluated their approach. At some point, the authors write:

    We have partially evaluated the proposed method on the HIMACF model, which integrates several advanced access control mechanisms, and its implementation inside Astra Linux distribution.

    What does "partially evaluated" mean? I would like to see more details about what was really achieved.

  3. The authors seem to have formally verified the models mentioned in the paper. However, the verification that the Linux distribution correctly implements the security model is essentially based on testing: if an error is found, then it most likely is a true positive. However, if no errors are found, it does not mean that the implementation of the security model is correct. From what I understand, there might exist a trace that is not generated by the test suite, but that might occur and cause a violation of the security model. If my understanding is correct, then what is the degree of confidence that you can provide?

  4. I understand that some of the work being done is confidential, but given that i) this is a research venue (where works presented should be reproducible) and ii) this is OpenCERT, which is focused on open source, open content, and open knowledge, I agree with Francisco's suggestion of allowing the extension of the paper, so that more details are shown. I would like to know what the authors think about that.

RobertoBagnara commented 5 years ago

As others have observed in detail, I believe the paper, in its present form, has little to offer. It seems to me that what we have now is more a plan on how to write a proper paper than a short paper: there really is no substance (particularly striking is the complete lack of examples) for the reader to grasp. I think even a person working on the same topic would have a hard time connecting the dots.

pbreuer commented 5 years ago

WRT 'What does "partially evaluated" mean' (JFF, up above by one): actually, "partial evaluation" is a thing https://en.wikipedia.org/wiki/Partial_evaluation (but ol' Wolfram is better: http://mathworld.wolfram.com/PartialEvaluation.html : "Partial evaluation is a branch of computer science studying program transformation via specialization. Any function can be specialized by fixing one or more of its inputs to particular values. The number of inputs to the program resulting from specialization is the initial number of inputs minus the number of inputs whose values are constants. These specialized programs are also called residual programs. Kleene's s-m-n theorem established a theoretical possibility of function specialization ... (blah, blah)", i.e. one does some calculation in advance, by cases, at compile time).
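For what it's worth, a toy illustration of partial evaluation in that technical sense (my own example, nothing to do with the paper): fix one input of a two-input program and emit the residual one-input program with the loop unrolled.

```python
def power(x, n):
    """General two-input program: x ** n by repeated multiplication."""
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):
    """Specialize power by fixing n: unroll the loop at 'compile time'
    and generate the residual one-input program as source text."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    src = "def residual(x):\n    return " + body
    namespace = {}
    exec(src, namespace)       # 'compile' the residual program
    return namespace["residual"]

cube = specialize_power(3)     # residual program: x -> x * x * x
print(cube(2))                 # 8
print(power(2, 3) == cube(2))  # True
```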

They might have to distinguish from that, if they didn't mean that. Not that any more secrets seem likely to escape from their pens!

And as the projected audience seems to consist largely of software engineers, I doubt any of THOSE will know about it in order to be confused in the first place. Still, I have met people in French research institutes working on ONLY that.

Strange but true facts about French computer science ...

Just thought I'd add another stuffy, prim, academic stick to whip the poor authors with!

Peter

17451k commented 5 years ago

We have extended the paper from 8 pages to 15 and added more technical details and illustrations. Also, we want to clarify that our work is not confidential: we are open to demonstrating and discussing all the methods we use, the difficulties encountered and the results achieved, and also to complementing all this with examples from the developed specifications. We believe that the security field can greatly benefit from more open approaches to sharing knowledge. However, we can show only some parts of the HIMACF model, since it does not belong to us.

Besides, it seems that we failed to clearly state that this is work in progress. There are still some things to do, as stated in the last paragraph of the paper. This means that at the moment we do not have a fully working replaying toolchain. Some divergences between the specification and the implementation were found during our experiments, but it is almost impossible to describe them without diving deep into the details of the specification and the security module. We thought it would be OK not to mention them, since they were detected manually during our experiments with the method, and we also want to present this as work in progress. We hope that even without them the work described in the paper can be interesting and useful, and can also spark some discussion.

We are grateful for all the comments and suggestions. You can find the revised paper in the attachment.

Best regards, ~ Ilya & Denis

OpenCERT_2019_paper_5_v2.pdf

pbreuer commented 5 years ago

You could tell us what a "replaying toolchain" is! Please avoid jargon everywhere.

Come to that, I am still not sure what "replaying" is, and all jargon words like that need careful definition/explanation/referencing, or preferably avoiding, so the poor reader does not give up in despair. Is that where you capture a trace (inputs and outputs) and do some analysis on it? Like checking it matches a (nondeterministic?) abstract machine specification?

opencert commented 4 years ago

Camera-ready version for the post-proceedings. paper_5.pdf