ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.87k stars 2.48k forks source link

Make global mutable variables more easily auditable? #4107

Open scottjmaddox opened 4 years ago

scottjmaddox commented 4 years ago

The possibility to embed global mutable variables inside of struct's (inside of functions), etc. without any explicit marker that they are global could make it quite difficult (more difficult than in C) to audit Zig libraries for unsound usage of global mutable variables.

One way to improve the situation would be to require global mutable variables to be prefixed by a global keyword (or static?), just as how thread local variables are prefixed by threadlocal. This would make it very easy to grep for global to identify and audit the (hopefully small number of) global mutable variables in a particular library.

Alternatively, perhaps there could be a tool to identify all global mutable variable definitions and usages? Such a tool would be helpful even if definitions were prefixed by global, since the variable name might be overloaded elsewhere.

hryx commented 4 years ago

@scottjmaddox If you could provide some specifics, what use case are you trying to support? What unsound usage are you trying to catch to catch?

quite difficult (more difficult than in C)

How so?

Rather than a new tool or language change, it might already be possible to find all top-level variables with --verbose-ast or --verbose-ir. Maybe that can help you with your audit.

fengb commented 4 years ago

In C, all globals must be at the top scope. Zig allows you to put (effectively) global variables into structs so they can exist anywhere, including inside functions:

fn foo() void {
    const Bar = struct {
        var this_is_equivalent_to_c_static: usize = 0;
    };
}
JesseRMeyer commented 4 years ago

@fengb Zig's docs showcase namespaced globals as:

const std = @import("std");
const assert = std.debug.assert;

test "namespaced global variable" {
    assert(foo() == 1235);
    assert(foo() == 1236);
}

fn foo() i32 {
    const S = struct {
        var x: i32 = 1234;
    };
    S.x += 1;
    return S.x;
}

which caused my eyes to dilate from shock the first time I saw that. I do not know how a C programmer is expected to know that x is equivalent to static int x in C. What is it about a struct's namespace rules to functions here that maintains x between calls to foo()?

andrewrk commented 4 years ago

What is it about a struct's namespace rules to functions here that maintains x between calls to foo()?

I'm having trouble parsing this question, but maybe this will explain:

The rules are quite consistent: every file is a struct. You can put structs in structs, and you can put structs in functions. The syntax var foo ... in struct top level declarations makes a global variable.

andrewrk commented 4 years ago

In C, all globals must be at the top scope.

Not true!

static int foo(void) {
    static int x = 0;
    return ++x;
}
JesseRMeyer commented 4 years ago

The syntax var foo ... in struct top level declarations makes a global variable.

I'm having trouble reaching this conclusion from your listing of the rules. Let me see if I can uproot and externalize my thinking for you.

Assume S doesn't exist in foo() and all you're left with is little ol' x that essentially resets upon every call to foo(). So foo() always returns 1235. Now wrap x into S. Sure, a function foo() can have a struct defined in it. And foo() is declared at global scope. But what does that have to do with the lifetime of objects foo() defines? I feel like you need another rule to explain this. Why does hoisting x into S grant it persistence between calls when the mother function is declared at global scope? I mean, if that's just what happens, because S is committed to the data segment, then sure. But that's different treatment when x sat alone by itself. There is no straight line from 'wrap a variable in a struct declared in a function declared at global scope' to 'persistent variable storage' in my mind. At an intuitive level those seemed orthogonal.

Not saying the current behavior is flawed and must be changed. Just sharing the experiences of my cognition learning how the language works.

momumi commented 4 years ago

For me it's slightly confusing, because most other languages I'm familiar with use an explicit keyword for variables that have static/global scope while zig overloads the meaning of var based on context.

For example if I write var x = 1; inside a function body, it's a stack allocated variable. However, if I copy var x = 1; and paste inside a struct it gets allocated somewhere else even if the struct is inside a function.

Compare this to the syntax of other languages which make variables with global lifetime explicit:

// C++
void foo() {
   static int counter; // global lifetime variable
   int x; // local variable
}

struct my_struct {
   static int counter; // global lifetime scope
   int x; // member variable
};

// Java
class JavaExample {
   static int counter;
   int x;
}

// Rust
static mut x: i32 = 0; // global variable
let mut x: i32 = 0; // local variable
scottjmaddox commented 4 years ago

Having thought a bit more about this, and other commenters have already discussed this above, I think having a global or static keyword would greatly reduce the confusion about global variables. I won't repeat what has already been pointed out above, but I will reiterate that I can very easily see someone, even someone who's read the documentation, not realizing that in

fn foo() {
    struct S {
        var x:  i32 = 0;
    }
    var y:  i32 = 0;
}

x is a global variable, while y is a local variable. This is not that far off from footguns like uint32 x = -1 being accepted without complaint by C compilers (which just caused me and a coworker many hours of debugging over the past couple days).

To answer this question from @hryx:

What unsound usage are you trying to catch to catch?

There are all kinds of unsound usage of global mutable variables, particularly when it comes to multi-threading. Being able to quickly and easily locate global variables and their usage is a good way to quickly disqualify poorly written libraries as potential dependencies, or to audit libraries for potential causes of multi-threading bugs.

Edit: And not just multi-threading, but also single-threaded concurrency such as Zig's async functions.

fengb commented 4 years ago

(Tangentially related) I’m warming up to Rust’s declaration of memory location followed by optional mutability:

Compare that to current Zig semantics:

These rules make sense to a Zig developer, but I wonder if inheriting Rust’s lifetime based declaration would be easier to learn?

momumi commented 4 years ago

Something else I find weird is that member fields of a struct are delimited with , while const/var fields are delimited with ;

const A = struct {
    var x: u32 = 0;
    y: u32 = 0,
}

If we use only ; and the static keyword:

const A = struct {
    static x: u32 = 0;
    y: u32 = 0;
}

which looks better imo, and its more similar to the syntax of other C-like languages. I think the argument for using commas is that it's easier to copy and paste into a struct literal, but you already have to rip of : type = anyway so it doesn't really make much of a difference. In fact I think it would make more sense to use ; in struct literals instead:

var a = A {
    .y = 10;
};

That way assigning a variable and member fields look the same everywhere. Although maybe that breaks the grammar some how?

Edit:

Maybe even get rid of the . before member fields:

var a = A {
    y = 10;
};
fengb commented 4 years ago

Related: https://github.com/ziglang/zig/issues/2859

ghost commented 4 years ago

Reduced example below of something I had to debug, related to this issue

I wrongly assumed that N.b would be initialized to zero each time I initialized S. Stupid in retrospect.

const std = @import("std");

const N = struct {
    b: u16 = 0,
};

const S = struct {
    var n: N = N{};

    // This is obviously the wrong way to do things, but something equivalent
    // could sneak into your code without you noticing
    fn getN(self: *@This()) *N {
        return &n;
    }
};

test "check if zero" {
    const s = S{};
    std.testing.expectEqual(@as(u16, 0), s.getN().b);
    s.getN().b = 4; // modify
}

test "check if zero again" {
    const s2 = S{}; // footgun?: behavior of `s2` not independent of prior usage of `s`
    std.testing.expectEqual(@as(u16, 0), s2.getN().b);
}
wooster0 commented 3 months ago

Not sure if this has been proposed already but how about not allowing other declarations between global variable definitions? This already applies to container fields:

a: u8,
b: u8,
var x = false;
c: u8,
x.zig:3:1: error: declarations are not allowed between container fields
var x = false;
^~~

So that would reuse an existing solution, like this:

var a: u8 = undefined;
var b: u8 = undefined;
x: bool = false,
var c: u8 = undefined;
x.zig:3:1: error: non-global variable declarations are not allowed between global variable declarations
x: bool = false,
^~~~~~~~~~~~~~~~ 

(error message wording is arguable)

It would basically force global variable definitions to have to be all in one place, which should make them easier to audit.