ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

deleting a variable mid function #594

Closed pluto439 closed 6 years ago

pluto439 commented 6 years ago

When I'm programming in javascript, I do it all the time. Very useful, I can clearly see if some variable will not be used again.

Can LLVM do that?

Ilariel commented 6 years ago

@pluto439, Simple answer: No. Complicated answer: Yes, but not really.

Also how would it be useful at all?

andrewrk commented 6 years ago

@Ilariel

Simple answer: No. Complicated answer: Yes, but not really.

Sometimes it seems like what someone is trying to accomplish doesn't make sense. And that makes it tempting to respond with curt answers like these. But they can come off as rude and make the other person feel defensive. It works out better for the community if we give each other benefit of the doubt.

When I'm confused about what someone is trying to accomplish - and I must confess I am confused about what @pluto439 is trying to accomplish in this issue - I try to find out more details about their use case.

@pluto439

Can you give a code example of what this would look like and how it would work?

Does variable scoping accomplish this? e.g.

fn foo() {
    var a: i32 = 1234;
    {
        var b: i32 = a + 10;
        // here we can use b
    }
    // here b is "deleted" AKA "out of scope"
}

What issue are you running into that makes you want to delete a variable mid function?

Ilariel commented 6 years ago

@andrewrk, I was hoping he would explain how it would be useful. I admit my wording was bad.

If we assume removal from memory, then it cannot be achieved without a redesign of the language semantics and a custom calling convention.

For lexical scoping it doesn't make sense unless the variable name is going to be reused, and as @andrewrk mentioned, variable scoping can already achieve this kind of "removal".

Perelandric commented 6 years ago

@pluto439 JavaScript doesn't let you delete variables once declared, with the exception of global variables which can be deleted in a limited number of cases.

So are you talking about assigning them something like a null value, or do you mean actually removing the name from the scope?

pluto439 commented 6 years ago

Well, here are a few examples from my javascript code. They mostly help me figure out my own code, so that I can see where exactly the last use of this variable is. I just don't like to leave trash behind, just another safety net for me.

savevideos_hook_video = function(aSubject, path) {

    aSubject.QueryInterface(Ci.nsITraceableChannel)

    var obj_TargetFile = FileUtils.getFile("Home", path)
    delete path

    //rest of the function

toString_group_to_three = function(in_number) {
    var str = in_number.toString()
    var str_len = str.length

    var num_of_digits_in_first_group = str_len % 3
    var num_of_spaces = Math.floor(str_len / 3)

    var out=str.substring(0,num_of_digits_in_first_group)
    var str_i = num_of_digits_in_first_group
    delete num_of_digits_in_first_group

    while (num_of_spaces != 0) {
        out = out + ' '
        num_of_spaces = num_of_spaces - 1

        out=out+str.substr(str_i,3)
        str_i = str_i + 3
    }
    return out
}

    var res = XVIDPAGE.exec(aSubject.originalURI.asciiSpec);
    if (!res) return

    var id = res[1]
    delete res

Example of what I want to avoid, search for }}}}} https://pastebin.com/L5L8VpsC . It's not mine btw. Probably just having one code block would've been enough here, it's probably written for compilers that don't support creating variables mid function.

Also I'd like to avoid how golang handles errors returned from their functions, that they stay for the rest of the function. Have to always change := to = around.

err := smfn.DoSomething()
if err != nil {
  return 1
}

err = smfn.DoSomething()
if err != nil {
  return 1
}

If I'll make a language, it will have an operator // as a shortcut for "delete this variable" and "this is the last usage of this variable".

5 2 divmod -- a b
a b //
5 2 divmod -- a b
[ a// b// ] print

Creating extra code blocks sort of works, but I dislike that I have to define variables in advance. Need to specify type of a variable, I'd like to move towards C++ auto and the likes of it. Python spoiled me, I guess.

Ilariel commented 6 years ago

@pluto439, deleting variables like that is against strict mode rules and possibly prevents optimizations that jit compilers can do. Meaning what you are doing is actually not only bad practice, but possibly also harmful for performance reasons. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/delete https://www.smashingmagazine.com/2012/11/writing-fast-memory-efficient-javascript/

Perelandric commented 6 years ago

delete actually has no effect on those variables. It may be useful as a visual indicator, but it doesn't impact the variable itself. If you log the result of the delete expression, you'll see that it returns false, meaning the variable was untouched.

Perelandric commented 6 years ago

I could see some usefulness in dropping a variable early though, so as to avoid accidental use of the wrong one without the rightward drift of another scope. Don't know if it warrants a new feature but it's an interesting thought.

pluto439 commented 6 years ago

@Ilariel In those links, it's more about not changing object's layout for nothing. I think you're fine if you delete local variables with delete. In c and zig, it's literally impossible to change object's layout after declaring it.

thejoshwolfe commented 6 years ago

Regardless of how JavaScript works, there's a feature proposal here that's in line with the zen of zig: communicate intent precisely. And this feature seems pretty well defined: some mechanism to explicitly remove a local variable from scope contrary to the stack-based {} scoping structure already available.

This doesn't seem like a bad idea to me. It does seem pretty complicated for very little gain, though. An example of the complexity is that you should only be allowed to remove a variable from scope if you're at the same scope level where it was declared (can't conditionally remove a variable, etc.).

Also it would be pretty silly to remove a variable from scope right as it's going out of scope anyway. I don't want anyone to think that that would be a "best practice" or anything, so we might consider making that situation a compile error.

This feature is a lot like access modifiers: the main purpose is to prevent programmers from doing something. Access modifiers are extremely useful in large code bases, because they specify between a symbol being visible only in this file or being visible to the entire project. That's a huge difference, and so access modifiers can be very valuable. However, this proposal has a much more modest gain; it only specifies a symbol being visible until the end of the block, or not.

Zig has a precedent for expecting a code reader to understand the entire function body in order to understand how code behaves (e.g. defer is considered an explicit control structure, not hidden control flow.). Following this precedent, this feature proposal is not necessary.

Let's keep this discussion about Zig, not about JavaScript. Thanks!

pluto439 commented 6 years ago

@Perelandric Jesus christ, I never even checked! Looks like delete only deletes global variables, wow! I just assumed javascript worked like python. Wow.

pluto439 commented 6 years ago

@thejoshwolfe The reason it's important for me is because I hope to use zig interactively in the future. That is, in an environment where there are no functions, and everything you type is one big function. Same reason I'd like to get rid of defer.

thejoshwolfe commented 6 years ago

The reason it's important for me is because I hope to use zig interactively in the future. That is, in an environment where there are no functions, and everything you type is one big function.

Wow! That's quite a novel use case. Let's leave this issue open, and could you please open a new issue discussing that use case? I would imagine there are many obstacles that would need to be overcome in addition to variable scopes.

PavelVozenilek commented 6 years ago

What would this code do?

var x = 1;
for (...) {
   if (...) delete x;
}

thejoshwolfe commented 6 years ago

What would this code do?

Syntax error. delete would need to be at statement level. And if you put curly braces around the delete statement, then it's a semantic error, because a variable would have to be deleted in the same block where it was declared.

PavelVozenilek commented 6 years ago

@thejoshwolfe:

I found that this currently compiles:

 var x : i32 = 1;
...
 x = undefined;

Perhaps this could be reinterpreted as "deleting a variable".

andrewrk commented 6 years ago

Yeah, that's perfect, actually. With (planned but not yet implemented) compile-time undefined value tracking (see #597) this accomplishes the goal of removing a variable from scope, by communicating the correct intent to the compiler.

In fact this is such a satisfying solution that I will close the issue now.

andrewrk commented 6 years ago

from https://github.com/zig-lang/zig/issues/597#issuecomment-342657852

It's not the same as deleting a variable then. I want it to stop existing, completely. I will only ever delete variables that are temporary anyway.

A stack variable in zig exists in the assembly instruction at the beginning of the function that increases the stack pointer by some number of bytes in order to fit all the stack variables.

So if you want a variable to stop existing completely, what you are actually wanting to do, is to replace one function definition with another, while the software is running. This is hot code swapping (see #68).

If you use a variable earlier in the function, and then want to delete it, there's nothing to delete. You still needed to increment the stack pointer at the beginning of the function by some number of bytes to fit the variable, for the time that you used it.

pluto439 commented 6 years ago

Actually, local variables exist in registers. They are placed on the stack only if there's not enough room in registers.

There's no need for code swapping. It's called "liveness analysis", and it's actually done automatically by all good compilers, including llvm. My "delete variable" is more for the programmer himself, so that he can know where exactly the scope of a variable ends, and so that he can use this name again without a care in the world. Technically, you could use ret ret2 ret3, and the compiler would've generated the same code. But it wouldn't have been as readable.

pluto439 commented 6 years ago

Technically delete wouldn't actually delete anything. It will only mark some memory as free to reuse. So that if there are two variables that are never alive at the same time, they can share memory together.

It's needed for making long functions easier to write. And that's needed for good repl.
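To make the slot-sharing idea above concrete, here is a toy sketch in Python. It is not how LLVM works internally; it only illustrates that once the last use of each variable is known, two variables that are never alive at the same time can be given the same storage slot. The function name `assign_slots` and the instruction format are made up for this illustration, and the sketch assumes straight-line code in which every name is defined exactly once, before it is used.

```python
def assign_slots(program):
    """Map each variable to a storage slot, reusing slots of dead variables.

    `program` is a list of (defined_name_or_None, [used_names]) tuples,
    one per instruction, in execution order (hypothetical format).
    """
    # Pass 1: find the last instruction that reads each variable.
    last_use = {}
    for index, (_, used) in enumerate(program):
        for name in used:
            last_use[name] = index

    # Pass 2: hand out slots, recycling those whose variable just died.
    slots, free, next_slot = {}, [], 0
    for index, (defined, used) in enumerate(program):
        for name in sorted(set(used)):
            if last_use[name] == index:
                free.append(slots[name])  # this was the variable's last use
        if defined is not None:
            if free:
                slots[defined] = free.pop()
            else:
                slots[defined] = next_slot
                next_slot += 1
    return slots


# b = do_something(a); c = do_something(b); return c
chain = [("a", []), ("b", ["a"]), ("c", ["b"]), (None, ["c"])]
print(assign_slots(chain))
```

In the chained example every variable dies exactly where the next one is created, so all three end up in one slot without any explicit delete in the source.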

pluto439 commented 6 years ago

Deleting a variable is actually surprisingly simple. You just need to:

1) make sure variable creation and deletion happen in the same block (otherwise it's too complex for the compiler to figure out if the variable still exists or not after some for loop; it would require some theorem solving and knowledge of the program as a whole, too hard)

2) throw error if variable is accessed after being deleted

3) if, after deleting a variable, another variable gets created with the same name, treat it as a new independent variable with the same name.

Liveness analysis from llvm is quite likely to do the rest of the job. The only problem that may appear is with debug information: the debugger may be confused that there are two variables under the same name. It will make long functions easier to write.
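The three rules above can be sketched as a small checker. This is only an illustration in Python, not a proposal for how the Zig compiler would do it: the event format (`"open"`, `"close"`, `"var"`, `"del"`, `"use"`), the function name `check`, and the `name#N` labels are all assumptions made up for the sketch.

```python
def check(events):
    """Apply the three rules to a flat list of events; return (errors, resolved_uses)."""
    scopes = [{}]    # one dict per open block: name -> unique label, or None once deleted
    counters = {}    # name -> how many variables have carried this name so far
    errors, resolved = [], []

    for lineno, (kind, *args) in enumerate(events, start=1):
        name = args[0] if args else None
        if kind == "open":
            scopes.append({})
        elif kind == "close":
            scopes.pop()
        elif kind == "var":
            if scopes[-1].get(name) is not None:
                errors.append(f"line {lineno}: '{name}' is already declared")
                continue
            # Rule 3: a name created again after deletion is a new variable.
            counters[name] = counters.get(name, 0) + 1
            scopes[-1][name] = f"{name}#{counters[name]}"
        elif kind == "del":
            # Rule 1: only a live variable of the *current* block may be deleted.
            if scopes[-1].get(name) is None:
                errors.append(f"line {lineno}: cannot delete '{name}' here")
            else:
                scopes[-1][name] = None
        elif kind == "use":
            for scope in reversed(scopes):
                if name in scope:
                    if scope[name] is None:
                        # Rule 2: access after deletion is an error.
                        errors.append(f"line {lineno}: '{name}' used after delete")
                    else:
                        resolved.append(scope[name])
                    break
            else:
                errors.append(f"line {lineno}: '{name}' is not declared")
    return errors, resolved


events = [
    ("var", "err"), ("use", "err"), ("del", "err"),
    ("var", "err"), ("use", "err"),
    ("open",), ("del", "err"), ("close",),   # delete from a nested block
    ("del", "err"), ("use", "err"),          # use after delete
]
errors, resolved = check(events)
print(errors)
print(resolved)
```

The checker never needs to look beyond the current block to decide whether a delete is legal, which is the point of rule 1: no whole-program reasoning is required.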

pluto439 commented 6 years ago

Variable deletion will be really handy when you need to chain variables:

def test(a):
    b = do_something(a)
    del a

    c = do_something(b)
    del b

    return c

And when you need to create a temporary variable that will never be used again, like error codes:

def test2():
    err = do_something('1.txt')
    if err: return -1
    del err

    err = do_something('2.txt')
    if err: return -1
    del err

    err = do_something('3.txt')
    if err: return -1
    del err

    return 0

pluto439 commented 6 years ago

When C first appeared, it was required to place all variables at the beginning of function. Later, it was possible to declare new variables anywhere in the function. Being able to delete variables seems like a natural progression.

pluto439 commented 6 years ago

I think I can live without it, by just putting //del variable comments instead, since for so long I didn't know that the javascript version wasn't working. It would've been a good safety net though. I suspect everyone will be too lazy to actually use del anywhere.

pluto439 commented 6 years ago

Python uses this error message for undefined variables, or recently deleted variables. Just for the future reference. It could probably add "it was deleted just above at line xx". Error messages are a very important part of any language.

UnboundLocalError: local variable 'a' referenced before assignment
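For reference, a minimal Python sketch of the behaviour described above. The function names are made up for the illustration, and the exact wording of the error message differs between Python versions, so the sketch only reports the exception type.

```python
def use_after_del():
    a = 1
    del a            # unbinds the local name 'a'
    try:
        return a     # reading the name now raises UnboundLocalError
    except UnboundLocalError as exc:
        return type(exc).__name__


def rebind_after_del():
    a = 1
    del a
    a = "fresh"      # the same name can be bound again after deletion
    return a


print(use_after_del())     # UnboundLocalError
print(rebind_after_del())  # fresh
```

Note that Python reports this only at run time, when the deleted name is actually read; the proposal in this thread is about a compile-time error.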

thejoshwolfe commented 6 years ago

Should you be allowed to use a variable in a defer that gets deleted?

{
    var file = open(path);
    defer file.close();
    del file;

    var file = open(path);
    defer file.close();
    del file;
}

what about goto rules?

{
    var file = open(path);
retry:
    do_something(file);

    if (foo()) goto end;
    del file;

    if (bar()) goto retry;
end:
    var file = open(path);
    file.close();
}

pluto439 commented 6 years ago

@thejoshwolfe

I'd remove defer from the language, because there is no way to fire defer mid function. It forces you to create more functions than is really necessary. (This needs checking.)

del and goto is quite an example, wow. How about this.

With var/del, compiler should know what variables exactly are available at every line. It should know what register stands for what variable, and what stack address stands for what variable.

I guess, by gotoing after del file, this variable still gets deleted.

Trying to goto in between var and del will mean that the variable will exist, but it's not guaranteed to contain anything meaningful. It's the responsibility of the programmer to make sure that this variable gets initialized correctly. This probably should be a compile-time error, but I want to give the programmer the freedom to shoot himself in the foot. Using uninitialized variables is an interesting optimization.

Could turn deleting a variable without calling a destructor into a compile-time warning. I want to keep it optional, because programmer may have a full list of all variables that require destructors somewhere separately. Or he may just leave all clean-up to os. Maybe there will be a way to mark a variable as [dont-worry-about-clean-up].

andrewrk commented 6 years ago

@pluto439 what you are proposing is complicated and I'm not convinced that it is sound. You might be describing something that is a contradiction. What you are proposing is so complicated that in order to explain how it should work, I think you should go implement it, and then report back your findings.

pluto439 commented 6 years ago

So, there's no way to delete a variable in llvm? I get so confused when I read their docs.

pluto439 commented 6 years ago

Once the debugger will be complete, I'd like to display all "undefined" variables as "?". It will be more intuitive, I think. It's for variables that may contain garbage, and it doesn't really matter what really is in them right now.

pluto439 commented 6 years ago

Using code blocks will work almost the same. Except you don't have to delete absolutely everything on block end, you may delete only some variables and keep others.

kyle-github commented 6 years ago

I have been very busy lately, so I am just now trying to read through this proposal.

I am not understanding what problem it really solves. If you want to reuse variable names because you have a long function, then should not that function be refactored? If it is to help the compiler, the liveness calculations in LLVM are pretty good and it will reuse registers when it determines that the variable is no longer used.

If you need temporaries, then the "automatic return value as the last value in a block" aspect of Zig should do what you want, no? I see that used all over in Zig as a very nice idiom. Zig does not need C's ternary operator because of this.

@pluto439, I think the debugger is probably one of the biggest problems. You would need to keep some sort of state as to which variables were defined at every line of code as far as I can tell. I can set a breakpoint anywhere. So, I cannot (without massively invasive code generation) determine which variables are visible as the program executes. That must be stored in the debugging information.

The ability to use code blocks seems like it gives you everything you want, but it is clean and straightforward. It also does not require the removal of defer.

Sorry that I am not understanding why this change is worth the apparent amount of effort :-(

pluto439 commented 6 years ago

I'll just do it by using a preprocessor. A preprocessor that creates res res2 res3 variables out of var res del res var res del res var res del res. And reports an error if res was used after del res.
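A minimal sketch of such a preprocessor, written in Python. Everything here is an assumption made for illustration: the input is a made-up line-oriented mini-language with `var NAME ...` and `del NAME` statements, and `rewrite` is a hypothetical helper, not an existing tool. The sketch does not understand string literals or comments, and it does not check whether a generated name such as `res2` collides with a name the programmer already uses.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def rewrite(lines):
    """Rename re-declared variables to name, name2, name3, ... and drop 'del' lines."""
    counts = {}    # base name -> number of declarations seen so far
    current = {}   # base name -> current unique name, or None after 'del'
    out = []

    def resolve(match):
        name = match.group(0)
        if name not in current:
            return name                       # not a tracked variable
        if current[name] is None:
            raise NameError(f"'{name}' used after del")
        return current[name]

    for line in lines:
        kind, _, rest = line.partition(" ")
        if kind == "del":
            name = rest.strip()
            if current.get(name) is None:
                raise NameError(f"'{name}' deleted but not live")
            current[name] = None              # 'del' lines produce no output
        elif kind == "var":
            name, _, tail = rest.partition(" ")
            tail = IDENT.sub(resolve, tail)   # right-hand side sees the old bindings
            if current.get(name) is not None:
                raise NameError(f"'{name}' redeclared while still live")
            counts[name] = counts.get(name, 0) + 1
            current[name] = name if counts[name] == 1 else f"{name}{counts[name]}"
            out.append(f"var {current[name]} {tail}".rstrip())
        else:
            out.append(IDENT.sub(resolve, line))
    return out


source = [
    "var res = step1()", "check(res)", "del res",
    "var res = step2()", "check(res)", "del res",
    "var res = step3()", "return res",
]
print("\n".join(rewrite(source)))
```

Because the renaming happens before compilation, the compiler sees ordinary distinct variables, and the "used after del" check comes for free from the lookup.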

I am not understanding what problem it really solves.

It's not that necessary, I'm just trying to make programming in C more clean. People go to great lengths just to make programming more secure and comfortable, why not start on the language level.

If you want to reuse variable names because you have a long function, then should not that function be refactored?

There's no reason to use a ton of small functions instead of one big function. Big functions are usually easier to understand and write. Code Complete agreed with me on that.

If it is to help the compiler

It is to help the programmer. So that he doesn't have to read the whole function to see what happens with some variable.

If you need temporaries, then the "automatic return value as the last value in a block" aspect of Zig should do what you want, no?

Example?

The ability to use code blocks seems like it gives you everything you want, but it is clean and straightforward.

I don't want to create code blocks for every small thing. I want to keep as much code on the same level, flat. Code blocks ain't that straightforward, del is more straightforward. Click here https://github.com/zig-lang/zig/issues/594#issuecomment-343087325 , it's as straightforward as you can get.

I think the debugger is probably one of the biggest problems. You would need to keep some sort of state as to which variables were defined at every line of code as far as I can tell. I can set a breakpoint anywhere. So, I cannot (without massively invasive code generation) determine which variables are visible as the program executes. That must be stored in the debugging information.

If you just use preprocessor, it's not a problem at all, except working with debuggers is now more annoying.

If you do it on the code generator level, it shouldn't be that much more complex than what compiler and debugger are already doing. They already have support for creating a variable mid function and code blocks, deleting a variable is just one step forward. If you intend to reuse third party debuggers instead of writing your own, then yes, it's problematic.

It also does not require the removal of defer.

Defer is pretty terrible anyway, it makes control flow jump around. It's an inferior replacement for finally, whose only advantage is that it doesn't require you to create an extra indentation. It doesn't destruct on code block end, only on function block end, and you will get some confusing results if you try to use defer inside a for loop.

And I'm not sure on this one, but I think inside catch and finally you can access local variables. Huh, with defer -- finally you also can access variables. Same with setjmp -- catch. You probably can't goto beyond the borders of each.

Even their names are bad. longjmp instead of throw, setjmp instead of catch, defer instead of finally, function start instead of try. The last one is interesting, but it makes your program less flexible, and makes you create more functions than necessary.

I don't understand why zig even has defer if there are no plans for exceptions anyway.

pluto439 commented 6 years ago

I used the term "liveness analysis", apparently it's also called "stack slot coloring". The thing that decides how local variables are stored.

I was pretty much told to just do it myself. I'm just gathering notes here at this point. I'll probably do it.

Ilariel commented 6 years ago

@pluto439, don't take this badly, but this issue is already closed, and we seem to have found a better solution for similar behaviour. What you are suggesting doesn't seem to be worth the time to implement unless you do it yourself, and the benefits seem small, if not non-existent. I recommend that you gather notes, as you said you are doing, and do extensive research on this idea of yours, on C, and on Zig, because it also seems that you haven't really understood how things work: you are comparing apples to oranges. Also check whether the implementation and usage would fit the Zig Zen and the syntax once you have done it. After that, open another issue (or post to this issue).

Defer is pretty terrible anyway, it makes control flow jump around. It's an inferior replacement for finally, whose only advantage is that it doesn't require you to create an extra indentation. It doesn't destruct on code block end, only on function block end, and you will get some confusing results if you try to use defer inside a for loop.

And I'm not sure on this one, but I think inside catch and finally you can access local variables. Huh, with defer -- finally you also can access variables. Same with setjmp -- catch. You probably can't goto beyond the borders of each.

First of all, no it is not a replacement and it is quite different when compared to the finally construct in most of the programming languages. Sure both allow you to do something at the end of an operation, but they work differently and shouldn't be compared.

Defer in Zig is closely related to C++ scope guards and Go defer https://blog.golang.org/defer-panic-and-recover while finally tends to be part of the try-catch-finally chain. As Zig doesn't have exceptions built into the language and uses low-level error codes instead, there is no reason for try-catch-finally. If you need a more detailed "exception", you can still return an enum which contains a result or a detailed error struct, if you deem it necessary because the error case is recoverable and common.

Try-catch-finally usually forces you to wrap all operations that can throw an error in a try block, and then to create a catch block that catches the exceptions, possibly one for each different exception, depending on the language and on whether the function should deal with them separately. Then, as the last thing after try and possibly catch, finally is run. This can hide information from the programmer, because you can't see which function or operation inside the try block can throw without looking inside it. With error codes you have to deal with each one where it is returned, and if you use an enum you would probably use a switch, so in both cases you can immediately see the possible error sources.

Also, if you look at the defer documentation at http://ziglang.org/documentation/master/#defer you can see this example:

// defer will execute an expression at the end of the current scope.
fn deferExample() -> usize {
    var a: usize = 1;

    {
        defer a = 2;
        a = 1;
    }
    assert(a == 2);

    a = 5;
    a
} 

Defer executes at the end of the current scope! This is clearly different behaviour from finally. The most common use case for defer is most likely memory and resource management, and you wouldn't want to wrap your allocations in try-catch-finally blocks, because they just don't do the same thing. With defer you allocate memory, defer the deallocation right at the allocation site, and then use the memory, instead of having to write the deallocation code explicitly at the end of the function (possibly in a finally block). So defer allows us to write cleaner code, and it is clearly not just an inferior replacement.
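As a rough analogy only (this is Python, not Zig, and the semantics are not identical), `contextlib.ExitStack` can mimic the "register cleanup at the allocation site, run it when the enclosing block ends" pattern described above. The function name `defer_example` and the `events` list are made up for the sketch; the `with` block stands in for the inner `{ }` block of the Zig example.

```python
from contextlib import ExitStack


def defer_example():
    state = {"a": 1}
    events = []
    with ExitStack() as scope:                      # plays the role of the inner block
        scope.callback(lambda: state.update(a=2))   # roughly `defer a = 2;`
        scope.callback(lambda: events.append("registered second, runs first"))
        state["a"] = 1
        events.append("block body")
    # Both callbacks have already run here, in reverse order of registration.
    events.append("after block")
    return state["a"], events


print(defer_example())
```

The cleanup runs when the block ends, not when the function returns, which is the distinction being drawn between scope-level defer and a function-level mechanism.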

Even their names are bad. longjmp instead of throw, setjmp instead of catch, defer instead of finally, function start instead of try. The last one is interesting, but it makes your program less flexible, and makes you create more functions than necessary.

You can't compare apples to oranges like this. Catch and throw have barely anything else to do with setjmp and longjmp other than on lower level they can be used to implement the former.

Actually, I'm a bit confused about why you are even comparing those, since I don't think they are in the std: when you search the Zig repo, the only places where setjmp and longjmp appear are c_headers and deps.

Anyway, setjmp and longjmp are C standard library functions, so if they were implemented in Zig they should probably have similar functionality. See http://en.cppreference.com/w/c/program/setjmp for setjmp and http://en.cppreference.com/w/c/program/longjmp for longjmp in the C standard library. As you can see, they are quite different beasts from throw and catch.

I don't understand why zig even has defer if there are no plans for exceptions anyway.

I hope my explanation of defer helped you understand.

I also feel that I need to address this too

There's no reason to use a ton of small functions instead of one big function. Big functions are usually easier to understand and write. Code Complete agreed with me on that.

This is my opinion so someone else might say something different, but you can rather easily find arguments for both large and small functions, but usually it is more about modular reusable code that can be maintained easily.

Smaller functions that are shared between multiple longer functions aren't inherently bad if they improve readability and don't abstract away necessary information.

Large functions aren't bad either unless they for example contain duplicated code that could be shared and inlined or do so many different things that probably should be split into different functions just to ease maintainability.

After all self-documenting functions are best code and when you have a really large function that is called something like download_url_scrape_web_page_and_find_css_file_location_and_x_amount_of_rules you have probably done something wrong but of course it might just be that the bad thing was just giving a bad name to a function.

pluto439 commented 6 years ago

This is getting in the territory of exceptions, which I created a topic about here #578.

Defer executes at the end of the current scope!

Thanks for checking it! I was thinking zig worked like golang. Here is an example, they recommend to create an extra function https://github.com/golang/go/issues/3978 . To me this looks ugly, using advanced features to plug holes in the language.

defer-panic-and-recover

Do you know if recover is planned? No mention of it in the docs.

a really large function that is called something like download_url_scrape_web_page_and_find_css_file_location_and_x_amount_of_rules

Functions like "open_window" or "init_physics" can get pretty large. And something that is used at the beginning may never be used again. Need to make lifetime of variables as short as possible, it will make code easier to read.

Having too many functions actually makes code harder to read, control flow is hard to figure out, it jumps all over the place, need to have tools for code navigation to find anything.

Catch and throw have barely anything else to do with setjmp and longjmp

Ain't they both used for exception handling? I think they are just different names for one thing. They work just slightly differently internally, with longjmp not being able to chain/nest exceptions properly. Read #578 for more info, around this part: "Linux uses different approach to exception handling than windows, because windows method was patented at the time. They still use setjmp/longjmp since then."

I don't like how functions are merged with exception handling in golang, functions should be just functions -- a piece of code that gets called and returns. I want my program to execute linearly, and only create functions when it's really necessary. If a piece of code gets executed only once, and a bunch of functions only execute together in the same order, they shouldn't be functions.

It really depends if there are better alternatives to finally or defer. Look at this https://github.com/zig-lang/zig/issues/578#issuecomment-343738259 , dumped a bit of what can be.

Creating extra function arguments when I need to pass something is quite annoying.

pluto439 commented 6 years ago

#473 is surprisingly related, it also complains how you can't fire defer mid function. But instead of just removing defer, he proposed including undefer.

This undefer would go really well with the del. Because what defer does? Cleans up trash from stack. When does it need to do that? Before del.

Actually, it already works like that, since on code block end, all defers get fired. Why am I complaining about golang bugs in zig?

I'm still concerned about how this zig defer will work with exceptions.

pluto439 commented 6 years ago

I was playing with ffmpeg filters recently, it's somewhat close to what I hope for https://ffmpeg.org/ffmpeg-filters.html

The difference is, the "variables" are autodeleted on use. You have to make copies of "variables" manually, with split and asplit, and you can delete them with nullsink and anullsink. Very confusing at times, and terribly uncomfortable. Because you absolutely have to make sure that input variables match output variables exactly, everything else is an error.

I'm dreaming of a language where I can type code like var1 var2 do-something -- output1 output2. And deleting them with a simple this-is-deleted // and last-use// do-something -- dont-need-this//. I described it here https://board.flatassembler.net/topic.php?p=199776#199776 . Wish I had money to implement it, but nobody shares them. Either I suck at explaining this, or I'm talking with the wrong people. At least steal my ideas, do my work for me, yes.