An experiment in disallowing expression statements (or "unused values"), must_use/MustUse etc.

Having worked with Rust for a while I grew fond of Rust telling me I'm not using some value returned from a function. After reading https://github.com/python/mypy/issues/6936 I started thinking about a possible approach to get something similar in Mypy, possibly by adding metadata to return types using PEP 593's Annotated:

_must_use_flag = object()
_T = TypeVar('_T')
MustUse = Annotated[_T, _must_use_flag]

# May perform a partial write
def write_bytes(data: bytes) -> MustUse[int]:
    # ...

But then I considered it some more and figured out that really the majority of return values from the functions I deal with should be handled one way or another, so it's possible the "must use" behavior should be the default. Well, not necessarily default-default, but gated with a Mypy flag possibly. I implemented a dirty version of this where an error is raised on every expression in statement context (not only function calls), you can find it in the following commit: https://github.com/jstasiak/mypy/commit/8e8667b8b7910a5b9284bd0cc9850acb4e181c21 (I'm skipping literal values reporting /because docstrings/, ellipsis (because it can't possibly be an error in this context) and -values).

I also added some logging code in the same branch so I can gather some stats and open discussion on this: https://github.com/jstasiak/mypy/commit/d685a010c26d7740ae0839b8ba84196e034c5c68

I ran mypy self-test with this and here are the results:

# The number of all expressions encountered
% wc -l expr_logger.txt 
  226248 expr_logger.txt

# The number of disallowed expression statements
% wc -l stmt_logger.txt 
     409 stmt_logger.txt

So ~0.18% of all expressions in Mypy are currently used in statement context. The counts of specific types of expressions are as follows:

% cat expr_logger.txt | cut -d ' ' -f 2 | sort | uniq -c | sort -r -n
92277 NameExpr
38158 MemberExpr
25766 CallExpr
18001 StrExpr
10924 IntExpr
8503 TempNode
5161 EllipsisExpr
4837 ComparisonExpr
4648 OpExpr
4048 TupleExpr
3502 IndexExpr
2723 ListExpr
2683 UnaryExpr
1896 SliceExpr
 478 ListComprehension
 404 GeneratorExpr
 367 DictExpr
 332 ConditionalExpr
 291 SuperExpr
 256 TypeAliasExpr
 240 SetExpr
 173 LambdaExpr
 169 YieldExpr
 148 TypeVarExpr
 101 CastExpr
  40 SetComprehension
  40 DictionaryComprehension
  26 FloatExpr
  22 NamedTupleExpr
  20 BytesExpr
  12 YieldFromExpr
   2 TypedDictExpr

The only expression statements in Mypy are call expressions, which isn't surprising:

% cat stmt_logger.txt | cut -d ' ' -f 2 | sort | uniq -c | sort -r -n
 409 CallExpr

I log the following data in those cases so we can dig deeper:

% head stmt_logger.txt                                               
EXPR CallExpr -> Any in /Users/user/projects/mypy/mypy/test/data.py:201:8
EXPR CallExpr (None) -> Any in /Users/user/projects/mypy/mypy/test/data.py:217:12
EXPR CallExpr (None) -> Any in /Users/user/projects/mypy/mypy/test/data.py:219:8
EXPR CallExpr (None) -> Any in /Users/user/projects/mypy/mypy/test/data.py:221:12
EXPR CallExpr (shutil.copytree) -> builtins.str in /Users/user/projects/mypy/mypy/test/data.py:232:16
EXPR CallExpr (None) -> builtins.int in /Users/user/projects/mypy/mypy/test/data.py:245:16
EXPR CallExpr (None) -> Any in /Users/user/projects/mypy/mypy/test/data.py:269:12
EXPR CallExpr (None) -> builtins.str* in /Users/user/projects/mypy/mypy/test/data.py:396:8
EXPR CallExpr (None) -> Any in /Users/user/projects/mypy/mypy/test/data.py:488:4
EXPR CallExpr (None) -> Any in /Users/user/projects/mypy/mypy/test/data.py:491:4

Numbers by the function called:

% cat stmt_logger.txt | sed 's/.*(\(.*\)).*/\1/' | sort | uniq -c | sort -r -n
 382 None
   3 shutil.copyfile
   3 mypy.stubtest.test_stubs
   2 unittest.main
   2 mypy.test.testdaemon.run_cmd
   2 mypy.dmypy.client.get_status
   2 mypy.dmypy.client.check_status
   1 subprocess.check_output
   1 shutil.copytree
   1 shutil.copy
   1 os.umask
   1 mypy.test.testparse.skip
   1 mypy.parse.parse
   1 mypy.build.load_graph
   1 mypy.build.find_module_and_diagnose
   1 mypy.build.build
   1 gc.collect
   1 func
   1 builtins.list
   1 EXPR CallExpr -> Any in /Users/user/projects/mypy/mypy/test/data.py:201:8

This is clearly wrong because I fail to extract the full names of the majority of the callees here but it's not obvious to me why.

Numbers by the type returned (ignore the string-cutting artifacts in complex types containing whitespace, I just use cut here for simplicity):

% cat stmt_logger.txt | cut -d ' ' -f 5 | sort | uniq -c | sort -r -n
 120 argparse.Action
  73 builtins.int
  41 builtins.bool
  38 Any
  24 mypy.types.Type
  16 builtins.int*
   6 builtins.list[builtins.str]
   6 Tuple[builtins.int,
   5 builtins.str
   4 sqlite3.dbapi2.Cursor
   4 mypy.types.Type*
   4 builtins.set*[builtins.str]
   4 builtins.bool*
   4 Union[Literal['C'],
   3 mypy.types.Instance*
   3 mypy.nodes.FuncItem*
   3 builtins.dict*[builtins.str,
   3 Union[mypy.types.Type,
   3 Tuple[mypy.types.Type,
   3 Tuple[builtins.str,
   2 unittest.TestProgram
   2 typing.AbstractSet[builtins.str]
   2 os.stat_result
   2 mypy.types.TypeVarDef
   2 mypy.plugin.Plugin*
   2 mypy.nodes.IndexExpr*
   2 mypy.binder.Frame
   2 builtins.bytes
   2 Union[mypy.nodes.TypeInfo,
   2 Union[mypy.nodes.SymbolTable,
   1 typed_ast.ast3.AST*
   1 mypy.types.TypeAliasType*
   1 mypy.split_namespace.SplitNamespace*
   1 mypy.report.AbstractReporter
   1 mypy.nodes.TypeInfo*
   1 mypy.nodes.MypyFile
   1 mypy.nodes.Expression*
   1 mypy.moduleinspect.ModuleProperties
   1 mypy.build.BuildResult
   1 in
   1 builtins.str*
   1 builtins.list[mypy.types.Type]
   1 builtins.list[mypy.types.TypeVarDef]
   1 builtins.list[Union[mypy.build.BuildResult,
   1 builtins.list*[builtins.str]
   1 builtins.dict[builtins.str,
   1 argparse.Namespace*
   1 Union[mypy.types.Instance,
   1 Union[builtins.str,
   1 Union[builtins.int,
   1 Tuple[mypy.types.TypeAliasType*,
   1 Tuple[mypy.types.Instance*,

I can't comment on Mypy-specific types, but as for some others:

argparse.Action is returned by all those argparse calls:

mypy/dmypy/client.py:57: error: Unused expression of type 'argparse.Action'  [misc]
    p.add_argument('--timeout', metavar='TIMEOUT', type=int,
    ^
mypy/dmypy/client.py:59: error: Unused expression of type 'argparse.Action'  [misc]
    p.add_argument('flags', metavar='FLAG', nargs='*', type=str,
    ^
mypy/dmypy/client.py:63: error: Unused expression of type 'argparse.Action'  [misc]
    p.add_argument('-v', '--verbose', action='store_true', help="Print detailed status")
    ^
mypy/dmypy/client.py:64: error: Unused expression of type 'argparse.Action'  [misc]
    p.add_argument('--fswatcher-dump-file', help="Collect information about the current file state")
    ^
mypy/dmypy/client.py:72: error: Unused expression of type 'argparse.Action'  [misc]
    p.add_argument('-v', '--verbose', action='store_true', help="Print detailed status")
    ^
mypy/dmypy/client.py:73: error: Unused expression of type 'argparse.Action'  [misc]
    p.add_argument('-q', '--quiet', action='store_true', help=argparse.SUPPRESS)  # Deprecated
    ^

and since there are a lot of arguments there's a lot of those false-positives.

stderror.write(), file.write() and other similar calls:

mypy/main.py:78: error: Unused expression of type 'builtins.int'  [misc]
                    f.write(msg + '\n')
                    ^
mypy/main.py:117: error: Unused expression of type 'builtins.int'  [misc]
                    stdout.write(formatter.format_error(n_errors, n_files, len(sources),
                    ^
mypy/main.py:120: error: Unused expression of type 'builtins.int'  [misc]
                stdout.write(formatter.format_success(len(sources),
                ^

Those are almost entirely false-positives – all cases I looked at here were text tiles and TextIO.write() never performs partial writes so its return value is irrelevant.

__init__() methods return Any (?)
dict.pop() and list.pop() return values but they're commonly used to just remove values from the collections

I'm attaching a full report of running modified Mypy on itself for your consideration. After looking at the output of this experiment I'm tempted to claim that instead of having an opt-in way to force use of values it makes more sense to have a flag to make it default and also provide an opt-out mechanism.

report.txt

Edit: by an opt-out mechanism I mean something like:

_may_not_use_flag = object()
_T = TypeVar('_T')
MayNotUse = Annotated[_T, _may_not_use_flag]

# Always performs full write, the return value is not important unless one wants to
# count the bytes written
class TextIO:
    def write(self, data: str) -> MayNotUse[int]:
        # ...

One more data perspective: there are ~21.5k CallExpr's returning something other than None in that test run:

% cat expr_logger.txt | rg CallExpr | rg -v -- "-> None" | wc -l
   21865

So those 409 "unused function call values" are ~1.87% of all function calls returning something non-None.

This is a lot of work, nice. I have a couple questions but don't take this as me trying to knock what you've done, that's not my intention at all

I've not used Rust so I'm having trouble imagining a case where you're required to use the return value of a function. I can imagine something like returning a file handler and then needing to close it after, but that's usually handled with __enter__ and __exit__

Also, does this make sense to have as a feature in mypy or would it be better to use in a general purpose linter, where you check if a var goes unused at all?

For example, in Go, unused variables are not allowed, whether that be from a return or anything else.

package main

func fn() string {
    return "foo"
}

func main() {
    result := fn()
}

// $ go run main.go
// # command-line-arguments
// ./main.go:8:2: result declared but not used

In other words, is it worth implementing a check to selectively flag unused vars over just checking everything?

Hey, sure, that's fair and I totally understand the questions, even more so considering how controversial I expect the topic to be (although maybe I overestimated how controversial, since yours is the only response so far – should I post this somewhere else? I still feel like this is more linter-territory than typing-territory). So, about the questions at hand:

I've not used Rust so I'm having trouble imagining a case where you're required to use the return value of a function.

My motivation is that the more I think about it the more cases I see where, if one doesn't use a value returned by an expression (a function call in particular) it's an indicator of a programming error and Mypy (and other linters) are all about helping to eliminate those. Let's say you divide all functions into three categories:

Functions that "do something" (mutate "global state", "instance state" etc.) but don't return anything
Functions that don't do anything to mutate "global state" but return some useful value
Functions that both mutate "global state" and return something

Now, the first class of functions can be ignored here, since they don't return anything. :)

The second class is almost certainly an error if you ignore the values they return, for example all (I think?) str methods: capitalize, isnumeric, join, replace etc.

The third class is tricky and we have two subclasses:

3a. Functions that return an "important" value, like BytesIO.write(), socket.write(), socket.recv_into(). Ignoring those is, again, almost certainly an error. 3b. Functions that return something merely for convenience, like ArgumentParser.add_argument(), list.pop(), dict.pop() – ignoring the return values is ok

So, to summarize, my claim is this: the majority of values returned from functions (either considering the number of unique functions returning something or the number of function calls) belong to classes 2 and 3a, so it makes sense for linters to complain about ignored return values as they indicate programming errors.

Now: unfortunately there's no way for linters to distinguish between classes 3a and 3b currently and ideally there would be a way to opt out of this on a per-function basis, like:

class list[T]:
    def pop(self, index) -> Ignorable[T]:
        ...

without this we have false-positives like I listed in the original post.

Also, does this make sense to have as a feature in mypy or would it be better to use in a general purpose linter, where you check if a var goes unused at all?

(...)

In other words, is it worth implementing a check to selectively flag unused vars over just checking everything?

Linters already complain about unused variables and that's fine, I think this should be kept together with something like the error I propose here. If you have an unused variable at least you can see that a function you use return something. If you just call

name = "jonathan"
name.capitalize()
print(name)

Flake8 will be very happy with this – no unused variables after all – yet the code contains a programming error. Since non-type-checking linters can't know if a function returns something this needs to be checked in a static type checker.

Also: I'd think assigning to _ would be a nice way of silencing this error (works in Mypy now as far as I remember) and indicate "I know what I'm doing, I'm ignoring this value on purpose":

name = "Kółko"
# Just verifying the name is ASCII-only, we don't care about the result
_ = name.encode('ascii')

Thanks for the reply.

The capitalization example would make sense for this since, like you said, it has no side effects and relies solely on the return value. So if you're expecting it to mutate the string that is in fact a programming error.

I understand the use-case but I keep coming back to the idea that, if a function returns something, it's assumed you'll use it. Hence, any function you don't use the return value of would potentially be a programming error. The set of functions that have a so-called "optional return value" is not very large. It's safe to assume the vast majority of returning functions have a value which is intended to be used. If you were going to flag functions with something about return value usage, it would make more sense to flag the functions whose return value can be ignored since they are the minority.

I guess this brings me back to my original question about just checking all returns. If you're not assigning or otherwise using a return value in some way, that is almost certainly an overlooked error. If you as the programmer know what you're doing you can silence the error with _ = fn(). This would be more general purpose, applying to dynamic typed code as well.

Making it a general purpose linter check would also mean it's opt-out not opt-in (assuming you enable the check). The problem with making it opt-in with some sort of flag is that:

What % of Python code is typed?
What % of typed Python code has functions with optional returns?
What % of devs writing typed Python code with optional returns will know and/or remember to use this?

It has some practical uses and looks good on paper but that doesn't mean much if people don't use it which is my concern.

I think it's an interesting proposal and a nice use of Annotated.

In Rust, afaik, flagging something as must_use is opt-in. I think there is a good reason for that. As the author of a function, indicating you must use something is a strong signal that it's important to the user of the code to use the return value. If there are false positives (even a relatively small amount), then I believe very few people will use this feature, as they may not know whether to trust it.

Simple mistakes like calling a function that is immutable without using its result name.capitalize() are usually easily caught when the code is tested, so I'm not that interested in catching these. The more insidious ones are like you mentioned BytesIO.write().

I know it seems like it might take awhile, but it might be an idea and less controversial to instead propose an opt-in MustUse annotation and push for getting the relevant functions in typeshed annotated with this.

Not sure if https://github.com/python/typing/issues is a better place to discuss this though.

@Kangaroux I feel like we almost totally agree but for some reason you think we don't – I'd like to understand the confusion. What exactly do you mean by "general purpose linter check"? Maybe we're using different words for the same thing here. :)

@syastrov I considered opt-in approach but I did some back of the envelope mental calculations and ended up an estimation of 95+% functions having that MustUse marker. I just had a look at the python-zeroconf codebase (because it's a piece of software I work with and know) to see some numbers (I reformatted the code for all function headers to fit in single lines):

# All functions
~/projects/python-zeroconf % cat zeroconf/__init__.py | rg "def " | rg -v "# def " | wc -l
     169
# Exclude "special" methods like __str__, __init__, __eq__ etc.
~/projects/python-zeroconf % cat zeroconf/__init__.py | rg "def " | rg -v "# def " | rg -v "def __" | wc -l
     124
# Exclude functions returning nothing
~/projects/python-zeroconf% cat zeroconf/__init__.py | rg "def " | rg -v "# def " | rg -v "def __" | rg -v -- "-> None" | wc -l
      54

So we have 54 relevant functions. I just went through all of them and there are only 2 which return something that can be ignored and it's not a programing error, so that's 2 out of 54 = 3.7% false positive. But by looking at this I discovered that the fluent API that's exposed by those functions is not used anyway, so those return values can be safely removed. So for python-zeroconf that would mean annotating 52-54 functions with MustUse which is a lot of noise and it seems counterproductive to me compared to annotating the false positive functions.

I expect my estimate (95+% of non-special functions returning something relevant as long as they return anything) to be reasonably accurate and, considering this, I don't believe in the opt-in way here (opt-in as in Mypy strictness flag – yes; opt-in function marker – no) so I can't really be the one to push things in this direction.

What I mean by general purpose linter would be something like PyLint or flake8. Something that could be applicable to all Python code, not just projects that use mypy

I would love to see this functionality in mypy. It's one of the bigger concerns I have with unchecked corners of the language for the reasons @jstasiak has listed in previous comments.

If I wanted to dive into the codebase to attempt an implementation, where should I start? Would a new (off by default) option be accepted? I may be way out of my depth in trying to implement this but I do think it's an important feature to have.

I don't think implementing this is the problem (you can find my experimental implementation here, I used it to generate the stats above), it's rather about getting people on board.

I don't know how/whether to proceed here and I'd like some input from the mypy devs, I'm happy to submit this as a pull request (a mypy switch disabled by default) but this will require more work so I want to gather the necessary information first.

@ilevkivskyi @JukkaL @gvanrossum do you have any thoughts on this?

Ah I'm sorry for missing that in your original post @jstasiak - thank you for the pointer.

If prior art helps at all, pyright added this in 1.1.89 with a reportUnusedCallResult rule (off by default). It works well based on some local testing.

We have class that is immutable data structure, and its add method returns new instance. People always forget about it, no matter how ugly we make the name of the method to remind them. This would save us.

python / mypy

An experiment in disallowing expression statements (or "unused values"), must_use/MustUse etc. #8951