rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
95.69k stars 12.33k forks source link

Remove redundant integer suffixes. #22501

Closed eddyb closed 6 years ago

eddyb commented 9 years ago

As #21511 has shown, there are lots of cases in which integer suffixes aren't needed at all. Usually audits are needed to make sure the intended type is inferred, and disabling the fallback to i32 wouldn't really work for tests and whatnot.

It should be easy to modify the compiler to dump the equivalent of --pretty=typed (with pretty-printing showing no suffixes on expressions) into some temporary directory, then compare the results from two make check runs, one with all the suffixes removed.

gchp commented 9 years ago

I'd like to look into this one, though I'm not sure how to modify the compiler in the way that you mention. Any suggestions on where to start?

eddyb commented 9 years ago

The entry-point into the pretty printer seems to be this call in driver::pretty::pretty_print_input. call_with_pp_support handles --pretty=typed by running analysis passes itself, which isn't exactly what we want. By combining those two, we get

let src_name = source_name(input);
let src = sess.codemap().get_filemap(&src_name)
                        .src.as_bytes().to_vec();
// For this you'd have to make TypedAnnotation's analysis field public.
let annotation = TypedAnnotation { analysis: analysis };
pprust::print_crate(sess.codemap(),
                    sess.diagnostic(),
                    ast_map.krate(),
                    src_name.to_string(),
                    &mut MemReader::new(src),
                    box old_io::File::create(&pp_out).unwrap(),
                    &annotation
                    true).unwrap();
let analysis = annotation.analysis;

which can be inserted at this point in compile_input.

pp_out_path is still missing, but I think the following method should produce an unique path for each rustc invocation during the build:

let pp_out = outputs.with_extension("rs");
let pp_out_file = pp_out.as_str().expect("non-utf8 pp output file").replace("/", "--");
// If this doesn't work, hardcode an absolute path here.
let pp_out_path = Path::new(concat!(env!(CFG_BUILD_DIR), "int-suffix-pp")).join(pp_out_file);

Aside from that, you would also need to hack the pretty-printer so it doesn't output integer suffixes - should be easy to treat them just like the unsuffixed cases.

eddyb commented 9 years ago

Since I didn't hear back from @gchp, I implemented what I mentioned in a branch of mine - I intend to open a PR soon. I did go a bit crazy and cleaned up a few things in the process. I test it like so:

rm -rf test-pretty-dump; mkdir test-pretty-dump
# compile stage1 rustc first, with no flags, to avoid errors as stage0 doesn't have them
make x86_64-unknown-linux-gnu/stage1/bin/rustc
make check-stage1 RUSTFLAGS="-Z unstable-options -Z pretty-keep-going -Z pretty-dump-dir=test-pretty-dump --xpretty=typed,unsuffixed_literals"

This gives me a bunch of files (I didn't run this build for long - an almost full one with tests had over 2600):

home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--alloc-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--arena-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--collections-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--core-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--flate-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--getopts-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--graphviz-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--libc-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--log-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--rand-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--rustc_bitflags-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--serialize-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--std-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--term-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--unicode-4e7c5e5c.rs

An example from libcollections:

// source
static TAG_CONT_U8: u8 = 128u8;
// pretty-printed
static TAG_CONT_U8: u8 = (128: u8);

The flag-based approach has two big problems, though: it messes up pretty tests and it wouldn't work with cargo. I'll attempt adding support for an env var of the form RUSTC_PRETTY_DUMP=mode:dir that is equivalent to --xpretty=mode -Z pretty-keep-going -Z pretty-dump-dir=dir, would be ignored when --pretty is already passed, and it would also work with cargo.

My invocation would thus become:

# no bootstrapping issues!
make check-stage1 RUSTC_PRETTY_DUMP="typed,unsuffixed_literals:test-pretty-dump"
eddyb commented 9 years ago

I've rebased my branch and added the env var functionality - oh and I think I have an almost-perfect way to remove integer literal suffixes everywhere:

sed -i -r -e 's/\b(0x[0-9a-fA-F_]*[0-9a-fA-F]|0o[0-7_]*[0-7]|0b[01_]*[01]|[0-9]([0-9_]*[0-9])?)_*([ui](8|16|32|64|s(ize)?)?|uint|int)/\1/g' **.rs
eddyb commented 9 years ago

This is working out! I have some stats (cc @huonw @aturon @alexcrichton):

How I did the diffs:

# make a fresh git working dir
cd ..; mkdir rust-unsuffix-data; cd rust-unsuffix-data; git init
# add a commit for the "before" dataset (sed removes comments which are only diff noise)
cp ../rust/before-int-suffix-removal/* .; sed -i -e 's|//.*$||' *.rs; git add *; git commit -m 'before'
# add a commit for the "after" dataset, on top of "before" (same sed to remove comments)
git rm *; cp ../rust/after-int-suffix-removal/* .; sed -i -e 's|//.*$||' *.rs; git add *; git commit -m 'after'
aturon commented 9 years ago

@eddyb Just want to say, this is awesome :+1:

tamird commented 9 years ago

closed by #22994 yeah?

joliv commented 9 years ago

I would think. Could someone with authority close this?

huonw commented 9 years ago

Thanks!

eddyb commented 9 years ago

Somehow I forgot to comment that this was never finished, #22994 contained merely the easy cases.

Havvy commented 8 years ago

So, what are the hard cases? Is this bug still E-Easy?

cramertj commented 8 years ago

@eddyb Just found this-- what's the current status? Still easy?

eddyb commented 8 years ago

@cramertj Well, I gave up on trying to remove integer suffixes from the language, before 1.0 I was convinced we could replace them with type ascription. And the tools I used to automatically check actual impact are beyond rebasing.

One can still clean up existing code by hand, of course.

brson commented 8 years ago

Making any more progress on this will requiring hacking up new tools in the compiler to detect them. It would be a nice generally to be able to detect unused suffixes (maybe clippy could do it cc @llogiq @Manishearth).

Manishearth commented 8 years ago

Works for me in clippy. @eddyb, could you file an issue there with heuristics on how to detect an unnecessary suffix?

eddyb commented 8 years ago

@brson It doesn't really work out within a lint. Although it could be possible to take out each suffix and pass the function through typeck to see how it infers, but that seems slow and painful.

llogiq commented 8 years ago

I see two options:

  1. Heuristics for easy cases – for example with binary ops and integers, we know that the types are (mostly) the same (well, as long as Rust keeps its stance on integer types vs. ops), we know that comparisons return bools etc. We know that within arrays, all elements share the same type. Within struct literals, fields have a known type (however substitutions can make it generic). Same goes for function arguments. So by implementing some simple heuristics, we can have some quick wins without requiring full inference. Not a 100% solution, though.
  2. I wonder if type inference stores the constraints it uses to determine the type somewhere (e.g. "this binding must be usize because it's used in a function foo(..) that takes usize and is added to bar elsewhere which is also usize"). It's at least present during error reporting. Making this information available to late lints would enable us to implement needless integer literal suffix detection (or even needless type annotation) without incurring the cost of building an artificial method and feeding it to the type checker.
eddyb commented 8 years ago

@llogiq That information is not there during error reporting. Obligations are put in a graph of sorts so you can tell what caused an obligation to be in the graph, but sources of type inference information are lost, as the set of inference variables is directly updated.

brson commented 8 years ago

Although it could be possible to take out each suffix and pass the function through typeck to see how it infers, but that seems slow and painful.

This is what I was imagining. Doesn't matter if it's slow if it's the only way to get that useful info!

llogiq commented 8 years ago

@eddyb: Then it's probably heuristics to speed up easy cases and type inference for harder ones.

steveklabnik commented 6 years ago

Triage: it seems this issue has mutated since the original issue, is it still relevant? I know that the idea, long ago, was that type ascription could replace suffixes in general, but that never came to fruition.

Mark-Simulacrum commented 6 years ago

Since it seems this issue has migrated to a feature request for a lint that warns on integer suffixes that aren't used in code, I'm going to close it as we want those to go through RFCs and it also seems like without that lint there's not going to be traction within the compiler codebase either.