Closed eddyb closed 6 years ago
I'd like to look into this one, though I'm not sure how to modify the compiler in the way that you mention. Any suggestions on where to start?
The entry-point into the pretty printer seems to be this call in driver::pretty::pretty_print_input
.
call_with_pp_support
handles --pretty=typed
by running analysis passes itself, which isn't exactly what we want.
By combining those two, we get
let src_name = source_name(input);
let src = sess.codemap().get_filemap(&src_name)
.src.as_bytes().to_vec();
// For this you'd have to make TypedAnnotation's analysis field public.
let annotation = TypedAnnotation { analysis: analysis };
pprust::print_crate(sess.codemap(),
sess.diagnostic(),
ast_map.krate(),
src_name.to_string(),
&mut MemReader::new(src),
box old_io::File::create(&pp_out).unwrap(),
&annotation
true).unwrap();
let analysis = annotation.analysis;
which can be inserted at this point in compile_input.
pp_out_path
is still missing, but I think the following method should produce an unique path for each rustc
invocation during the build:
let pp_out = outputs.with_extension("rs");
let pp_out_file = pp_out.as_str().expect("non-utf8 pp output file").replace("/", "--");
// If this doesn't work, hardcode an absolute path here.
let pp_out_path = Path::new(concat!(env!(CFG_BUILD_DIR), "int-suffix-pp")).join(pp_out_file);
Aside from that, you would also need to hack the pretty-printer so it doesn't output integer suffixes - should be easy to treat them just like the unsuffixed cases.
Since I didn't hear back from @gchp, I implemented what I mentioned in a branch of mine - I intend to open a PR soon. I did go a bit crazy and cleaned up a few things in the process. I test it like so:
rm -rf test-pretty-dump; mkdir test-pretty-dump
# compile stage1 rustc first, with no flags, to avoid errors as stage0 doesn't have them
make x86_64-unknown-linux-gnu/stage1/bin/rustc
make check-stage1 RUSTFLAGS="-Z unstable-options -Z pretty-keep-going -Z pretty-dump-dir=test-pretty-dump --xpretty=typed,unsuffixed_literals"
This gives me a bunch of files (I didn't run this build for long - an almost full one with tests had over 2600):
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--alloc-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--arena-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--collections-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--core-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--flate-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--getopts-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--graphviz-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--libc-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--log-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--rand-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--rustc_bitflags-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--serialize-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--std-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--term-4e7c5e5c.rs
home--eddy--Projects--rust--x86_64-unknown-linux-gnu--stage1--lib--rustlib--x86_64-unknown-linux-gnu--lib--unicode-4e7c5e5c.rs
An example from libcollections:
// source
static TAG_CONT_U8: u8 = 128u8;
// pretty-printed
static TAG_CONT_U8: u8 = (128: u8);
The flag-based approach has two big problems, though: it messes up pretty
tests and it wouldn't work with cargo
.
I'll attempt adding support for an env var of the form RUSTC_PRETTY_DUMP=mode:dir
that is equivalent to --xpretty=mode -Z pretty-keep-going -Z pretty-dump-dir=dir
, would be ignored when --pretty
is already passed, and it would also work with cargo.
My invocation would thus become:
# no bootstrapping issues!
make check-stage1 RUSTC_PRETTY_DUMP="typed,unsuffixed_literals:test-pretty-dump"
I've rebased my branch and added the env var functionality - oh and I think I have an almost-perfect way to remove integer literal suffixes everywhere:
sed -i -r -e 's/\b(0x[0-9a-fA-F_]*[0-9a-fA-F]|0o[0-7_]*[0-7]|0b[01_]*[01]|[0-9]([0-9_]*[0-9])?)_*([ui](8|16|32|64|s(ize)?)?|uint|int)/\1/g' **.rs
This is working out! I have some stats (cc @huonw @aturon @alexcrichton):
How I did the diffs:
# make a fresh git working dir
cd ..; mkdir rust-unsuffix-data; cd rust-unsuffix-data; git init
# add a commit for the "before" dataset (sed removes comments which are only diff noise)
cp ../rust/before-int-suffix-removal/* .; sed -i -e 's|//.*$||' *.rs; git add *; git commit -m 'before'
# add a commit for the "after" dataset, on top of "before" (same sed to remove comments)
git rm *; cp ../rust/after-int-suffix-removal/* .; sed -i -e 's|//.*$||' *.rs; git add *; git commit -m 'after'
@eddyb Just want to say, this is awesome :+1:
closed by #22994 yeah?
I would think. Could someone with authority close this?
Thanks!
Somehow I forgot to comment that this was never finished, #22994 contained merely the easy cases.
So, what are the hard cases? Is this bug still E-Easy?
@eddyb Just found this-- what's the current status? Still easy?
@cramertj Well, I gave up on trying to remove integer suffixes from the language, before 1.0 I was convinced we could replace them with type ascription. And the tools I used to automatically check actual impact are beyond rebasing.
One can still clean up existing code by hand, of course.
Making any more progress on this will requiring hacking up new tools in the compiler to detect them. It would be a nice generally to be able to detect unused suffixes (maybe clippy could do it cc @llogiq @Manishearth).
Works for me in clippy. @eddyb, could you file an issue there with heuristics on how to detect an unnecessary suffix?
@brson It doesn't really work out within a lint. Although it could be possible to take out each suffix and pass the function through typeck to see how it infers, but that seems slow and painful.
I see two options:
usize
because it's used in a function foo(..)
that takes usize and is added to bar
elsewhere which is also usize
"). It's at least present during error reporting. Making this information available to late lints would enable us to implement needless integer literal suffix detection (or even needless type annotation) without incurring the cost of building an artificial method and feeding it to the type checker.@llogiq That information is not there during error reporting. Obligations are put in a graph of sorts so you can tell what caused an obligation to be in the graph, but sources of type inference information are lost, as the set of inference variables is directly updated.
Although it could be possible to take out each suffix and pass the function through typeck to see how it infers, but that seems slow and painful.
This is what I was imagining. Doesn't matter if it's slow if it's the only way to get that useful info!
@eddyb: Then it's probably heuristics to speed up easy cases and type inference for harder ones.
Triage: it seems this issue has mutated since the original issue, is it still relevant? I know that the idea, long ago, was that type ascription could replace suffixes in general, but that never came to fruition.
Since it seems this issue has migrated to a feature request for a lint that warns on integer suffixes that aren't used in code, I'm going to close it as we want those to go through RFCs and it also seems like without that lint there's not going to be traction within the compiler codebase either.
As #21511 has shown, there are lots of cases in which integer suffixes aren't needed at all. Usually audits are needed to make sure the intended type is inferred, and disabling the fallback to
i32
wouldn't really work for tests and whatnot.It should be easy to modify the compiler to dump the equivalent of
--pretty=typed
(with pretty-printing showing no suffixes on expressions) into some temporary directory, then compare the results from twomake check
runs, one with all the suffixes removed.