unused-code / unused

A tool to identify potentially unused code.
https://unused.codes
MIT License
280 stars 11 forks source link

When using Phoenix config setting, `deps` and `_build` directories should be ignored in a monorepo setup #2

Open milmazz opened 4 years ago

milmazz commented 4 years ago

In a monorepo setup, the current behavior from unused is the one I expect, which is similar to the following:

$ cd monorepo_folder/myapp
$ unused --search rg --likelihood high
Unused: analyzing 138 terms

Calculating cache fingerprint...
Unused: analyzing 138 terms

Working [============================================>]  99%

./lib
     LayoutView                                       ./lib/myapp_web/views/layout_view.ex           occurs once
     PageView                                         ./lib/myapp_web/views/page_view.ex             occurs once
     Session                                          ./lib/myapp_web/endpoint.ex                    occurs once
     UserSocket                                       ./lib/myapp_web/channels/user_socket.ex        occurs once
     error_tag                                        ./lib/myapp_web/views/error_helpers.ex         occurs once
     get_service_by_script_slug                       ./lib/myapp/scripts.ex                         occurs once
     secret                                           ./lib/myapp_web/controllers/api_controller.ex  occurs once
     template_not_found                               ./lib/myapp_web/views/error_view.ex            occurs once

./test
     ChannelCase                                      ./test/support/channel_case.ex                         occurs once
...

But doing the same with unused-rs I get the following:

$ unused-rs -l high --format compact
🧐 Analyzing... [########################################]     766/766    (0s)
Accounts                      deps/yellow_pages/lib/yellow_pages/accounts.ex                           Only one occurrence exists
App                           deps/swarm/lib/swarm/app.ex                                              Only one occurrence exists
chan                          _build/dev/lib/phoenix_live_reload/priv/static/phoenix_live_reload.js    Only one occurrence exists
...

This is using the Phoenix configuration setting, and I expect that unused-rs ignores the deps and _build directories.

Why do I think that the problem is only present in a monorepo setup? because I just did the following:

$ monorepo_folder
$ rsync -avz myapp /tmp/
$ cd /tmp/myapp
$ git init
$ git add .
$ git commit -m "initial structure"
$ git ls-files | xargs ctags -f .git/tags
$ unused-rs -l high --format compact

and the result is the expected one. Please let me know if you need more information.

joshuaclayton commented 4 years ago

So, unused splits this apart into two discrete chunks:

In cloning Constable, which ignores _build/ and deps/, I wasn't able to reproduce this, even though the dependencies were present in the FS.

Can you take a look at your tags file to see if the tokens are present?

joshuaclayton commented 4 years ago

@milmazz in thinking through this further, I'd like to continue to lean on the settings from .gitignore here. Can you confirm that ignoring those folders both makes sense and that it results in a clearer set of results?

milmazz commented 4 years ago

I just checked the current configuration in one of my monorepos, and we have the following:

According to the Git documentation we have:

Git normally checks gitignore patterns from multiple sources, with the following order of precedence, from highest to lowest (within one level of precedence, the last matching pattern decides the outcome):

  • Patterns read from the command line for those commands that support them.
  • Patterns read from a .gitignore file in the same directory as the path, or in any parent directory, with patterns in the higher level files (up to the toplevel of the work tree) being overridden by those in lower level files down to the directory containing the file. These patterns match relative to the location of the .gitignore file. A project normally includes such .gitignore files in its repository, containing patterns for files generated as part of the project build.
  • Patterns read from $GIT_DIR/info/exclude.
  • Patterns read from the file specified by the configuration variable core.excludesFile.

So, my case is the second one, those paths (deps, _build) are correctly ignored by Git, and in consequence, it should also be ignored by unused, right?

Tomorrow I will try to "play" a bit with the patterns in my .gitignore to see if that's the issue here. I will also try to reproduce the issue from scratch.

joshuaclayton commented 4 years ago

@milmazz For some additional context, I'm shelling out to git ls-files and git ls-files --others --exclude-standard to build the list of files to search against (source: https://github.com/unused-code/unused_rs/blob/27cdce48b9ba914f83ce7c1697577ae79c6ba9c5/crates/codebase_files/src/lib.rs#L9-L16).

In poking around locally, I was able to verify that adding a sub-directory with its own .gitignore did in fact honor the limitations as expected.

I can also say that the old version of unused did not do the right thing; specifically, it only looked at the top-level .gitignore as it didn't shell out to git ls-files.

Finally, if you were to set up your machine for Rust development and build the whole project, one of the binaries it generates is tracked-files-rs, which will perform these operations and write all of the files to STDOUT. If the the two git ls-files commands above don't help, it might be worth giving that a go to see what you find.

joshuaclayton commented 4 years ago

@milmazz another thought – if you had a monorepo with a top-level .gitignore that ignores deps, and cd into one of the apps, it will NOT look a directory above and take into account that .gitignore. I wonder if that's actually what you were running into?

milmazz commented 4 years ago

@joshuaclayton getting back to this, sorry for the delay. I'll try to set up the development environment and build the latest version, and then I'll try again. I'll keep you posted.

milmazz commented 4 years ago

This is the result with the latest commit 8b4f82220037710bb73c8165efb2444bb15b617a

$ ~/Dev/rust/unused_rs/target/debug/unused-rs doctor
Unused Doctor

[OK] Check: Is the tags file not present in the list of files searched?
     The tags file loaded ("tmp/tags") is not present in the list of files searched
[OK] Check: Are tokens found in the application?
     267 token(s) found
[OK] Check: Are files found in the application?
     118 file(s) found
[Warning] Check: Is the tags file generated with Universal Ctags?
     Using tags program: Exuberant Ctags
[OK] Check: Does the loaded configuration have available project types?
     Loaded the following project configurations: Phoenix, Rails

Outcome: 4 OK, 1 warnings, 0 errors

The output from this project under a monorepo structure seems fine:

$ ~/Dev/rust/unused_rs/target/debug/unused-rs -l high --format compact
__using__                  lib/dialer_metrics_web.ex                                                      Only one occurrence exists
pop                        lib/dialer_metrics/interactors/ready_engine/agent_state.ex                     Only one occurrence exists
slugize                    lib/dialer_metrics/aggregators/agent_event.ex                                  Only one occurrence exists
template_not_found         lib/dialer_metrics_web/views/error_view.ex                                     Only one occurrence exists
update_all                 lib/dialer_metrics/airtable/campaign_controller_config.ex                      Only one occurrence exists

Except that __using__ should have been ignored based on my ~/.unused.yml and slugize does not exist in the mentioned file lib/dialer_metrics/aggregators/agent_event.ex

But running the same command for a sibling project under the monorepo structure I keep getting results for deps/*:

$ ~/Dev/rust/unused_rs/target/debug/unused-rs -l high --format compact
Accounts                      deps/yellow_pages/lib/yellow_pages/accounts.ex                              Only one occurrence exists
App                           deps/mongodb/lib/mongo/app.ex                                               Only one occurrence exists
chan                          _build/dev/lib/phoenix_live_reload/priv/static/phoenix_live_reload.js       Only one occurrence exists
error_tag                     web/views/error_helpers.ex                                                  Only one occurrence exists
where                         deps/ecto_sql/lib/ecto/adapters/postgres/connection.ex                      Only one occurrence exists

and the .gitignore for both projects is almost the same, at least both are ignoring /deps and /_build.

And here is the difference between git ls-files and tracked-files-rs:

$ git ls-files > git-ls-files.out
$ ~/Dev/rust/unused_rs/target/debug/tracked-files-rs > tracked-files-rs.out
$ diff git-ls-files.out tracked-files-rs.out 
316a317
> git-ls-files.out
752a754
> tracked-files-rs.out

Meaning that both files are almost equal, except that tracked-files-rs includes git untracked files, like the one that I'm generating.

Let me know if you need more information or if you prefer to jump in a call, that way I guess we can find the solution together.

milmazz commented 4 years ago

@milmazz another thought – if you had a monorepo with a top-level .gitignore that ignores deps, and cd into one of the apps, it will NOT look a directory above and take into account that .gitignore. I wonder if that's actually what you were running into?

I think that's not the case here. What I have is the following:

$ cat .gitignore
# Static artifacts
node_modules

$ cat agent_desktop/.gitignore
# App artifacts
/_build
/db
/deps
/*.ez

# Generated on crash by the VM
erl_crash.dump

# Since we are building assets from web/static,
# we ignore priv/static. You may want to comment
# this depending on your deployment strategy.
/priv/static/

# The config/prod.secret.exs file by default contains sensitive
# data and you should not commit it into version control.
#
# Alternatively, you may comment the line below and commit the
# secrets file as long as you replace its contents by environment
# variables.
/config/prod.secret.exs
.elixir_ls
.cache
.env
.staging-env

$ cat dialer_metrics/.gitignore
# App artifacts
/_build
/db
/deps
/*.ez

# Generated on crash by the VM
erl_crash.dump

# Files matching config/*.secret.exs pattern contain sensitive
# data and you should not commit them into version control.
#
# Alternatively, you may comment the line below and commit the
# secrets files as long as you replace their contents by environment
# variables.
/config/*.secret.exs
.env
/priv/static
/.cache/

In the previous comment, the output for dialer_metrics is ok, meaning that it excludes the deps and _build directories. But somehow that doesn't happen with agent_desktop. I already checked the result from git ls-files | grep 'deps/' and that's empty.

joshuaclayton commented 4 years ago

@milmazz tracked-files-rs is a misnomer - it represents all files in the app that aren't explicitly ignored via .gitignore. So, it combines the two git commands (referenced above) to ensure that Unused will search against both files tracked by git as well as files not checked in yet. This is meant to capture work done in new files that are not excluded, to ensure those files are also searched.

Re: ~/.unused.yml, do you mind moving that file to ~/.config/unused/unused.yml? I've changed the location of the config file so it's not in a user's home directory.

Finally, you mention

slugize does not exist in the mentioned file lib/dialer_metrics/aggregators/agent_event.ex

Does the tags file include that token? I'm wondering if that's an issue where the tags file is out-of-date, in which case, there's a bug there where we can run into cases where a tags file can have a token that's not found anywhere, and it'll still show up as a single use.