sup-heliotrope / sup

A curses threads-with-tags style email client (mailing list: supmua@googlegroups.com)
http://sup-heliotrope.github.io
GNU General Public License v2.0
889 stars 96 forks source link

Drop `git ls-files` in gemspec #580

Closed utkarsh2102 closed 4 years ago

utkarsh2102 commented 4 years ago

Hi @IPv2 and @danc86,

Thanks for the active maintenance of sup again! :heart: However, while maintaining this in Debian, we found that sup relies on git to list the files which could be done via pure Ruby alternative -- which is what this PR does.

As an addition, this PR makes sure that this gem only ships the required files to the end-users and not other things which are not needed by them! :rocket:

Also, added rubocop-packaging as a development_dependency which will ensure the best practices (and therefore help us in the Debian maintenance!) Here's what it shows us:

➜  sup git:(master)  rubocop --only Packaging

Offenses:

sup.gemspec:35:21: C: Packaging/GemspecGit: Avoid using git to produce lists of files.
Downstreams often need to build your package in an environment that does not have git (on purpose).
Use some pure Ruby alternative, like Dir or Dir.glob.
  s.files         = `git ls-files -z`.split("\x0")
                    ^^^^^^^^^^^^^^^^^

104 files inspected, 1 offense detected

And this PR fixes the same, which helps us in shipping this in Debian! :tada: Hope this would make sense and you'll be open to this change :100:

Signed-off-by: Utkarsh Gupta <utkarsh@debian.org>

IPv2 commented 4 years ago

Thank you again @utkarsh2012 for your help with the downstream maintenance of Debian package sup-mail, and thank you also for this PR.

I think that there are three gemspec modifications suggested in this PR, and I suggest that we consider each individually:

  1. Avoid git ls-files in favour of a pure-Ruby approach

    • I can understand the logic of this, for build environments where git isn't installed. For example, Debian, which (in essence) builds from the release tar-ball.
    • The minimal solution here would, I believe, be a like-for-like replacement of git ls-files with the Rake::FileList suggestion from https://stackoverflow.com/a/57341496, to grab all files except those listed in .gitignore.
    • To implement this, we could modify our gemspec to:
      • Add require 'rake/file_list'
      • Modify s.files = Rake::FileList['**/*'].exclude(*File.read('.gitignore').split)
    • Would this be sufficient to remove the requirement for Debian patch 0003-Remove-calling-git.patch, @utkarsh2102?
    • If so, then I would be happy with this, if you are also happy @danc86?
  2. Exclude various files from the packaged gem

    • For example, remove files like CONTRIBUTORS, Rakefile, HACKING, and tests from the packaged gem.
    • I'm more on-the-fence about this.
    • I measured the size difference of the built .gem file, and removing these files only reduces the final gem size by 4 KB, from 255 KB to 251 KB.
    • I quite like that the gem is a snapshot in time, and contains all the information needed by a human - so a person with only the gem, via gem unpack, would have all they need.
    • What do you think @danc86?
  3. Add rubocop-packaging as a new development dependency

Thank you again @utkarsh2012!

utkarsh2102 commented 4 years ago

Let me help give you a broad perspective on this:

Thank you again @utkarsh2012 for your help with the downstream maintenance of Debian package sup-mail, and thank you also for this PR.

You're welcome! :heart:

1. **Avoid `git ls-files` in favour of a pure-Ruby approach**

   * I can understand the logic of this, for build environments where git isn't installed.
     For example, Debian, which (in essence) builds from the release tar-ball.

This is of course a problem whilst maintaining this downstream. But not only that, here are some more things: (originally pointed out by David, upstream maintainer of bundler)

Originally, git ls-files inside the default gemspec template was designed so that users publishing their first gem wouldn't unintentionally publish artifacts to it. However, in this case we're not beginners anymore and sup doesn't have a C extension or anything (I suppose!?), so the risk of anything like this happening is very low. Also note that bundler won't let you release if you have uncommitted stuff in your working repository, so the risk is even lower now.

   * The minimal solution here would, I believe, be a like-for-like replacement of `git ls-files`
     with the `Rake::FileList` suggestion from https://stackoverflow.com/a/57341496, to grab
     all files except those listed in `.gitignore`.

Can do, I have been doing similar patching, so yeah, I'll amend this PR to use Rake::FileList.

   * Would this be sufficient to remove the requirement for Debian patch
     `0003-Remove-calling-git.patch`, @utkarsh2102?

Indeed, this is very helpful. This is a win-win! :rocket:

2. **Exclude various files from the packaged gem**

   * For example, remove files like `CONTRIBUTORS`, `Rakefile`, `HACKING`, and tests from the
     packaged gem.
   * I'm more on-the-fence about this.
   * I measured the size difference of the built .gem file,  and removing these files only reduces
     the final gem size by 4 KB, from 255 KB to 251 KB.
   * I quite like that the gem is a snapshot in time, and contains all the information needed by
     a human - so a person with only the gem, via `gem unpack`, would have all they need.

These files are not really needed by end-users. These will be needed for development purposes or other things. The general (best) practice is to not ship anything that's not needed, really. But in case you still want to do, sure, I'll be happy to add these things back again! :sweat_smile: Generally, we should not ship tests, gemspecs, Gemfiles, Rakefiles, and other things! :smile:

3. **Add `rubocop-packaging` as a new development dependency**

   * This looks very hot off-the-press - congratulations on creating this!
      I saw the first commit from https://github.com/utkarsh2102/rubocop-packaging
      was only last month.

Yes, rubocop-packaging was born because we had been facing a lot of problems downstream. This not only helps Debian, but all the downstream maintainers (Fedora, etc, etc).

This project is being mentored by Antonio (the Ruby "guru" in Debian :place_of_worship:) and David (the upstream maintainer of rubygems/bundler, byebug, pry-byebug, and so many more things! :place_of_worship:)

   * I saw that the only check just now is `gemspec_git_spec.rb`:
     https://github.com/utkarsh2102/rubocop-packaging/tree/master/spec/rubocop/cop/packaging -
     are more coming soon?

Of course, another cop dropping by this month (the PR is almost ready there!) and 2 more the next month! We're working out on all the problems being faced and we'd want to have them fixed in upstream itself.

I'll soon create a RubyGems guide, so it'll help enlighten and aware more upstream maintainers :smile: However, some guides on this still exist.

   * I'd be more inclined to avoid this as a main development dependency, because all of our
     other development dependencies are very standard gems.

Ouch, well, this kinda hurt :) But fair enough since this is only a month old?

However, I'd like to point out that even if it's a month old, there's a mention of it in the known extensions: https://docs.rubocop.org/rubocop/extensions.html (where the author of RuboCop seemed to like what we're doing! :)) And even with one cop alone in one month's time, it is being used by more standard gems: https://github.com/utkarsh2102/rubocop-packaging/network/dependents

(I plan to reach out to more once we release the new cop tomorrow or day after or by max, this Friday!)

Lastly, some of the development dependencies of sup are already using Dir or Dir.glob to list the files instead of git. So they indeed aren't a problem for us anyway!!

Thank you again @utkarsh2012!

No, thanks to you both for the maintenance :heart:


Anyway, once we reach the conclusion of what's to be done here, I'll be happy to amend this PR. One thing for sure we need here is the use of Rake::FileList. Let me know about others.

danc86 commented 4 years ago

I think it's reasonable to expect to able to evaluate the gemspec using only "pure" Ruby, without shelling out to external commands like git. It makes the build simpler in restrictive build environments like distro build systems. So I am fine with a patch which swaps out the git ls-files invocation with an explicit list of globs, as long as the end result is that the gem still contains the complete set of source files.

As to the question of whether the gem should include all sources... in my view it absolutely should. I know some maintainers prefer to treat Rubygems as more like a "binary" distribution platform, or make some distinction between "sources needed at runtime vs. sources needed for development". But that distinction is not meaningful. For one thing, the letter and spirit of the GPL require us to distribute "complete corresponding source code" which means all the sources, including tests. And those are actually useful for downstream distributors, for example in the Fedora package which runs the tests. I think one reason for the point of difference here is that some packages treat Github as their authoritative source tarballs and Rubygems as a convenient way to install the code. But for Sup (which predates both Github and Rubygems, it was originally hosted on Rubyforge) the authoritative source tarballs are really the Rubygems. So I want to continue including all sources as before.

And I would prefer not to gain a hard dependency on the rubocop-packaging gem (and its dependency chain), at least while it's relatively new.

(edited because I bumped the button too early)

danc86 commented 4 years ago

I made a bit of a mess of the previous comment, and accidentally closed the PR in the process. I didn't mean to do that. I've edited it to fill out what I was trying to say, although the PR subscribers probably got emailed a half-written version. Sorry for all the mess.

@utkarsh2102 based on my comment above, would you mind amending the PR to include just the changes to replace git ls-files with Dir.glob, without a new dependency on rubocop-packaging?

IPv2 commented 4 years ago

Thank you @utkarsh2102, and thank you @danc86!

@utkarsh2102 -

Let me help give you a broad perspective on this ...

That makes perfect sense, thank you!

sup doesn't have a C extension or anything (I suppose!?), so the risk of anything like this happening is very low

Don't worry, there are no C extensions here. :-) However, there are some files generated during (for example) rake test that we wouldn't want to package - all covered by .gitignore.

Of course, another cop dropping by this month (the PR is almost ready there!) and 2 more the next month!

That's great! I wish you all the very best for rubocop-packaging. It sounds like it solves a very real problem, and I would be very happy for you if it becomes the new standard.

Ouch, well, this kinda hurt :)

Sorry, this was not intended to hurt in any way - I wish you all the very best for the project, and I think you have some exciting momentum behind you for such a new project already. It solves a real problem, and I think it would be great if it becomes widely used. I just, with my Sup maintainer hat on, would look for it to be more established before we add it in here - Sup is 14 years old, so a 1 month project is extremely new in Sup terms! Other projects are much faster-moving than Sup.

@danc86 -

But for Sup (which predates both Github and Rubygems, it was originally hosted on Rubyforge) the authoritative source tarballs are really the Rubygems. So I want to continue including all sources as before.

Compelling reasoning - I'm convinced.

And I would prefer not to gain a hard dependency on the rubocop-packaging gem (and its dependency chain), at least while it's relatively new.

I agree on not gaining a hard dependency at this stage (fuller reasoning in earlier part of my post).

However, what about a soft dependency?

I am certainly planning to regularly run rubocop-packaging against Sup. So could we perhaps add an easy way to install the gem and execute the checks off to the side in, say, contrib? (So not as part of our normal dev build, but saves me from some manual typing every time I want to execute the checks.)

@utkarsh2102 based on my comment above, would you mind amending the PR to include just the changes to replace git ls-files with Dir.glob, without a new dependency on rubocop-packaging?

I suggest perhaps Rake::FileList over Dir.glob, because we already have a dependency on rake, and the end code is a little neater.

If we're looking for all sources, then we'd also need to include the dot files. So we might need something along the lines of this, @utkarsh2102 (this is untried, but hopefully gives the idea): Rake::FileList['**/{,.}*'].exclude('git').exclude(*File.read('.gitignore').split)

danc86 commented 4 years ago

I am certainly planning to regularly run rubocop-packaging against Sup. So could we perhaps add an easy way to install the gem and execute the checks off to the side in, say, contrib? (So not as part of our normal dev build, but saves me from some manual typing every time I want to execute the checks.)

Sure, but let's take the question of how to run rubocop-packaging to a separate PR. My main issue is that a gem-level dependency will require me to get it packaged into Fedora.

I would be interested in having it hooked up to Travis though. That's even better than having to remember to run it yourself.

I suggest perhaps Rake::FileList over Dir.glob, because we already have a dependency on rake, and the end code is a little neater.

If we're looking for all sources, then we'd also need to include the dot files. So we might need something along the lines of this, @utkarsh2102 (this is untried, but hopefully gives the idea): Rake::FileList['**/{,.}*'].exclude('git').exclude(*File.read('.gitignore').split)

I have no strong preference between Rake::FileList or Dir, they seem to be equivalent aside from the lazy evaluation. Dir has the advantage of being in the standard library so maybe we should stick with that.

There is a downside I can see with your suggestion of globbing everything and excluding files listed in .gitignore. I often leave random junk uncommitted in the root of git trees, like files called TODO fix the thing with notes about stuff I was in the middle of, or incomplete test modules that aren't added yet, etc. They aren't in .gitignore because they aren't build products, they are just half-finished stuff, and I want them to show up when I do git status so I remember they are there. Right now my git tree has an uncommitted mbox file for testing, a xapian index, an uncommitted test fixture email, and a file of notes.

There is a very good chance I would accidentally publish a gem containing stuff like that, if the gemspec was globbing for all files.

With @utkarsh2102 's current approach, using Dir to explicitly name the base directories like bin, lib, etc there is much less chance I will accidentally build a gem containing files that aren't committed to git.

Also (a more theoretical concern) .gitignore can contain syntax beyond just plain globs that Rake::FileList can understand, like regexp ignore patterns. I think trying to parse it ourselves in the gemspec might be unwise.

IPv2 commented 4 years ago

Good points, @danc86 - how about another option? We could create a git pre-commit hook that runs git ls-files every commit, and commits the output to a file (e.g., file-list). Then read the file-list in pure Ruby. Confirmed fully compatible with the existing git ls-files approach!

IPv2 commented 4 years ago

(I steered away from a git pre-commit hook, because they run locally, and it is a pain to ensure that everyone clones the repository using the same hooks.)

After further research, I've found that we used to have an authoritative list of filenames inside Manifest.txt - until 2009 (removed in commit 969061797a).

There are so many edge cases here with file globbing includes/excludes that I suggest that we instead simply re-add Manifest.txt, with its contents defined as the list of files managed by git (i.e., contents should always match git ls-files). Then change this PR to read Manifest.txt in pure Ruby.

We can ensure that Manifest.txt remains in sync with git ls-files via adding a new automatic Travis check.

I will raise a new (prerequisite) PR today to recreate Manifest.txt. If @danc86 approves this other PR, then after that, please could you update this PR to use Manifest.txt, @utkarsh2102 ? (I will tag you in a new comment here if/when ready.)

utkarsh2102 commented 4 years ago

It's better to Rake task do this. It's better and simpler this way. I can put this together if you want me to.

IPv2 commented 4 years ago

It's better to Rake task do this. It's better and simpler this way. I can put this together if you want me to.

Thanks! We're thinking on exactly the same lines. :-) I'm halfway through adding a Rake task to generate Manifest.txt. (And the check of Manifest.txt against git ls-files is another Rake task.)

utkarsh2102 commented 4 years ago

I have worked on such a task before, d'you want me to paste the diff here?

utkarsh2102 commented 4 years ago

Heh, great. Can fix this PR tomorrow (and also once that is merged!).

IPv2 commented 4 years ago

There are so many edge cases here with file globbing includes/excludes that I suggest that we instead simply re-add Manifest.txt, with its contents defined as the list of files managed by git (i.e., contents should always match git ls-files) ... I will raise a new (prerequisite) PR today to recreate Manifest.txt. If @danc86 approves this other PR, then after that, please could you update this PR to use Manifest.txt, @utkarsh2102 ?

PR #581

IPv2 commented 4 years ago

Thanks @danc86 for approving PR #581. Now merged, and we have a Manifest.txt.

@utkarsh2102 - Over to you to modify this PR please, should be a one-liner change now to read Manifest.txt instead of calling git ls-files. :-)

IPv2 commented 3 years ago

(Thanks again, @utkarsh2102 & @danc86!)

I am certainly planning to regularly run rubocop-packaging against Sup. So could we perhaps add an easy way to install the gem and execute the checks off to the side in, say, contrib? (So not as part of our normal dev build, but saves me from some manual typing every time I want to execute the checks.)

Sure, but let's take the question of how to run rubocop-packaging to a separate PR. My main issue is that a gem-level dependency will require me to get it packaged into Fedora.

I would be interested in having it hooked up to Travis though. That's even better than having to remember to run it yourself.

Raised new PR #582 for the rubocop-packaging checks.

This installs rubocop-packaging off to the side, within a subdirectory of contrib, so avoiding a hard dependency on the gem. But it is also hooked up to Travis CI, to run the rubocop-packaging checks automatically.