Closed utkarsh2102 closed 4 years ago
Thank you again @utkarsh2012 for your help with the downstream maintenance of Debian package sup-mail
, and thank you also for this PR.
I think that there are three gemspec modifications suggested in this PR, and I suggest that we consider each individually:
Avoid git ls-files
in favour of a pure-Ruby approach
git ls-files
with the Rake::FileList
suggestion from https://stackoverflow.com/a/57341496, to grab all files except those listed in .gitignore
.require 'rake/file_list'
s.files = Rake::FileList['**/*'].exclude(*File.read('.gitignore').split)
0003-Remove-calling-git.patch
, @utkarsh2102?Exclude various files from the packaged gem
CONTRIBUTORS
, Rakefile
, HACKING
, and tests from the packaged gem.gem unpack
, would have all they need.Add rubocop-packaging
as a new development dependency
gemspec_git_spec.rb
: https://github.com/utkarsh2102/rubocop-packaging/tree/master/spec/rubocop/cop/packaging - are more coming soon?Thank you again @utkarsh2012!
Let me help give you a broad perspective on this:
Thank you again @utkarsh2012 for your help with the downstream maintenance of Debian package
sup-mail
, and thank you also for this PR.
You're welcome! :heart:
1. **Avoid `git ls-files` in favour of a pure-Ruby approach** * I can understand the logic of this, for build environments where git isn't installed. For example, Debian, which (in essence) builds from the release tar-ball.
This is of course a problem whilst maintaining this downstream. But not only that, here are some more things: (originally pointed out by David, upstream maintainer of bundler)
ruby-core
as part of their CI system runs their test suite against an unpackaged ruby tarball which doesn't have a .git
folders. That means they needed to overwrite the bundler
gemspec to not use git. Actually, not anymore since they removed git from bundler's gemspec
. See here: https://github.com/rubygems/bundler/pull/6985
If you build your application on a bare docker image without git
, and you are pointing to a git sourced gem that uses git on its gemspec, you'll get No such file or directory - git ls-files (Errno::ENOENT)
warnings all around. For example, if you use gem "sup", git: "https://github.com/sup-heliotrope/sup"
in your Gemfile.
Originally, git ls-files
inside the default gemspec template was designed so that users publishing their first gem wouldn't unintentionally publish artifacts to it. However, in this case we're not beginners anymore and sup
doesn't have a C extension or anything (I suppose!?), so the risk of anything like this happening is very low. Also note that bundler won't let you release if you have uncommitted stuff in your working repository, so the risk is even lower now.
* The minimal solution here would, I believe, be a like-for-like replacement of `git ls-files` with the `Rake::FileList` suggestion from https://stackoverflow.com/a/57341496, to grab all files except those listed in `.gitignore`.
Can do, I have been doing similar patching, so yeah, I'll amend this PR to use Rake::FileList
.
* Would this be sufficient to remove the requirement for Debian patch `0003-Remove-calling-git.patch`, @utkarsh2102?
Indeed, this is very helpful. This is a win-win! :rocket:
2. **Exclude various files from the packaged gem** * For example, remove files like `CONTRIBUTORS`, `Rakefile`, `HACKING`, and tests from the packaged gem. * I'm more on-the-fence about this. * I measured the size difference of the built .gem file, and removing these files only reduces the final gem size by 4 KB, from 255 KB to 251 KB. * I quite like that the gem is a snapshot in time, and contains all the information needed by a human - so a person with only the gem, via `gem unpack`, would have all they need.
These files are not really needed by end-users. These will be needed for development purposes or other things. The general (best) practice is to not ship anything that's not needed, really. But in case you still want to do, sure, I'll be happy to add these things back again! :sweat_smile: Generally, we should not ship tests, gemspecs, Gemfiles, Rakefiles, and other things! :smile:
3. **Add `rubocop-packaging` as a new development dependency** * This looks very hot off-the-press - congratulations on creating this! I saw the first commit from https://github.com/utkarsh2102/rubocop-packaging was only last month.
Yes, rubocop-packaging was born because we had been facing a lot of problems downstream. This not only helps Debian, but all the downstream maintainers (Fedora, etc, etc).
This project is being mentored by Antonio (the Ruby "guru" in Debian :place_of_worship:) and David (the upstream maintainer of rubygems/bundler, byebug, pry-byebug, and so many more things! :place_of_worship:)
* I saw that the only check just now is `gemspec_git_spec.rb`: https://github.com/utkarsh2102/rubocop-packaging/tree/master/spec/rubocop/cop/packaging - are more coming soon?
Of course, another cop dropping by this month (the PR is almost ready there!) and 2 more the next month! We're working out on all the problems being faced and we'd want to have them fixed in upstream itself.
I'll soon create a RubyGems guide, so it'll help enlighten and aware more upstream maintainers :smile: However, some guides on this still exist.
* I'd be more inclined to avoid this as a main development dependency, because all of our other development dependencies are very standard gems.
Ouch, well, this kinda hurt :) But fair enough since this is only a month old?
However, I'd like to point out that even if it's a month old, there's a mention of it in the known extensions: https://docs.rubocop.org/rubocop/extensions.html (where the author of RuboCop seemed to like what we're doing! :)) And even with one cop alone in one month's time, it is being used by more standard gems: https://github.com/utkarsh2102/rubocop-packaging/network/dependents
(I plan to reach out to more once we release the new cop tomorrow or day after or by max, this Friday!)
Lastly, some of the development dependencies of sup are already using Dir
or Dir.glob
to list the files instead of git
.
So they indeed aren't a problem for us anyway!!
Thank you again @utkarsh2012!
No, thanks to you both for the maintenance :heart:
Anyway, once we reach the conclusion of what's to be done here, I'll be happy to amend this PR.
One thing for sure we need here is the use of Rake::FileList
. Let me know about others.
I think it's reasonable to expect to able to evaluate the gemspec using only "pure" Ruby, without shelling out to external commands like git. It makes the build simpler in restrictive build environments like distro build systems. So I am fine with a patch which swaps out the git ls-files
invocation with an explicit list of globs, as long as the end result is that the gem still contains the complete set of source files.
As to the question of whether the gem should include all sources... in my view it absolutely should. I know some maintainers prefer to treat Rubygems as more like a "binary" distribution platform, or make some distinction between "sources needed at runtime vs. sources needed for development". But that distinction is not meaningful. For one thing, the letter and spirit of the GPL require us to distribute "complete corresponding source code" which means all the sources, including tests. And those are actually useful for downstream distributors, for example in the Fedora package which runs the tests. I think one reason for the point of difference here is that some packages treat Github as their authoritative source tarballs and Rubygems as a convenient way to install the code. But for Sup (which predates both Github and Rubygems, it was originally hosted on Rubyforge) the authoritative source tarballs are really the Rubygems. So I want to continue including all sources as before.
And I would prefer not to gain a hard dependency on the rubocop-packaging gem (and its dependency chain), at least while it's relatively new.
(edited because I bumped the button too early)
I made a bit of a mess of the previous comment, and accidentally closed the PR in the process. I didn't mean to do that. I've edited it to fill out what I was trying to say, although the PR subscribers probably got emailed a half-written version. Sorry for all the mess.
@utkarsh2102 based on my comment above, would you mind amending the PR to include just the changes to replace git ls-files
with Dir.glob
, without a new dependency on rubocop-packaging?
Thank you @utkarsh2102, and thank you @danc86!
@utkarsh2102 -
Let me help give you a broad perspective on this ...
That makes perfect sense, thank you!
sup
doesn't have a C extension or anything (I suppose!?), so the risk of anything like this happening is very low
Don't worry, there are no C extensions here. :-) However, there are some files generated during (for example) rake test
that we wouldn't want to package - all covered by .gitignore
.
Of course, another cop dropping by this month (the PR is almost ready there!) and 2 more the next month!
That's great! I wish you all the very best for rubocop-packaging
. It sounds like it solves a very real problem, and I would be very happy for you if it becomes the new standard.
Ouch, well, this kinda hurt :)
Sorry, this was not intended to hurt in any way - I wish you all the very best for the project, and I think you have some exciting momentum behind you for such a new project already. It solves a real problem, and I think it would be great if it becomes widely used. I just, with my Sup maintainer hat on, would look for it to be more established before we add it in here - Sup is 14 years old, so a 1 month project is extremely new in Sup terms! Other projects are much faster-moving than Sup.
@danc86 -
But for Sup (which predates both Github and Rubygems, it was originally hosted on Rubyforge) the authoritative source tarballs are really the Rubygems. So I want to continue including all sources as before.
Compelling reasoning - I'm convinced.
And I would prefer not to gain a hard dependency on the rubocop-packaging gem (and its dependency chain), at least while it's relatively new.
I agree on not gaining a hard dependency at this stage (fuller reasoning in earlier part of my post).
However, what about a soft dependency?
I am certainly planning to regularly run rubocop-packaging against Sup. So could we perhaps add an easy way to install the gem and execute the checks off to the side in, say, contrib
? (So not as part of our normal dev build, but saves me from some manual typing every time I want to execute the checks.)
@utkarsh2102 based on my comment above, would you mind amending the PR to include just the changes to replace
git ls-files
withDir.glob
, without a new dependency on rubocop-packaging?
I suggest perhaps Rake::FileList
over Dir.glob
, because we already have a dependency on rake, and the end code is a little neater.
If we're looking for all sources, then we'd also need to include the dot files. So we might need something along the lines of this, @utkarsh2102 (this is untried, but hopefully gives the idea):
Rake::FileList['**/{,.}*'].exclude('git').exclude(*File.read('.gitignore').split)
I am certainly planning to regularly run rubocop-packaging against Sup. So could we perhaps add an easy way to install the gem and execute the checks off to the side in, say,
contrib
? (So not as part of our normal dev build, but saves me from some manual typing every time I want to execute the checks.)
Sure, but let's take the question of how to run rubocop-packaging to a separate PR. My main issue is that a gem-level dependency will require me to get it packaged into Fedora.
I would be interested in having it hooked up to Travis though. That's even better than having to remember to run it yourself.
I suggest perhaps
Rake::FileList
overDir.glob
, because we already have a dependency on rake, and the end code is a little neater.If we're looking for all sources, then we'd also need to include the dot files. So we might need something along the lines of this, @utkarsh2102 (this is untried, but hopefully gives the idea):
Rake::FileList['**/{,.}*'].exclude('git').exclude(*File.read('.gitignore').split)
I have no strong preference between Rake::FileList
or Dir
, they seem to be equivalent aside from the lazy evaluation. Dir
has the advantage of being in the standard library so maybe we should stick with that.
There is a downside I can see with your suggestion of globbing everything and excluding files listed in .gitignore. I often leave random junk uncommitted in the root of git trees, like files called TODO fix the thing
with notes about stuff I was in the middle of, or incomplete test modules that aren't added yet, etc. They aren't in .gitignore because they aren't build products, they are just half-finished stuff, and I want them to show up when I do git status
so I remember they are there. Right now my git tree has an uncommitted mbox file for testing, a xapian index, an uncommitted test fixture email, and a file of notes.
There is a very good chance I would accidentally publish a gem containing stuff like that, if the gemspec was globbing for all files.
With @utkarsh2102 's current approach, using Dir
to explicitly name the base directories like bin
, lib
, etc there is much less chance I will accidentally build a gem containing files that aren't committed to git.
Also (a more theoretical concern) .gitignore can contain syntax beyond just plain globs that Rake::FileList
can understand, like regexp ignore patterns. I think trying to parse it ourselves in the gemspec might be unwise.
Good points, @danc86 - how about another option? We could create a git pre-commit hook that runs git ls-files
every commit, and commits the output to a file (e.g., file-list). Then read the file-list in pure Ruby. Confirmed fully compatible with the existing git ls-files approach
!
(I steered away from a git pre-commit hook, because they run locally, and it is a pain to ensure that everyone clones the repository using the same hooks.)
After further research, I've found that we used to have an authoritative list of filenames inside Manifest.txt
- until 2009 (removed in commit 969061797a).
There are so many edge cases here with file globbing includes/excludes that I suggest that we instead simply re-add Manifest.txt
, with its contents defined as the list of files managed by git (i.e., contents should always match git ls-files
). Then change this PR to read Manifest.txt
in pure Ruby.
We can ensure that Manifest.txt
remains in sync with git ls-files
via adding a new automatic Travis check.
I will raise a new (prerequisite) PR today to recreate Manifest.txt
. If @danc86 approves this other PR, then after that, please could you update this PR to use Manifest.txt
, @utkarsh2102 ? (I will tag you in a new comment here if/when ready.)
It's better to Rake task do this. It's better and simpler this way. I can put this together if you want me to.
It's better to Rake task do this. It's better and simpler this way. I can put this together if you want me to.
Thanks! We're thinking on exactly the same lines. :-) I'm halfway through adding a Rake task to generate Manifest.txt
. (And the check of Manifest.txt
against git ls-files
is another Rake task.)
I have worked on such a task before, d'you want me to paste the diff here?
Heh, great. Can fix this PR tomorrow (and also once that is merged!).
There are so many edge cases here with file globbing includes/excludes that I suggest that we instead simply re-add
Manifest.txt
, with its contents defined as the list of files managed by git (i.e., contents should always match git ls-files) ... I will raise a new (prerequisite) PR today to recreate Manifest.txt. If @danc86 approves this other PR, then after that, please could you update this PR to useManifest.txt
, @utkarsh2102 ?
PR #581
Thanks @danc86 for approving PR #581. Now merged, and we have a Manifest.txt
.
@utkarsh2102 - Over to you to modify this PR please, should be a one-liner change now to read Manifest.txt
instead of calling git ls-files
. :-)
(Thanks again, @utkarsh2102 & @danc86!)
I am certainly planning to regularly run rubocop-packaging against Sup. So could we perhaps add an easy way to install the gem and execute the checks off to the side in, say, contrib? (So not as part of our normal dev build, but saves me from some manual typing every time I want to execute the checks.)
Sure, but let's take the question of how to run rubocop-packaging to a separate PR. My main issue is that a gem-level dependency will require me to get it packaged into Fedora.
I would be interested in having it hooked up to Travis though. That's even better than having to remember to run it yourself.
Raised new PR #582 for the rubocop-packaging
checks.
This installs rubocop-packaging
off to the side, within a subdirectory of contrib
, so avoiding a hard dependency on the gem. But it is also hooked up to Travis CI, to run the rubocop-packaging
checks automatically.
Hi @IPv2 and @danc86,
Thanks for the active maintenance of
sup
again! :heart: However, while maintaining this in Debian, we found thatsup
relies ongit
to list the files which could be done via pure Ruby alternative -- which is what this PR does.As an addition, this PR makes sure that this gem only ships the required files to the end-users and not other things which are not needed by them! :rocket:
Also, added
rubocop-packaging
as adevelopment_dependency
which will ensure the best practices (and therefore help us in the Debian maintenance!) Here's what it shows us:And this PR fixes the same, which helps us in shipping this in Debian! :tada: Hope this would make sense and you'll be open to this change :100:
Signed-off-by: Utkarsh Gupta <utkarsh@debian.org>