Closed rarkins closed 4 years ago
I'm still not 100% sure that we can't do the parsing in JS. Are there any more necessary combinations than these?
# These gems are in the :default group
gem 'nokogiri'
gem 'sinatra'
gem 'wirble', group: :development
group :test do
gem 'faker'
gem 'rspec'
end
group :test, :development do
gem 'capybara'
gem 'rspec-rails'
end
gem 'cucumber', group: [:cucumber, :test]
Source: https://bundler.io/guides/groups.html#how-to-manage-groups-of-gems
We also need to extract project metadata like:
ruby '1.9.3', :engine => 'jruby', :engine_version => '1.6.7'
And git dependencies like:
gem 'nokogiri', :git => 'https://github.com/tenderlove/nokogiri.git', :branch => '1.4'
And custom sources:
gem 'my_gem', '1.0', :source => 'https://gems.example.com'
source 'https://gems.example.com' do
gem 'my_gem', '1.0'
gem 'another_gem', '1.2.1'
end
I like idea with one interface for managers cause it means that maintain them will be easier.
Hm, I'm pretty sure that there's no good reasons to do, but I saw a lot of Gemfile's like above)
We definitely can, but IMHO this will be much more harder way and much less stable than just use bundler. I'm worrying about this cause it's pure ruby code and in this language really easy achieve same results in different ways.
The key thing is that after parsing the file, we need a way to be able to uniquely identify the location of each package definition.
The easiest would be if we can include the line number somehow.
Otherwise we need to be able to trust that we can find each one by following it’s metadata, eg groups -> name. If groups can be duplicated (define test group twice for instance) then it makes this option much harder.
I think I haven't explained the problem well enough, so I'll try again.
At a high level, here's how it all works:
So the extract function says "here's a list of all the dependencies and any important metadata about them, such as a specific source".
And the Lookup/decide functions say "there is one patch update available", e.g. from v1.0.0 to v1.0.1.
So then the challenge is, how can the Bundler update function know exactly where in the file to apply the change, if groups can be defined twice, dependencies can be defined twice, etc?
If we stored the exact line number, then it's super easy. But it's maybe not possible to get that from Bundler.
Instead, this might be a way to do it, similar to what I've done for JSON and TOML:
parse the current file contents
update the parsed content with the new patch, save as "expectedContent" # now we know what the final result should look like
let success = false;
while (!success) do:
- find next instance of 'depName'
- replace next string found with new version
- if parsed(new content) === expectedContent, then success = true
Another attempt to describe it: it's not enough to simply extract a list of dependencies from Gemfile - we must be able to uniquely identify their location in the Gemfile too, so that we know where to apply updates to.
By the way, there has been a recent attempt to parse Gemfiles here: https://github.com/dev-cprice/gemfile-parser/issues/2#issuecomment-448829359
A question I'm trying to answer is: are there any valid cases where the gem name and version would not be on the same line?
A critical point is that we don't need to understand all the file - just enough to extract groups, sources and gems.
It seems like an initial version - that would be ignorant of groups and sources specific to certain gems - would just be to parse line by line for the "gem" pattern. That way we can identify the line number for each and the implementation is simple. Then we work on a far more advanced approach that more properly parses and understands group names and source groupings.
I think the Ruby on Rails Gemfile is known as one of the most complex: https://github.com/rails/rails/blob/master/Gemfile
However it has only one source
definition so it's not the perfect complex example.
Do platforms
influence any of the results we should return? e.g.
platforms :jruby do
Also this is an interesting pattern:
gem "listen", ">= 3.0.5", "< 3.2", require: false
We need to think how we parse and store that type of complex value. It's a pity it's not written as ">= 3.0.5, < 3.2"
. @sleekybadger does our Ruby versioning support any "and" concepts or would we need to pass both these values through it separately?
If groups can be duplicated (define test group twice for instance) then it makes this option much harder.
They can be
Do platforms influence any of the results we should return? e.g.
Yes. Some gems are not compatible with JRuby or only compatible with JRuby.
does our Ruby versioning support any "and" concepts or would we need to pass both these values through it separately?
">= 3.0.5", "< 3.2"
is an "and". Both version constraints must be satisfied. Idiomatically, this is usually written when the developers have already identified that 3.2 breaks compatibility and should not be upgraded to casually.
Like-wise, sometimes a specific version broke compatibility but future versions are OK. Or, in a case I had recently, I wanted to exclude newer versions I knew to have a security vulnerability:
# `rack` just listed here to exclude vulnerable versions; not actually needed
# as to-plevel dependency#
gem 'rack', '~> 2.0', '!= 2.0.4', '!= 2.0.5'
I'm still not 100% sure that we can't do the parsing in JS.
It will be increasingly difficult to do this well. Ruby is notoriously flexible in its syntax and a Gemfile
is 100% Ruby, not a subset or a data format of any kind.
The clauses in a Gemfile
are actually method calls:
ruby '1.9.3', :engine => 'jruby', :engine_version => '1.6.7'
# same as:
ruby("1.9.3", {
engine: "jruby",
engine_version: "1.6.7"
})
# same as:
ruby(
"1.9.3",
:engine => "jruby",
:engine_version => "1.6.7"
)
# same as:
ruby "1.9.3",
:engine => "jruby",
:engine_version => "1.6.7"
# same as:
ruby \
"1.9.3",
:engine => "jruby",
engine_version: '1.6.7' # kwarg style can be mix-and-matched, as can quoting
Bundler exposes its functionality as a library, not just a command-line tool, so I think writing a very small shim that wraps Bundler's Gemfile parsing and outputs a JSON data representation which the Javascript can consume may be worth exploring.
Alternatively, you could write a very small Ruby library which could understand (most) Gemfile
s simply by defining the various top-level methods (def gem(...)
, def ruby(...)
, etc). That would let you capture the arguments passed as data and store it elsewhere.
Trying to parse this file in Javascript without a mature Ruby language parser might end up being a wild goose chase. Ruby is notoriously flexible
$ pry -r bundler
[1] pry(main)> Bundler::Definition.build('./Gemfile', './Gemfile.lock', false)
=> #<Bundler::Definition:0x00007f8f4fa6f410
@dependencies=
[Gem::Dependency.new("bootsnap", Gem::Requirement.new(["~> 1.3"]), :runtime),
Gem::Dependency.new("dotenv-rails", Gem::Requirement.new(["~> 2.6"]), :runtime),
Gem::Dependency.new("rails", Gem::Requirement.new(["~> 5.2.2"]), :runtime),
Gem::Dependency.new("pg", Gem::Requirement.new(["~> 1.1"]), :runtime),
Gem::Dependency.new("puma", Gem::Requirement.new(["~> 3.12"]), :runtime),
Gem::Dependency.new("rack-cors", Gem::Requirement.new(["~> 1.0"]), :runtime),
...
However, if you did the approach where you re-define the Gemfile DSL, you could capture line locations:
# pseudo-gemfile.rb
def git_source(*)
# ignore, for now
end
def source(*)
# ignore, for now
end
def ruby(*)
# ignore, for now
end
def gem(name, *versions, **_options)
caller = caller_locations(1..1).first
caller = [caller.absolute_path, caller.lineno]
puts({ name: name, versions: versions, defined_at: caller }.to_json)
end
def group(*)
yield if block_given?
end
$ ruby -r json -r ./pseudo-gemfile.rb Gemfile
{"name":"bootsnap","versions":["~> 1.3"],"defined_at":["/Users/bjeanes/Code/covidence/Gemfile",15]}
{"name":"dotenv-rails","versions":["~> 2.6"],"defined_at":["/Users/bjeanes/Code/covidence/Gemfile",16]}
{"name":"rails","versions":["~> 5.2.2"],"defined_at":["/Users/bjeanes/Code/covidence/Gemfile",19]}
{"name":"pg","versions":["~> 1.1"],"defined_at":["/Users/bjeanes/Code/covidence/Gemfile",22]}
...
Obviously, this has to be in a sandbox because you are executing the Gemfile here, but it does open some doors.
EDIT: This also doesn't factor in Gemfile.lock to tell you actual versions, just requested ranges.
@rarkins i'm agree with @bjeanes cause when I dig into the code and start testing this with various real Gemfiles a lot of edge cases came up and I think will be easier just treat Gemfile as ruby code than to support them all.
@bjeanes I think we can use bundler API to simplify things with something like this:
# parser.rb
require 'json'
require 'bundler'
class RenovateDsl < Bundler::Dsl
def gem(*args)
*rest, dependency = super
locations = caller_locations(1..1)
defined_at = locations.first.lineno
definitions[dependency.name] = defined_at
end
def definitions
@definitions ||= {}
end
end
dsl = RenovateDsl.new
dsl.eval_gemfile('Gemfile')
dsl.to_definition('Gemfile.lock', {})
grouped = dsl.dependencies.each_with_object({}) do |dependency, accumulator|
dependency.groups.each_with_object(accumulator) do |group, accumulator|
accumulator[group] ||= []
accumulator[group] << {
name: dependency.name,
version: dependency.requirements_list.map(&:to_s).join(', '),
defined_at: dsl.definitions[dependency.name]
}
end
end
puts JSON.pretty_generate(grouped)
# Gemfile.rb
ruby '2.5.3'
source 'https://rubygems.org'
git_source(:github) { |name| "https://github.com/#{name}.git" }
gem 'mailjet'
gem 'colorize'
group :production, :staging do
gem 'sdoc', '~>0.4.0'
end
group :development, :test, :staging do
gem 'faker'
end
group :development, :test do
gem 'byebug'
gem 'ruby-prof', '>= 0.17.0', '< 1.0.0'
end
group :development do
gem 'bullet'
gem 'letter_opener'
end
group :development do
gem 'rack'
end
$ ruby parser.rb
{
"default": [
{
"name": "mailjet",
"version": ">= 0",
"defined_at": 6
},
{
"name": "colorize",
"version": ">= 0",
"defined_at": 7
}
],
"production": [
{
"name": "sdoc",
"version": "~> 0.4.0",
"defined_at": 10
}
],
"staging": [
{
"name": "sdoc",
"version": "~> 0.4.0",
"defined_at": 10
},
{
"name": "faker",
"version": ">= 0",
"defined_at": 14
}
],
"development": [
{
"name": "faker",
"version": ">= 0",
"defined_at": 14
},
{
"name": "byebug",
"version": ">= 0",
"defined_at": 18
},
{
"name": "ruby-prof",
"version": "< 1.0.0, >= 0.17.0",
"defined_at": 19
},
{
"name": "bullet",
"version": ">= 0",
"defined_at": 23
},
{
"name": "letter_opener",
"version": ">= 0",
"defined_at": 24
},
{
"name": "rack",
"version": ">= 0",
"defined_at": 28
}
],
"test": [
{
"name": "faker",
"version": ">= 0",
"defined_at": 14
},
{
"name": "byebug",
"version": ">= 0",
"defined_at": 18
},
{
"name": "ruby-prof",
"version": "< 1.0.0, >= 0.17.0",
"defined_at": 19
}
]
}
If parsing, the challenging is being able to deterministically locate the right dependency to update it later. The part which I found challenging is that it seems there's nothing stopping you (a) reusing group names in a Gemfile more than once, and (b) referencing the same dependency more than once.
Obviously using line-based parsing, it's really simple. We record the line it's on, and when it comes time to update the dependency we double check if the right gem is still on that line.
With any ruby parsing, we can only do it once we have a plan for how to update/patch it again later, e.g. we need a "path" to locate the dependency in the Gemfile, similar to the concept of xpath in XML.
@sleekybadger the above group sorting output is probably not a great idea because it results in dependencies being listed multiple times in the resulting output. The output needs to be closer to what's currently supported by the JS-based parsing, e.g. if there are 20 gem
definitions in the Gemfile
then you get 20 objects returned.
What does the "raw" Bundler output look like prior to manipulation?
@rarkins we can change grouping behaviour pretty easy. It looks like this:
[
"mailjet (>= 0)",
"colorize (>= 0)",
"sdoc (~> 0.4.0)",
"faker (>= 0)",
"byebug (>= 0)",
"ruby-prof (< 1.0.0, >= 0.17.0)",
"bullet (>= 0)",
"letter_opener (>= 0)",
"rack (>= 0)"
]
So what would be the "unique path" we use to locate a dependency within the Gemfile, i.e. when it comes time to update it?
Can we use line number? We can easily track duplicated gems with such approach:
class RenovateDsl < Bundler::Dsl
def gem(*args)
*rest, dependency = super
locations = caller_locations(1..1)
defined_at = locations.first.lineno
definitions[dependency.name] ||= []
definitions[dependency.name] << defined_at
end
def definitions
@definitions ||= {}
end
end
ruby '2.5.3'
source 'https://rubygems.org'
git_source(:github) { |name| "https://github.com/#{name}.git" }
gem 'mailjet'
gem 'colorize'
group :production, :staging do
gem 'sdoc', '~>0.4.0'
end
group :development, :test, :staging do
gem 'faker'
end
group :development, :test do
gem 'byebug'
gem 'ruby-prof', '>= 0.17.0', '< 1.0.0'
end
group :development do
gem 'bullet'
gem 'letter_opener'
end
group :development do
gem 'rack'
gem 'faker'
end
{
"default": [
{
"name": "mailjet",
"version": ">= 0",
"defined_at": [
6
]
},
{
"name": "colorize",
"version": ">= 0",
"defined_at": [
7
]
}
],
"production": [
{
"name": "sdoc",
"version": "~> 0.4.0",
"defined_at": [
10
]
}
],
"staging": [
{
"name": "sdoc",
"version": "~> 0.4.0",
"defined_at": [
10
]
},
{
"name": "faker",
"version": ">= 0",
"defined_at": [
14,
29
]
}
],
"development": [
{
"name": "faker",
"version": ">= 0",
"defined_at": [
14,
29
]
},
{
"name": "byebug",
"version": ">= 0",
"defined_at": [
18
]
},
{
"name": "ruby-prof",
"version": "< 1.0.0, >= 0.17.0",
"defined_at": [
19
]
},
{
"name": "bullet",
"version": ">= 0",
"defined_at": [
23
]
},
{
"name": "letter_opener",
"version": ">= 0",
"defined_at": [
24
]
},
{
"name": "rack",
"version": ">= 0",
"defined_at": [
28
]
},
{
"name": "faker",
"version": ">= 0",
"defined_at": [
14,
29
]
}
],
"test": [
{
"name": "faker",
"version": ">= 0",
"defined_at": [
14,
29
]
},
{
"name": "byebug",
"version": ">= 0",
"defined_at": [
18
]
},
{
"name": "ruby-prof",
"version": "< 1.0.0, >= 0.17.0",
"defined_at": [
19
]
}
]
}
@sleekybadger yes, that would probably work well. I think I would want to de-dupe it though so that it's converted into an array of dependencies with zero or more groups each, like the JS version produces. i.e. in the results, number of "deps" matches with the count of gem
lines.
@rarkins like this?
[
{
"name": "mailjet",
"groups": [
"default"
],
"version": ">= 0",
"defined_at": [
6
]
},
{
"name": "colorize",
"groups": [
"default"
],
"version": ">= 0",
"defined_at": [
7
]
},
{
"name": "sdoc",
"groups": [
"production",
"staging"
],
"version": "~> 0.4.0",
"defined_at": [
10
]
},
{
"name": "faker",
"groups": [
"development",
"test",
"staging"
],
"version": ">= 0",
"defined_at": [
14,
29
]
},
{
"name": "byebug",
"groups": [
"development",
"test"
],
"version": ">= 0",
"defined_at": [
18
]
},
{
"name": "ruby-prof",
"groups": [
"development",
"test"
],
"version": "< 1.0.0, >= 0.17.0",
"defined_at": [
19
]
},
{
"name": "bullet",
"groups": [
"development"
],
"version": ">= 0",
"defined_at": [
23
]
},
{
"name": "letter_opener",
"groups": [
"development"
],
"version": ">= 0",
"defined_at": [
24
]
},
{
"name": "rack",
"groups": [
"development"
],
"version": ">= 0",
"defined_at": [
28
]
},
{
"name": "faker",
"groups": [
"development"
],
"version": ">= 0",
"defined_at": [
14,
29
]
}
]
@sleekybadger yes, that's pretty much what we need. Did you generate it from a script, or by hand?
@rarkins from script. I think we need to move this script into separate repo, wrap as gem and distribute via rubygems cause this will give as ability to easily support multiple versions of ruby/bundler.
@sleekybadger don't publish anything yet, it might be simpler to do this using Docker and simply copying the script in. We'll already need to support custom versions using Docker anyway
By the way, we also need to extract any ruby and platform requirements from the file too. Can you create a PR with your script replacing the current JS-based parsing? For now, assume that a valid Ruby/Bundler combination has been installed locally. I'll handle the Docker part later.
@rarkins I didn't plan to) but I don't think that using script directly is a good idea cause it depends on bundler API. I implemented this in such way with idea that this give us some guarantees that we're working with valid for bundler Gemfile. So converting this into a gem will give us ability to support multiple bundler versions.
Can we assume Bundler 2.x to begin with and only move to a more complex approach if necessary? After all this is only being run by Renovate internally so if Bundler 2 can parse all reasonably modern Bundler 1.x files then I think we’re fine.
@rarkins sure, so I need to change extract function and add ability to parse ruby version, platforms, sources?
Yes that’s right. As per the existing JS structure but adding in Ruby version
Useful article: https://medium.com/@0xcolby/how-does-bundle-install-work-part-1-87a4349c192a
It says that a gem can’t be required twice with different requirements, so that should mean we are free to “deduce” gems if we want (eg to merge all groups together)
It would also be useful if each of the extracted dependencies can also have a “lockedVersion” value which is based on Gemfile.lock
. Eg currentValue might be ~> 2.5.0
but lockedVersion would be 2.5.7
I believe that main purpose two have condition inside Gemfile is to have gems with different sources, what do you think will be the best way to handle this?
We can easily add this, but I need your help with accessing Gemfile.lock insideextractPackageFile
function 😃
We can add that the lock file extraction later. It's definitely not essential on day 1
@bjeanes btw, you should see a Machinist PR in your master issue now. It's not been raised yet because of scheduling but should hopefully generate lock file without error this time
(responding to your comment in #3098)
Yep I see it! I have a few thoughts based on how the current version of Bundler integration is operating with my Gemfile.
Gemfile
doesn't seem right. My existing requirement is ~> 1.0, < 2
because I happen to know that v2 is a breaking change (and this inhibits bundle update
from attempting to update it). I always intended to close the PR Renovate opened as Machinist was abandoned (thanks to me, ironically) right after v2.0.0 came out. That being said, this is a good test case for Renovate, as updating the version to ~> 1.0, < 3
doesn't make sense. I think it should be consistent with how Renovate works with NPM dependencies and just set this value to the newly updated value (e.g. 2.0.0
, in this instance)rubocop
):
I think it should be consistent with how Renovate works with NPM dependencies and just set this value to the newly updated value (e.g. 2.0.0, in this instance)
Actually this is exactly as it works for npm too and working as intended. If there is a complex range then our default is to extend it. Kind of, "if you wrote it weirdly, you probably intended it and we'll keep it weird". The PR proposed has the desired effect of not removing the lower range while increasing the upper range.
However, isn't your existing syntax unnecessarily complex? Doesn't ~> 1.0
already imply <2
so that second part is redundant? You would get a different PR if you removed the second part of the requirement (i.e. had only ~> 1.0
).
I have other stale dependencies which are not being detected
Without seeing your Gemfile, I'm guessing that this is because it's a lockfile-only update, which we are not covering in this initial phase. i.e. you have a range in Gemfile for rubycop which is still satisfied and hence no work to do in the Gemfile. This functionality will follow along right after the initial one is GA, and has nothing really to do with JS vs Ruby parsing.
Hmm OK.
I need to digest that a bit. It doesn't seem right.
More importantly, that Machinist PR doesn't do anything useful. It increases the range, but it doesn't actually update the gem, so the tests pass because they are still running on old version. The PR does change Gemfile.lock
but it only changes it to reflect new requested range, not an actual new version.
Do you know if that's a Bundler "feature"? e.g. if you change the range but the existing locked version is still within the range, then the lockfile doesn't update the resolved version?
It will only change resolved version if the range is not satisfied or you
explicitly asked it to with bundle update
On Thu., 24 Jan. 2019, 2:05 am Rhys Arkins <notifications@github.com wrote:
Do you know if that's a Bundler "feature"? e.g. if you change the range but the existing locked version is still within the range, then the lockfile doesn't update the resolved version?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/renovatebot/renovate/issues/2983#issuecomment-456835815, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAKAA3RYEBrh5kWz5EpEc5w_vYGcweEks5vGHo9gaJpZM4Zatgc .
@rarkins - I just found this tool today and started playing around with some of my own Gemfile
s. I actually planned on creating something very similar to renovate (for ruby apps) before I did some googling, and I'm happy to find something out there in the wild already, so kudos!
My question, after scanning this issue relatively quickly, is why parsing the Gemfile
is even necessary? For instance, when thinking through this problem, I was planning to use Bundler to do all of the heavy lifting, specifically using bundler outdated
An example for one of my own apps:
bundle outdated --only-explicit --strict --parseable
factory_girl (newest 4.9.0, installed 4.4.0)
pry (newest 0.12.2, installed 0.11.3, requested ~> 0.11)
I then planned on iterating over each of these, creating a new branch, and running:
bundle update <GEM_NAME>
This workflow works for me because I don't necessarily want something to push up PRs that are outside the versions defined in the Gemfile
, therefore I don't need to modify the Gemfile
in an automated way, just the Gemfile.lock
. Is that where our philosophy differs? Is renovate ideally pushing up PRs that update the version beyond what's defined in the Gemfile
?
@tldev FYI I just finished fixing a few bugs in current Bundler support a few minutes ago so you may find improved behaviour if you retry anything.
Renovate's designed to handle updates in either the main package file (e.g. Gemfile) or lock files, although for Bundler it's still Gemfile-only right now. We have lockfile updating in npm/yarn already, Bundler will be soon. If you're building an app, there's no reason to use ranges once you have an automation tool available, and you benefit from being about to look in your Gemfile and know which version you have.
I saw on the todo list that Ruby is supported only with Gemfiles but most ruby projects are using a gemspec :(
@noraj off-topic. Gemspec is a different file format. Raise a detailed feature request if you don’t find any existing.
I'm sorry, I thought the issue was talking about extracting data from gemfile/gemspec in order to be able to update gemfile/gemspec and the gemfile.lock in both cases.
I was just saying that currently only gemfile is being updated and not gemspec and it is already identified in your todo list as it is even written on the documentation.
https://docs.renovatebot.com/ruby/
A typical gemfile when gemspec is used.
source 'https://rubygems.org'
# Specify your gem's dependencies in .gemspec
gemspec
@rarkins Sorry if I'm still off-topic I may have misunderstood the issue goal.
Renovate's text-based parsing is working well and our regex manager is a good fallback for anyone with an abnormal Bundler config, so closing this wontfix
.
The complexity of Gemfile/gemspec probably means we should use Ruby to parse it - perhaps Bundler itself. Here are the complexities:
depType
, it means we need to convert that field into adepTypes
array field first (for all managers), or support both depType and depTypese.g.