hexpm / specifications

Specifications for using and implementing Hex protocols

Proposal: Add cross-repository dependencies #5

Closed ericmj closed 8 years ago

ericmj commented 8 years ago

In Mix, specifying a dependency from another repository will look like this:

@company_repo "https://repo.example.com"
defp deps do
  [{:ecto, "~> 2.0", source: @company_repo}]
end

When the :source key is not set, the dependency will be fetched from the same repository the parent was fetched from. Top-level projects will still default to being fetched from hex.pm.

When a package located in another repository declares a dependency on a hex.pm package, the :source key should be set to :hexpm. Example:

defp deps do
  [{:ecto, "~> 2.0", source: :hexpm}]
end

This is because the hex.pm repository is special-cased in configuration: HEX_MIRROR only changes the URL for hex.pm, and using the :hexpm atom lets us change the hex.pm URL in the future without breaking existing projects. Question: Should we special-case hex.pm in the specification as well to ensure existing packages do not break?
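As a rough illustration of the rules above, here is a minimal sketch of how a client might pick the repository for one dependency. The module name `SourceSketch`, the function `resolve/2`, and the `parent_repo` argument are hypothetical and not part of Hex or Mix; the sketch only encodes the defaults described in this proposal.

```elixir
defmodule SourceSketch do
  # Hypothetical helper, not part of Hex or Mix.
  # `opts` is the keyword list from the dependency tuple.
  # `parent_repo` is the repository the parent package was fetched from,
  # or nil when the dependency is declared by a top-level project.
  def resolve(opts, parent_repo) do
    case Keyword.get(opts, :source) do
      # No :source given: inherit from the parent, or use hex.pm at top level.
      nil -> parent_repo || :hexpm
      # Explicit hex.pm: the actual URL is left to configuration (HEX_MIRROR).
      :hexpm -> :hexpm
      # Explicit custom repository URL.
      url when is_binary(url) -> url
    end
  end
end
```

Returning the :hexpm atom instead of a URL keeps the hex.pm address out of package metadata, which is the point of special-casing it.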

If you need to override a package with a package from another repository, you should use override: true. As an example:

defp deps do
  [{:ecto, "~> 2.0", source: @company_repo, override: true}]
end

Question: As an alternative to :source being a string with a URL, it could be an atom key whose URL is set in your config. Example:

# mix.exs
defp deps do
  [{:ecto, "~> 2.0", source: :my_company}]
end

% hex.config
{repos, #{my_company => "https://repo.example.com"}}.

danhper commented 8 years ago

@ericmj Great, thank you very much.

> When the :source key is not set, the dependency will be fetched from the same repository the parent was fetched from. Top-level projects will still default to being fetched from hex.pm.

I would keep the same default for everything, even if the parent was fetched from somewhere else:

I do not think it is good for the behavior to change depending on whether the project is top level or not. In a CI environment, even projects normally used as dependencies would be treated as top level, so we would be forced to set :source on every dependency for things to work in both cases.

On the other hand, some companies might want to fetch dependencies only from their own servers, so having something like a default_repo key that can be set in the configuration and that defaults to hexpm might help keep things clean.

> Question: Should we special-case hex.pm in the specification as well to ensure existing packages do not break?

Sorry, what do you mean by special case here? Being the only repo that can be set using an environment variable? I think that if we adopt a repos key in hex.config, we should rather encourage users to set the mirror there.

> As an alternative to :source being a string with a URL, it could be an atom key whose URL is set in your config.

I would rather go with the atom and a global config approach for two reasons:

  1. This config will most likely be shared by a potentially large number of projects, so having a centralized place for it seems simpler.
  2. I think per-repository credentials will be a must, and it seems simpler to keep the repository settings and credentials grouped together, for example something like:

    {repos, #{my_company => #{url => "https://repo.example.com", key => "a_key_used_for_authentication"}}}.
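
To make the lookup concrete, here is a minimal sketch of resolving an atom :source against such a config. The module `RepoConfigSketch`, the function `fetch_repo/1`, and the in-memory `@repos` map are hypothetical stand-ins for whatever would actually read hex.config.

```elixir
defmodule RepoConfigSketch do
  # Hypothetical in-memory equivalent of the `repos` entry in hex.config.
  # A real client would load this from the config file instead.
  @repos %{
    my_company: %{
      url: "https://repo.example.com",
      key: "a_key_used_for_authentication"
    }
  }

  # Look up the URL and credentials for a named repository.
  def fetch_repo(name) when is_atom(name) do
    case Map.fetch(@repos, name) do
      {:ok, repo} -> {:ok, repo}
      :error -> {:error, {:unknown_repo, name}}
    end
  end
end
```

With this shape, a dependency declared with source: :my_company gets its URL and its key from one place.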

ericmj commented 8 years ago

> I would keep the same default for everything, even if the parent was fetched from somewhere else:

I removed defaults from the spec to make it clearer. I pushed a new commit where we can use three different values for source in the registry: primary for fetching from hex.pm (clients should default to this, like you said), self for fetching from the same repository where the registry is located, and a custom string, of course. We need self because we don't want packages to break in the future if the repository URL changes, for example because the company changed its name.
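
A minimal sketch of how a client could turn those three registry values into a repository follows. The module `RegistrySourceSketch` and its argument names are hypothetical, and the atoms :primary and :self are just an in-memory representation for illustration; the actual registry encoding is defined by the spec commit, not by this snippet.

```elixir
defmodule RegistrySourceSketch do
  # Hypothetical resolver, not part of Hex.
  # `registry_repo` is the repository whose registry is being read.
  # `primary_repo` is whatever the client has configured for hex.pm.

  # primary: always the client's hex.pm (or its configured mirror).
  def resolve(:primary, _registry_repo, primary_repo), do: primary_repo

  # self: the repository the registry itself was fetched from.
  def resolve(:self, registry_repo, _primary_repo), do: registry_repo

  # custom string: an explicit repository URL.
  def resolve(url, _registry_repo, _primary_repo) when is_binary(url), do: url
end
```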

> On the other hand, some companies might want to fetch dependencies only from their own servers, so having something like a default_repo key that can be set in the configuration and that defaults to hexpm might help keep things clean.

That's what the hex_mirror config and the HEX_MIRROR env var are for.

> Sorry, what do you mean by special case here? Being the only repo that can be set using an environment variable?

I chose not to special-case hex.pm in the specification; I use primary for it now. I think Mix should still special-case hex.pm because it touches many places, for example the env var, the public keys, and how they work together.

> I would rather go with the atom and a global config approach for two reasons:

We can do that in the future if needed (it's likely needed), but for the first iteration let's use simple module attributes?

danhper commented 8 years ago

> I removed defaults from the spec to make it clearer. I pushed a new commit where we can use three different values for source in the registry: primary for fetching from hex.pm (clients should default to this, like you said)

I like this approach!

> We need self because we don't want packages to break in the future if the repository URL changes, for example because the company changed its name.

How would the :self key behave if we want to test a dependency in a CI environment?

For example, let's say we have three projects, where both dependencies are hosted on the company hex repository.

# top_level repository
defmodule TopLevel.MixFile do
  defp deps do
    [{:my_dep, source: "https://my.company"}]
  end
end

# top_level repository dependency
defmodule MyDep.MixFile do
  defp deps do
    [{:nested_dep, source: :self}]
  end
end

# nested dependency
defmodule NestedDep.MixFile do
end

This would work if we install everything from TopLevel, but if I want to run the tests for MyDep in CI, which I think is a pretty common case, using the :self key does not seem possible. Am I missing something here? If not, maybe we could keep this approach and allow the :self value to be overridden through an environment variable for such cases, although I think this can be left for later.

> We can do that in the future if needed (it's likely needed), but for the first iteration let's use simple module attributes?

I agree it is better to keep things simple for the first iteration. Maybe we can rediscuss this when tackling the authentication process.

ericmj commented 8 years ago

> Am I missing something here?

No, you are not; I didn't explain it correctly. :self is only used to guard against future changes of repository URLs, and it is only used in the registry, so it will never be specified in the mix.exs file.

Imagine the scenario where your company has the repository https://awesome.startup. You would specify your dependencies like this:

# top_level repository
defmodule TopLevel.MixFile do
  defp deps do
    [{:my_dep, source: "https://awesome.startup"}]
  end
end

# top_level repository dependency
defmodule MyDep.MixFile do
  defp deps do
    [{:nested_dep, source: "https://awesome.startup"}]
  end
end

You get bought by Google and change the URL to https://startup.google. You update your projects to reflect this:

# top_level repository
defmodule TopLevel.MixFile do
  defp deps do
    [{:my_dep, source: "https://startup.google"}]
  end
end

# top_level repository dependency
defmodule MyDep.MixFile do
  defp deps do
    [{:nested_dep, source: "https://startup.google"}]
  end
end

Older versions of the packages will still point to https://awesome.startup, both in the package metadata and in the registry. We can't change the package metadata because packages are immutable and locked checksums would break. We could update the registry, but we are very likely going to move to an immutable data structure for the registry, so you would have to delete all packages and re-add them. It's much easier to just mark them as :self in the registry: you get the repository URL change for free, and the registry is more efficient because :self is a single byte instead of the full repository URL.
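
The difference can be shown with a small worked example. This is only a sketch: the module `SelfMoveSketch` and the idea of passing in the URL the registry was fetched from are assumptions made for illustration.

```elixir
defmodule SelfMoveSketch do
  # Hypothetical helper: resolve a source stored in the registry against
  # the URL the registry was actually fetched from.
  def resolve(:self, fetched_from), do: fetched_from
  def resolve(url, _fetched_from) when is_binary(url), do: url
end

# The repository has moved, so the registry is now fetched from the new URL.
new_url = "https://startup.google"

# An old entry stored with the absolute URL still points at the old host.
stale = SelfMoveSketch.resolve("https://awesome.startup", new_url)

# The same entry stored as :self follows the repository to its new URL.
moved = SelfMoveSketch.resolve(:self, new_url)

IO.inspect({stale, moved})
```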

danhper commented 8 years ago

@ericmj Thanks for the explanation. I now understand that self is meant to be used in the registry and not in the MixFile.

The proposal as is should cover all my current use cases, I cannot wait to give it a try!

Please let me know if there is anything I can help with :smiley:

ericmj commented 8 years ago

@tuvistavie If you'd like, you can start on any part of it. It will be a while before I have time to work on it.

danhper commented 8 years ago

@ericmj Sure, I am going to try to see how the client could support P, S and custom sources to start with.

ericmj commented 8 years ago

@tuvistavie Have you started working on this? If not, I will grab it sometime next week since I would like to have this feature out soon.

danhper commented 8 years ago

@ericmj I didn't get time to work on it yet, sorry!

ericmj commented 8 years ago

I am thinking we can solve much of this with namespaces instead. If you actually want to use packages from another repository then that repository can proxy the rest of the packages from hex.pm.

Another solution would be to use priority lists of repositories like rebar3 is planning to do.
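
A priority list could work roughly as follows. This is a minimal sketch and says nothing about how rebar3 will actually implement it: the module `PrioritySketch` is hypothetical, and a MapSet of package names stands in for a real registry lookup.

```elixir
defmodule PrioritySketch do
  # Hypothetical helper. `repos` is an ordered list of {name, packages}
  # pairs, highest priority first. Each `packages` is a MapSet of the
  # package names that repository provides.
  def find_repo(package, repos) do
    # Return the first repository that has the package, or :error if none does.
    Enum.find_value(repos, :error, fn {name, packages} ->
      if MapSet.member?(packages, package), do: {:ok, name}
    end)
  end
end
```

Note that with this approach a package present in a higher-priority repository silently shadows one with the same name further down the list.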

danhper commented 8 years ago

I agree that namespaces are a good solution, and I personally like npm's approach to this. Adding

@myscope:registry=https://my.npm.registry

to the npm configuration makes all packages with the @myscope/ prefix be fetched from https://my.npm.registry, while all other packages are fetched from the default npm registry as usual.
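
Translated to Hex, the same idea might look like the sketch below. Everything here is an assumption for illustration: the module `NamespaceSketch`, the "scope/name" package naming, and the `scopes` map are hypothetical and not an existing Hex feature.

```elixir
defmodule NamespaceSketch do
  # Hypothetical helper. `scopes` maps a namespace to a repository URL.
  # Packages without a namespace, or with an unconfigured one, fall back
  # to hex.pm, mirroring how npm falls back to its default registry.
  def repo_for(package, scopes) when is_binary(package) do
    case String.split(package, "/", parts: 2) do
      [scope, _name] -> Map.get(scopes, scope, :hexpm)
      [_name] -> :hexpm
    end
  end
end
```

For example, with scopes set to %{"myscope" => "https://repo.example.com"}, the package "myscope/my_dep" resolves to the company repository while "ecto" resolves to :hexpm.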

I would say the major advantage of this approach over a priority list is that it is much more explicit, so we can be sure what is fetched from where.

One disadvantage is that we cannot override a package by publishing it to a private repository, although I do not think that is good practice anyway.

ericmj commented 8 years ago

You can also always override with the explicit override: true configuration on the dependency.