kwilczynski / ruby-magic

Simple interface to libmagic for Ruby Programming Language
Apache License 2.0
27 stars 8 forks source link

Allow use of a custom mirror for the libmagic tarball #8

Closed yorickpeterse closed 3 years ago

yorickpeterse commented 3 years ago

This Gem currently downloads libmagic from a fixed known location. If a service (i.e. GitLab) does this frequently enough, it may have a negative impact on the website hosting the source code.

To handle this, being able to set a custom download URL would be an option (e.g. through an environment variable). This way one doesn't need to use a custom fork (e.g. like this).

kwilczynski commented 3 years ago

Hi @YorickPeterse, sorry for the troubles!

This is indeed a bit of a problem - previously, I relied on Travis CI cache so the tarball would have been downloaded once when I added new version.

I have uploaded the archive to an S3 bucket as per: https://github.com/kwilczynski/ruby-magic/pull/3#issuecomment-806413233

Hopefully, this will alleviate some of the issues folks are having at the moment. Albeit, it might also incur financial cost if now a lot of folks are building this Ruby Gem. So it might be a somewhat bad idea. :-)

I do wonder how Nokogiri manages to keep the number of downloads down, given that they vendor dependencies too.

Re: using environment variable - not a bad idea!

Krzysztof

yorickpeterse commented 3 years ago

I suspect Nokogiri receives some sort of sponsoring to cover their costs. Or maybe they don't, and just take it :\

Would it be possible to simply include the tarball in the Gem itself, and download it at package time (or just have it in the repository)? Looking at the license it seems permissive enough, but I could be mistaken.

yorickpeterse commented 3 years ago

FYI at GitLab we'll see if we can take a look at the legal implications of including the library into the Gem. Not sure how long it would take to get some insight into this, but I'll update this issue when I know more :smile:

kwilczynski commented 3 years ago

Hi @YorickPeterse, that would be OK.

I had a look at the license and it seems it would be OK. Even though it's a proprietary license, it seems very permissive, indeed, and given that libmagic and file (as utility) is quite ever-present, then I would wager it would be OK.

We can always ask @zoulasc (Christos is the current maintainer and also a very nice person) whether he would be OK with it?

Not sure if you are reading anything GitHub (given that https://github.com/file/file is primarily a mirror), but if you do, then what do you think, @zoulasc?

Krzysztof

yorickpeterse commented 3 years ago

GitLab issue where we are discussing this: https://gitlab.com/gitlab-com/legal-and-compliance/-/issues/453

yorickpeterse commented 3 years ago

Per https://gitlab.com/gitlab-com/legal-and-compliance/-/issues/453#note_538677900 it should be fine to include the source code in this repository, removing the need for downloading it entirely.

yorickpeterse commented 3 years ago

cc @stanhu

zoulasc commented 3 years ago

I don't have a problem with it as long you keep the license that comes with it. Remember there is not such thing as a free lunch: by making a copy you are now responsible for keeping the copy up-to-date..

kwilczynski commented 3 years ago

Hi @zoulasc,

Thank you!

I have also added a long overdue copyright and license notice to the NOTICE file, see: NOTICE#L468-L505.

Krzysztof

stanhu commented 3 years ago

I see a couple of directions we could go here that might improve the reliability of builds:

  1. Improve mini_portile2 to retry with alternative mirrors.
  2. Vendor libmagic with the source files, not the .tar.gz.
  3. Use a git submodule against the GitHub mirror of https://github.com/file/file.
  4. Use a release tag from https://github.com/file/file/releases/.
  5. Build native gems.

I like the idea of building a native gem since it could save users compilation time, but it may be orthogonal to the other options.

I dislike the submodule option.

@kwilczynski What would you prefer right now?

kwilczynski commented 3 years ago

Hi @stanhu,

Thank you for laying out all the possible options!

I am with you on the (3). We could aim to solve (1) and (5), and I am pro (5) too. I know that Nokogiri gets a some of complaints for being distributed as native Ruby Gem, but for most of its users this is no longer an issue, I suppose.

The (2), (3) and (4) are somewhat similar - distribute libmagic as part our code base, correct?

Krzysztof

kwilczynski commented 3 years ago

Hi @stanhu,

I've been also thinking, that eventually, we'd need to support vendoring some of the libmagic dependencies like zlib, liblzma, etc., (see gitlab-org/ruby-magic/-/merge_requests/1#note_537310311) to make sure that we are completely not depending on what the underlying operating system provides.

In which case, most likely staying with (5) would be the best option.

Krzysztof

kwilczynski commented 3 years ago

Hi @YorickPeterse,

As per the conversation above, if we keep the extension to be build as native Ruby Gem, and provide a way to override both the download URL and SHA256 checksum at build time, would that be acceptable?

I am not sure how open @flavorjones would to be changing how the download works in mini_portile. However, I found this in the past https://github.com/flavorjones/mini_portile/issues/44 which might be somewhat related and closing this issue would possibly also solve our use case - although, it's not directly related to multiple download mirrors and retry, etc.

Krzysztof

yorickpeterse commented 3 years ago

@kwilczynski That's also fine. I do think vendoring more dependencies, such as zlib, might not be needed. These dependencies tend to be widely available and provide a stable API, so using the system version should be fine.

flavorjones commented 3 years ago

:wave: Nokogiri maintainer and mini_portile2 maintainer here.

I do wonder how Nokogiri manages to keep the number of downloads down, given that they vendor dependencies too.

Gem installation doesn't download libxml2 or libxslt -- we package the tarballs in the gem file.

Please let me know if/how I can help. I'm absolutely open to improving mini_portile and would be willing to pair on problem definition or solution.

kwilczynski commented 3 years ago

Hi @flavorjones, nice to have you here!

👋 Nokogiri maintainer and mini_portile2 maintainer here.

Thank you for creating and maintaining mini_portile2 and Nokogiri! Both are indispensable!

I do wonder how Nokogiri manages to keep the number of downloads down, given that they vendor dependencies too.

Gem installation doesn't download libxml2 or libxslt -- we package the tarballs in the gem file.

Ah! I didn't think about this approach! That is very clever and also allows for mini_portile2 to do its work without issues.

After you mentioned this, I had a look around and found this: rakelib/extensions.rake#L172-L181

Now it makes a lot more sense! I think, we could apply a similar approach.

Please let me know if/how I can help. I'm absolutely open to improving mini_portile and would be willing to pair on problem definition or solution.

Thank you so much! I also apologise, I didn't wanted to cause you more work to do or anything like that. We are going to try to help as much as possible. Allow me to describe what we were thinking recently.

We had some issues with mirror that host our dependency - libmagic. Some of the mirrors throttle, some of them don't have the same version just after a new release as it takes a while for a given mirror to get new tarballs, etc. A whole range of issues we've seen before. Thus, given that files attribute from the recipe takes a list of objects denoting where to download a given artefact from, the idea I had was as follows:

recipe.files = [
    {
        url: "file://#{File.dirname(__FILE__)}/file-#{recipe.version}.tar.gz",
        sha256: LIBIMAGE_SHA256,
    },
    {
        url: "ftp://ftp.astron.com/pub/file/file-#{recipe.version}.tar.gz",
        sha256: LIBIMAGE_SHA256,
    },
    {
        url: "https://ruby-magic.s3.eu-central-1.amazonaws.com/file-#{recipe.version}.tar.gz",
        sha256: LIBIMAGE_SHA256,
    },
]

But as @stanhu found out, every mirror from this list will be used to download the same file again, even if previous download was completed successfully and the checksum matched - it would be nice to only retry other mirrors in a case of a download failure (either network-related, bad response or mismatching checksum).

Would something like that make sense to implement?

However, in the view that you ship tarballs bundled together as part of the native Ruby Gem (which would definitely remove the immediate need to download anything), this might not be as useful any more.

What do you think @flavorjones?

Krzysztof

flavorjones commented 3 years ago

@kwilczynski Thanks for providing this context.

I'd be open to mini_portile supporting multiple mirrors; but honestly my advice to you is that you should avoid relying on mirrors at gem-installation time. They'll always be flaky and/or throttle your users, which will result in unhappy users and unactionable support requests.

I would recommend first try packaging your dependencies into the gem that you're shipping. This will make installation faster and more reliable for your users. The steps I'd recommend taking:

As a separate topic: packaging precompiled native gems is a little tricky, but you're already most of the way there. I'd be happy to help you put that together, if you like!

kwilczynski commented 3 years ago

Hi @flavorjones,

[...]

I'd be open to mini_portile supporting multiple mirrors; but honestly my advice to you is that you should avoid relying on mirrors at gem-installation time. They'll always be flaky and/or throttle your users, which will result in unhappy users and unactionable support requests.

This is a very good point, and the premise that @YorickPeterse held too, and I also agree. I want the users of this Ruby Gem to have the best experience possible.

I would recommend first try packaging your dependencies into the gem that you're shipping. This will make installation faster and more reliable for your users.

Definitely looks like the way to go. What is your release process for Nokogiri?

The steps I'd recommend taking:

  • make sure to verify checksums when you cook the recipe (looks like you're already doing this!)

We might move to the dependencies.yml similarly to what Nokogiri is using to track everything in a single place.

  • modify the gemspec files to include the tarball when you package the gem

Seems like puting the version we want to vendor in the ports/archives directory would make it possible for mini_portile2 to find the archive just fine - as per what I've see Nokogiri does.

As a separate topic: packaging precompiled native gems is a little tricky, but you're already most of the way there.

It definitely is involved. We have arrived at the current state thanks to a valiant effort from @stanhu, who first added vendored dependency proving that it works, and then make all the necessary work to switch over to using mini_portile2 following Nokogiri as an example of how to do it correctly (it has been a bit of a gold standard in that regards).

I'd be happy to help you put that together, if you like!

Most definitely! Only if you don't mind and have the time to do it. :) So, if you have any suggestions or critique, then by all means! We are still lacking a lot of things that Nokogiri already has such as e.g., proper dependency packaging, cross-platform support (Windows anyone?), better testing coverage and tests regarding elements that are specific to native Ruby Gems (you have a lot of additional excellent tools there - from debugging to memory leak testing), etc.

Krzysztof

kwilczynski commented 3 years ago

Hello everyone!

A small update. I think this conversation here has somewhat good timing, especially about how to distribute libmagic sensible.

We most definitely have to start bundling the tarball as part of the Ruby Gem, and perhaps start building native Ruby Gems sooner, as currently the project is using the S3 bucket I have set up some time ago, and the cost is already over $20 in transfer fees (the storage cost is insignificant). I suppose, the more people uses either GitLab or just installs the Ruby Gem, the higher the cost will be - all the CI jobs running... 😄

But sadly... I might not be able to afford this long term. This is not a complaint or otherwise - I just need to be mindful about the cost since I am currently unemployed (plus the exchange rate of USD to PLN does not help).

Nonetheless, this is nice, as it's nice to know someone I've done someone else found to be of use.

Krzysztof

flavorjones commented 3 years ago

@kwilczynski I've submitted a PR at #16 to include the tarball in the gemfile. Let me know what you think?

stanhu commented 3 years ago

Thanks @flavorjones! This problem should be gone now. I've released v0.4.0 that now includes the .tar.gz. Notice the size increase:

image