jellybob / mimemagic

Mime type detection in ruby via file extension or file content
https://github.com/minad/mimemagic
MIT License

Support loading data at runtime, or by configuring a different location for a preinstalled version #1

Closed jellybob closed 3 years ago

jellybob commented 3 years ago

No license lawyering please - unless you're someone able to speak on behalf of freedesktop.org your interpretation of the GPL isn't going to add anything to this conversation

See https://github.com/rails/rails/issues/41750 and https://github.com/minad/mimemagic/issues/97 for background.

In order to cause minimal impact on existing users of the mimemagic gem, particularly people using Rails, I'm going to have it load MIME types from a preinstalled version of the Freedesktop MIME types database, rather than bundling one with the gem. This will require having a copy of that either installed with your distribution, or obtained in some other way. The availability of that will be checked at build time.

minad commented 3 years ago

Maybe you want to synchronize with @coding-bunny, who attempts to replace mimemagic with another gem. This could also be a viable approach?

jellybob commented 3 years ago

I honestly see the alternative of using a different gem as a non-starter, since that gem only supports matching on file extension.
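To make the distinction concrete, here is a minimal sketch of extension matching versus content matching. It is not mimemagic's implementation: the `MAGIC` and `EXTENSIONS` tables and both method names are hypothetical, hand-written for illustration. Only the byte signatures themselves are standard.

```ruby
# Hypothetical, hand-written tables for illustration only; mimemagic's
# real tables are derived from the freedesktop.org shared-mime-info data.
MAGIC = {
  "\x89PNG\r\n\x1A\n".b => "image/png",
  "%PDF-".b             => "application/pdf",
  "PK\x03\x04".b        => "application/zip",
}.freeze

EXTENSIONS = {
  "png" => "image/png",
  "pdf" => "application/pdf",
  "zip" => "application/zip",
}.freeze

# Extension matching trusts the file name and never reads the file.
def type_by_extension(filename)
  EXTENSIONS[File.extname(filename).delete_prefix(".").downcase]
end

# Content matching compares the leading bytes against known signatures.
def type_by_magic(bytes)
  data = bytes.b
  MAGIC.each { |signature, type| return type if data.start_with?(signature) }
  nil
end
```

A PNG image uploaded as `report.pdf` would be reported as `application/pdf` by `type_by_extension`, while `type_by_magic` on its contents would report `image/png`. An extension-only gem cannot catch that mismatch.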

Scharrels commented 3 years ago

There seems to be a parallel effort over here: https://github.com/Deradon/mimemagic/tree/fetch-mine-data-dynamically

It might be useful to combine these efforts.

stevenhaddox commented 3 years ago

The license file here seems to be GPLv2 which is a large part of the problem many have with the newest release of the original gem from my understanding.

Is there any intent to find an older version to start this fork from that has a prior license option or will this fork require GPLv2?

EDIT: This comment was made before I realized the yank of the older versions occurred due to files that were being used in the older version in a way that likely violated the license those files were released under. Please disregard.

minad commented 3 years ago

From my side it is okay to take the newest commit and revert back to MIT. But the tables.rb and freedesktop.org.xml must not be distributed as part of the gem.

jellybob commented 3 years ago

Digging into this further it seems that the XML file isn't the source of truth for this gem, but instead a Ruby class generated from that file, so there are in fact a few steps:

  1. Locate an XML source file, either pulled remotely or via environment variable.
  2. Build lib/mimemagic/tables.rb somewhere. (Possibly it would work to pipe it all through eval, but that makes me uncomfortable)
  3. Load lib/mimemagic/tables.rb

The parallel effort going on appears to just pull down the XML file at build time, rather than having it in source control, but the final gem will still include that file and therefore remain GPL licensed.
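Step 1 could be sketched roughly as follows, covering only the local-file case and not any remote download. The `FREEDESKTOP_MIME_TYPES_PATH` variable is the one named elsewhere in this thread. The method name `locate_mime_database` and the `CANDIDATE_PATHS` list are assumptions made for illustration, not the gem's actual search path.

```ruby
# Conventional install locations for the shared-mime-info database.
# This list is an assumption for illustration, not the gem's real search path.
CANDIDATE_PATHS = [
  "/usr/share/mime/packages/freedesktop.org.xml",
  "/usr/local/share/mime/packages/freedesktop.org.xml",
  "/opt/homebrew/share/mime/packages/freedesktop.org.xml",
].freeze

# Returns the first usable database path, preferring an explicit override.
# `env` and `candidates` are parameters so the lookup can be exercised
# without touching the real environment or filesystem layout.
def locate_mime_database(env: ENV, candidates: CANDIDATE_PATHS)
  override = env["FREEDESKTOP_MIME_TYPES_PATH"]
  if override && !override.empty?
    return override if File.file?(override)
    raise ArgumentError, "FREEDESKTOP_MIME_TYPES_PATH points at a missing file: #{override}"
  end

  candidates.find { |path| File.file?(path) } or
    raise "No freedesktop.org.xml found; install shared-mime-info or set FREEDESKTOP_MIME_TYPES_PATH"
end
```

Failing loudly when the override points at a missing file, rather than silently falling back to another copy, is a design choice in this sketch. It keeps a misconfigured deployment from quietly using a different database version.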

jellybob commented 3 years ago

@minad I'm seeing 4 test failures on master currently - are they expected?

$ rake test
Run options: --seed 26532

# Running:

.F...FF....F.

Finished in 0.173703s, 74.8404 runs/s, 299.3616 assertions/s.

  1) Failure:
TestMimeMagic#test_recognize_by_magic [/Users/jonwood/Code/mimemagic/test/mimemagic_test.rb:83]:
--- expected
+++ actual
@@ -1 +1,2 @@
-"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+# encoding: ASCII-8BIT
+"application/zip"

  2) Failure:
TestMimeMagic#test_recognize_all_by_magic [/Users/jonwood/Code/mimemagic/test/mimemagic_test.rb:92]:
--- expected
+++ actual
@@ -1 +1 @@
-["application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "application/zip"]
+["application/zip"]

  3) Failure:
TestMimeMagic#test_recognize_extensions [/Users/jonwood/Code/mimemagic/test/mimemagic_test.rb:55]:
--- expected
+++ actual
@@ -1 +1,2 @@
-"text/html"
+# encoding: ASCII-8BIT
+"application/xhtml+xml"

  4) Failure:
TestMimeMagic#test_recognize_by_a_path [/Users/jonwood/Code/mimemagic/test/mimemagic_test.rb:64]:
--- expected
+++ actual
@@ -1 +1,2 @@
-"text/html"
+# encoding: ASCII-8BIT
+"application/xhtml+xml"

13 runs, 52 assertions, 4 failures, 0 errors, 0 skips
rake aborted!
Command failed with status (1)
/Library/Ruby/Gems/2.6.0/gems/rake-12.3.2/exe/rake:27:in `<top (required)>'
Tasks: TOP => test
(See full trace by running task with --trace)

khalilovcmd commented 3 years ago

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

I can at least comment on this having a history of monkey patching and instability.

https://github.com/minad/mimemagic/issues/39 https://github.com/minad/mimemagic/issues/86

minad commented 3 years ago

@jellybob You may also consider distributing a GPL-licensed package including freedesktop.xml+tables.rb+LICENSE. Then the gem could pull that at runtime, offloading the generation process somewhere else. These failures are probably due to using a newer database version and encoding changes in ruby.

tenderlove commented 3 years ago

I don't want to have too many cooks in the kitchen, but couldn't we add an extconf.rb to the gem? It could download the xml file and generate the rb file when the gem is installed on target systems. No GPL code or files would be distributed with the gem.

jellybob commented 3 years ago

I don't think that really helps anything, as you're still pulling in a GPL dependency. I'm honestly a little bit dubious that pulling in the Freedesktop XML at runtime does much for strict license compliance as well - I'm attempting to get in touch with the maintainer of that file to ensure this approach does in fact result in a compliant gem.

jellybob commented 3 years ago

Just to revise that statement, I'm 90% confident that using a pre-existing install of the file is going to be safe, as there's no distribution or attempts to do an end run round GPL licensing involved in that, so I'll push on with that path. I'm less confident that including code that goes and downloads wouldn't be considered as against the spirit of the license.

Deradon commented 3 years ago

WDYT about this:

We could use pre_install_hooks when doing gem install. Still, feels quite hacky tbh.

So the gem would not share any derived copies of GPL licenced work.

coding-bunny commented 3 years ago

How would that work on platforms using this gem that don't have the required XML files installed? Let's say a Windows / macOS system?

Deradon commented 3 years ago

I just thought about fetching the XML during gem install from https://gitlab.freedesktop.org/xdg/shared-mime-info/-/blob/2.1/data/freedesktop.org.xml.in.

UPDATE: More or less doing a rake tables during gem install, based on my quick'n dirty PoC. (ofc rake tables would not work. This would have to be achieved some other way.)

jellybob commented 3 years ago

Given the generation doesn't actually take very long I'm inclined to do this at runtime - the alternative requires a bunch of hackery in a pre install hook, and potentially confusing error messages during the install. People who are concerned about the amount of time it might take to download the source file at runtime can make sure the machine they're running on has the file available beforehand.

Deradon commented 3 years ago

At runtime would be a showstopper for anyone where the Rails application is running w/o access to the public internet.

(e.g. build docker image, which fetches data from public internet, then deploy it internally, where you don't have access)

jellybob commented 3 years ago

That's why there'll be the option to load a pre-existing version of the file via an environment variable.

jesseclark commented 3 years ago

WDYT about this:

* Build the `tables.rb` during runtime while installing the gem?

We could use pre_install_hooks when doing gem install. Still, feels quite hacky tbh.

So the gem would not share any derived copies of GPL licenced work.

Seems like this approach still could trigger the GPL requirement. Here is a freedesktop contributor indicating that they believe the GPL applies to the db/xml itself.

jellybob commented 3 years ago

Unless you're able to speak on behalf of freedesktop.org discussions of what the GPL does or does not require are just adding noise here. Please keep discussion to the actual implementation of the plan described.

jesseclark commented 3 years ago

Unless you're able to speak on behalf of freedesktop.org discussions of what the GPL does or does not require are just adding noise here. Please keep discussion to the actual implementation of the plan described.

I linked to a thread where contributors to freedesktop.org are discussing exactly the issues that are also being discussed in this thread. It seemed like relevant information to take into consideration for the "plan described". Just trying to be helpful but I'll not add any more "noise" 👍🏼 .

jellybob commented 3 years ago

Apologies for the slightly rough tone there. Just to clarify, I'm talking to the maintainers at the moment about what would be needed (if it's at all possible) to be compliant with both the legal terms of the license, and more generally the spirit of that license.

hadess commented 3 years ago

Unless you're able to speak on behalf of freedesktop.org discussions of what the GPL does or does not require are just adding noise here. Please keep discussion to the actual implementation of the plan described.

There isn't anyone that can do that. Individual contributors hold their own copyright. I'm but one of a number of contributors to the module where the freedesktop.org.xml originates from.

1. Support for pulling down the freedesktop.org definitions at application startup. This will be the default behaviour.

2. Support for setting the environment variable `FREEDESKTOP_MIME_TYPES_PATH`, which will disable downloading at runtime, and instead load the data from a different location.

Both look like good options to me, as long as the files are used as data files, and not used to create code from. Otherwise you should use the update-mime-database tool to process the XML, to create cache files (those caches are part of the shared mime info specification).

jellybob commented 3 years ago

@hadess thanks for the reply. I believe the proposed solution falls within the usage you've described, so in lieu of anyone being able to definitively speak on this I'm going to go ahead and finish implementing this.

jellybob commented 3 years ago

For anyone following along at home, I think we can also skip over the whole thing of generating a Ruby class, and instead just parse the mime type database directly into constants. The generation of a source file is just an optimisation which doesn't really buy us anything if we're doing it all at runtime anyway.
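A minimal sketch of that idea, assuming REXML is available and handling only simple `*.ext` glob patterns. `SAMPLE_XML` is a tiny stand-in written for this example, and `load_extension_table` and `EXTENSION_TABLE` are hypothetical names rather than the gem's API.

```ruby
require "rexml/document"

# A tiny stand-in for freedesktop.org.xml, written here for illustration.
SAMPLE_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
    <mime-type type="text/html">
      <glob pattern="*.html"/>
      <glob pattern="*.htm"/>
    </mime-type>
    <mime-type type="application/zip">
      <glob pattern="*.zip"/>
    </mime-type>
  </mime-info>
XML

# Parse the XML as data and build an extension => MIME type lookup when the
# library loads, instead of generating a Ruby source file from it.
def load_extension_table(xml)
  table = {}
  REXML::Document.new(xml).root.each_element do |mime_type|
    next unless mime_type.name == "mime-type"

    mime_type.each_element do |child|
      next unless child.name == "glob"

      pattern = child.attributes["pattern"]
      # Only simple "*.ext" globs are handled in this sketch.
      next unless pattern&.start_with?("*.")

      # First definition wins if two types claim the same extension.
      table[pattern.delete_prefix("*.").downcase] ||= mime_type.attributes["type"]
    end
  end
  table.freeze
end

EXTENSION_TABLE = load_extension_table(SAMPLE_XML)
```

Nothing derived from the XML is written to disk or shipped with the gem here. The lookup table exists only in the running process.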

fooishbar commented 3 years ago

Thanks for the pointer @jellybob. I'm one of the people who runs freedesktop.org, but @hadess is correct. fd.o doesn't hold copyright assignments - the copyright belongs to whoever authored the code - so any decisions on copyright and licensing, including enforcement, are at the total discretion of the authors. In this case, @hadess holds substantial copyright on shared-mime-info so was entitled to take the action, and even if I wanted to I can't tell him otherwise.

I agree with @hadess that your interpretation of the license is correct - with the caveat that I am not a lawyer. Transforming the GPLed XML definitions into Ruby certainly retains the GPL obligations and implications on that Ruby. Loading and parsing the data at runtime changes that situation in two ways:

Thanks so much for your co-operation and helpful attitude. We really appreciate it, especially given the additional burden it introduces on your users.

wwahammy commented 3 years ago

I only have one request: please make sure this doesn't break things where it's possible for those of us who can use GPL2+ code to be able to include the data in the Gem. I'd like to still use the original mimemagic because it's simpler. :)

jsteinberg commented 3 years ago

I don't want to have too many cooks in the kitchen, but couldn't we add an extconf.rb to the gem? It could download the xml file and generate the rb file when the gem is installed on target systems. No GPL code or files would be distributed with the gem.

Could we still use extconf.rb to download the xml, just not generate an rb file?

Downloading at runtime seems inefficient and introduces an external dependency to application boot.

regismesquita commented 3 years ago

That would be odd territory. If we redistribute the built code it would contain the XML, thus we would need to do it under GPLv2?

jellybob commented 3 years ago

Using extconf.rb you have the same problems as with downloading at runtime in regulated environments, but with a much less clear interface to deal with that. If you don't want to download at runtime then you can update your build process to pull the XML file from your distribution's repos, or from the source, at build time.

jellybob commented 3 years ago

@wwahammy that's not possible, as it would require redistributing either the source file, or code derived from that source file, which is what resulted in this whole situation in the first place.

sin-ack commented 3 years ago

What about a "fallback" that uses the Red Hat public domain MIME database? So if you can either a) provide the file via FREEDESKTOP_MIME_TYPES_PATH b) have access to public Internet during runtime, the MIME type database would be pulled, but if neither of those work, MIME type detection could still work, albeit with not as good detection. You could then (somehow) notify developers that they have to provide the mimetypes file and that a fallback is being used.
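A rough sketch of such a fallback chain, assuming the fallback data is in the plain `mime.types` format (a MIME type followed by its extensions, with `#` starting a comment). `SAMPLE_MIME_TYPES` is made-up sample data, and `parse_mime_types` and `lookup_extension` are hypothetical helper names. This only covers extension lookup, not content detection.

```ruby
# A few lines in the mime.types format: a MIME type followed by zero or
# more extensions, with "#" starting a comment. Sample data, not the real file.
SAMPLE_MIME_TYPES = <<~TYPES
  # MIME type         Extensions
  text/html           html htm
  image/png           png
  application/json    json
  application/x-noext
TYPES

# Build an extension => MIME type hash from mime.types-formatted text.
def parse_mime_types(text)
  text.each_line.with_object({}) do |line, table|
    type, *extensions = line.sub(/#.*/, "").split
    next if type.nil?

    extensions.each { |ext| table[ext.downcase] ||= type }
  end
end

# Try each table in order; the first one that knows the extension wins.
def lookup_extension(ext, *tables)
  key = ext.delete_prefix(".").downcase
  tables.each do |table|
    type = table[key]
    return type if type
  end
  nil
end
```

Calling `lookup_extension("png", primary, fallback)` consults the richer table first and only falls through to the simpler one when the extension is unknown there.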

hadess commented 3 years ago

What about a "fallback" that uses the Red Hat public domain version?

I'd be very curious to know what that "Red Hat public domain version" is.

sin-ack commented 3 years ago

What about a "fallback" that uses the Red Hat public domain version?

I'd be very curious to know what that "Red Hat public domain version" is.

This link was posted in the Rails issue: https://pagure.io/mailcap/blob/master/f/mime.types

I am taking the poster's word for it, and apparently https://github.com/elixir-plug/mime uses it.

Edit: To be clear, this is not a "public domain" version of the mimeinfo database that was used by mimemagic, it's a much simpler database that takes data from IANA and a few other places.

hadess commented 3 years ago

This link was posted in the Rails issue: https://pagure.io/mailcap/blob/master/f/mime.types

Right, it's not a "version" so much as something completely different and unrelated to shared-mime-info.

sin-ack commented 3 years ago

This link was posted in the Rails issue: https://pagure.io/mailcap/blob/master/f/mime.types

Right, it's not a "version" so much as something completely different and unrelated to shared-mime-info.

Yes, I believe I mis-worded it. It's a separate database.

jellybob commented 3 years ago

There's a PR at https://github.com/jellybob/mimemagic/pull/3 which does the bulk of the changes needed for this, the only thing left is pulling down a version from the internet if one isn't found locally. The more I think about it, the more dubious I become that doing so is actually the right behaviour, at the very least I'm not sure it should be default behaviour on requiring the gem.

If anyone is able to review the PR (particularly @minad who actually knows and understands the code) that would be great. I'm going to go take a break for a couple of hours and then come back to this, hopefully having thought on pulling from the internet a bit.

wwahammy commented 3 years ago

@wwahammy that's not possible, as it would require redistributing either the source file, or code derived from that source file, which is what resulted in this whole situation in the first place.

I can redistribute the source file though under my license. I'm asking that any changes not break those who can use the older version of the gem.

jellybob commented 3 years ago

@wwahammy that's up to Rails, which is what's bringing in mimemagic as a dependency, and could (potentially) allow configuring which version to depend upon. This change is on the mimemagic gem itself, and just does what's needed to bring it away from violating the GPL.

minad commented 3 years ago

Thank you for your work on this @jellybob! Also thank you @hadess and @fooishbar for working on shared-mime-info/freedesktop.org in the first place, and thank you for helping getting this sorted out!

@jellybob I will take a look at your changes.

@wwahammy Would it be acceptable for you to release a "simple" mimemagic-gpl package released under GPL-2.0 which avoids the runtime stuff? I've seen you stated before that you would like to see a GPL version?

jellybob commented 3 years ago

@minad can you unarchive the upstream repository? I'm PRing into my own repo at the moment because I can't open PRs upstream. If you'd like to step away from maintaining the Gem I can totally get that, and I'd be happy to discuss helping out, but I'm not hugely interested in becoming the de-facto sole maintainer of a Rails dependency just because I happened to be the person who forked the repo!

ayufan commented 3 years ago

@tenderlove

I don't want to have too many cooks in the kitchen, but couldn't we add an extconf.rb to the gem? It could download the xml file and generate the rb file when the gem is installed on target systems. No GPL code or files would be distributed with the gem.

I don't think that this can be done this way. Since, if you would compile and distribute a compiled version of the gem it would have to be GPLv2 licensed. The only way forward seems to be that of @jellybob, but this definitely has some downsides.

I do like the idea of mimemagic-gpl.

wwahammy commented 3 years ago

I don't think that this can be done this way. Since, if you would compile and distribute a compiled version of the gem it would have to be GPLv2 licensed. The only way forward seems to be that of @jellybob, but this definitely has some downsides.

Based on my experience with including GPL code into non-GPL projects, I have had FOSS lawyers say this is perfectly acceptable and you can distribute the result. It just depends how it's done.

Everyone: there has been literally tens of thousands of dollars of developer time spent here and on the Rails issue and yet, to my knowledge, NO ONE has spent a few hundred bucks on a FOSS lawyer.

jellybob commented 3 years ago

For my part, spending money on lawyers wouldn't really help anything. I'm working at a company with an explicit policy that GPL licensed dependencies aren't permitted; this library would be GPL licensed without the changes needed, so I'm making the changes.

minad commented 3 years ago

@jellybob

can you unarchive the upstream repository? I'm PRing into my own repo at the moment because I can't open PRs upstream. If you'd like to step away from maintaining the Gem I can totally get that, and I'd be happy to discuss helping out, but I'm not hugely interested in becoming the de-facto sole maintainer of a Rails dependency just because I happened to be the person who forked the repo!

I will not maintain this repository and library any longer. My proposal would be to move the library under the umbrella of some organization, either the Rails organization or a new mimemagic.rb organization maintaining this package.

ljharb commented 3 years ago

A lawyer who will publicly provide legal advice is typically one of the most expensive kinds. Feel free to investigate; I’m sure it wouldn’t be hard to pool together a few hundred dollars if such a unicorn presented itself.

minad commented 3 years ago

@georgeclaghorn You stated before that you are discussing the issue within the Rails team. Which way forward do you suggest?

wwahammy commented 3 years ago

A lawyer who will publicly provide legal advice is typically one of the most expensive kinds. Feel free to investigate; I’m sure it wouldn’t be hard to pool together a few hundred dollars if such a unicorn presented itself.

I am almost positive I can get an opinion through Conservancy since this affects my open source project Houdini, but unless there's some sort of official request from the Rails core team, I'm not wasting their expensive time. There's enough lawyers in enough big companies here to cover this issue.

ljharb commented 3 years ago

None of them can likely provide such advice without risking legal liability and thus violating their fiduciary duty to their employer.

wwahammy commented 3 years ago

@wwahammy Would it be acceptable for you to release a "simple" mimemagic-gpl package released under GPL-2.0 which avoids the runtime stuff? I've seen you stated before that you would like to see a GPL version?

I'm only worried about not making this brittle. I want to be able to install my bundle and never have to worry about whether some external library is available. My project doesn't have a licensing problem, we don't want an unnecessary burden.