rake-compiler / rake-compiler

Provide a standard and simplified way to build and package Ruby C and Java extensions using Rake as glue.
MIT License

compile time gem dependencies #82

Open rubys opened 11 years ago

rubys commented 11 years ago

Background:

Nokogumbo provides the ability for a Ruby program to invoke the Gumbo HTML5 parser and to access the result as a Nokogiri::HTML::Document.

Nokogumbo makes use of a single Nokogiri API:

VALUE Nokogiri_wrap_xml_document(VALUE klass, xmlDocPtr doc);

Nokogumbo successfully builds and runs today on Ubuntu Linux and OSX Mountain Lion. I recently converted the Rakefile to use rake-compiler in anticipation of cross compiling to Windows. My next step is to update my Rakefile as follows:

-Rake::ExtensionTask.new('nokogumboc')
+Rake::ExtensionTask.new('nokogumboc', SPEC) do |ext|
+  ext.cross_compile  = true
+  ext.cross_platform = ["x86-mswin32-60", "x86-mingw32"]
+end

The result, predictably, is:

/usr/local/lib/site_ruby/1.9.1/rubygems/dependency.rb:296:in `to_specs': Could not find 'nokogiri' (>= 0) among 6 total gem(s) (Gem::LoadError)

I'm also looking into rake-compiler-dev-box, and hitting all sorts of mundane issues (current example: package_win32_fat_binary.sh includes Ruby 1.8.7, but nokogiri no longer supports that release; running rake cross compile myself results in Gem rake is not installed; rake-compiler-dev-box clearly makes use of rvm, but doesn't set up the rvm command).

I'm working through all of these (eg: . .rvm/scripts/rvm), and will contribute back whatever I learn in the form of pull requests for code, documentation, whatever. Meanwhile, any advice I can get would be appreciated.

rubys commented 11 years ago

Current blocker (both on my Ubuntu host machine, and on rake-compiler-dev-box):

/vagrant/nokogumbo/tmp/x86-mswin32-60/nokogumboc/1.9.3/mkmf.rb:381:in `try_do': The compiler failed to generate an executable file. (RuntimeError)
You have to install development tools first.

Relevant portion of the mkmf.log:

collect2: ld returned 1 exit status
checked program was:
/* begin */
1: #include "ruby.h"
2:
3: #include <winsock2.h>
4: #include <windows.h>
5: int main() {return 0;}
/* end */

Reproduction instructions for rake-compiler-dev-box. First remove 1.8.7 and 2.0.0 from package_win32_fat_binary.sh and prepare_xrubies.sh (in the process, change rvm use 1.8.7 to rvm use 1.9.3)

Then, from a fresh 'vagrant up':

sudo apt-get install libxslt-dev libxml2-dev
cd /vagrant
git clone https://github.com/rubys/nokogumbo.git
./package_win32_fat_binary.sh nokogumbo

luislavena commented 11 years ago

Hello Sam,

First apologies for the lack of response, but I normally work on Open Source projects during the weekends.

To better help you out, I'm going to go over a few things I noticed in your project, from requirements to execution, which will help us determine what is going on and what is failing, ok?

First things first: dependencies.

As you mentioned before, it is clear your project depends on Nokogiri, not on Nokogiri's public interface but on its internals (nokogiri.h).

Depending on Nokogiri internals also pushes a dependency on libxml2 onto you, as stated in your extconf.rb:

https://github.com/rubys/nokogumbo/blob/master/extconf.rb#L5

Next, you look up the Nokogiri installation to determine its source code location. This means that nokogiri must be installed in order to compile the extension, not just to install and run it.

Please keep this in mind as I move forward the other points.

The last part of the dependencies is the availability of the gumbo parser source code, which is not clearly declared in the extconf.rb, nor included as a git submodule to ease development; it only appears during the packaging phase, where files are copied around (more about that in the next point).

Gem structure

I couldn't help noticing that your project keeps all its files in the root folder, rather than following the traditional, standardized structure for gems described in the RubyGems Guides.

I did notice that you copy files around in your Rakefile to be able to generate the gem structure before packaging.

I wonder if it wouldn't be better to have that gem structure from the beginning and avoid copying files around?

One thing that I find important is consistency and conventions. You will notice that almost all the projects that use rake-compiler follow the same conventions and structure, which reduces the amount of code and tweaks required, not to mention the learning curve around the codebase.

If you look at the rake-compiler structure recommendation, you will see what I mean.

This will greatly help both your project's organization and welcoming other developers.
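
A minimal sketch of how that convention could be checked, assuming the layout shown in the rake-compiler README (an ext/<name>/extconf.rb, a lib directory, and a Rakefile at the project root). The helper name missing_layout_paths is hypothetical and is not part of rake-compiler:

```ruby
# Hypothetical helper (not a rake-compiler API): lists the conventional
# paths that are missing from a gem checkout rooted at `root`.
def missing_layout_paths(root, ext_name)
  expected = [
    File.join('ext', ext_name, 'extconf.rb'), # build script for the extension
    'lib',                                     # Ruby code and the compiled binary
    'Rakefile'                                 # where the extension task is declared
  ]
  expected.reject { |relative| File.exist?(File.join(root, relative)) }
end
```

Calling missing_layout_paths('.', 'nokogumboc') from the project root would return an empty array once the conventional files are in place.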

Build chain and compiling dependencies

As mentioned before, requiring Nokogiri to be installed as a gem in order to retrieve source and headers from it is an unusual scenario. First, it must be installed in order to compile during the development phase. Second, it must be installed during compilation.

This means that compiling natively will require nokogiri to be installed, and it also means that cross-compiling nokogumbo will require the cross-compiled version of nokogiri to be installed as well.

Cross-compiled Ruby cannot be used to install or compile gems, simply because the interpreter does not run natively and is instead faked using the native interpreter.

Because of this, you cannot install a gem inside the cross-compiled Ruby that will be used to compile your extension.

This could be avoided if, instead of looking up the Nokogiri source code through RubyGems dependencies, its source code were added to your own project, removing the need to look for it at compilation time.

The second limitation is libxml2 and libxslt, two dependencies of Nokogiri that are compiled and cross-compiled when targeting other platforms like Windows.

In your case, you're not dealing with that dependency at all, which will cause failures because the libxml2 headers and support libraries will be missing for the cross compiler.

If you look at Nokogiri, they deal with the libxml dependency in their extconf:

https://github.com/sparklemotion/nokogiri/blob/master/ext/nokogiri/extconf.rb

Last on this topic, gumbo-parser should be compiled as another dependency (in this case cross-compiled too) and be part of the dependency chain required for the extension to compile successfully.

Copying the gumbo source files, as covered in your extconf.rb, might not produce consistent results across different systems.

Either it is a submodule that is checked out, with its source code part of the gem, or it is compiled separately and then used during linking.

Cross compiling to Windows

One of rake-compiler's abilities is to allow developers to cross-compile gems to Windows.

While that might sound like a miracle, it is just a sequence of conventions and existing tools orchestrated for you.

To make things a bit easier, a VM has been provided, which covers most of the use cases, like targeting what we call fat binaries that include binaries for Ruby 1.8.7 to 2.0.0.

Regarding the platform question, RubyInstaller (the official installer) uses x86-mingw32 (i386-mingw32) as its platform, and x64-mingw32 only for the 64-bit version of Ruby 2.0.0.

The i386-mswin32-60 platform exists for legacy purposes, as it supports the old Ruby 1.8.6 (One-Click Installer). Generating that gem is no longer necessary.
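
A small sketch of how RubyGems itself interprets these platform strings, using the Gem::Platform class that ships with RubyGems. The helper name describe_platform is hypothetical and only serves the illustration:

```ruby
require 'rubygems'

# Splits a platform string into the parts RubyGems compares when it
# decides whether a binary gem matches the running interpreter.
def describe_platform(name)
  platform = Gem::Platform.new(name)
  { cpu: platform.cpu, os: platform.os, version: platform.version, canonical: platform.to_s }
end

%w[i386-mingw32 x86-mingw32 x64-mingw32 x86-mswin32-60].each do |name|
  puts "#{name} -> #{describe_platform(name).inspect}"
end
```

RubyGems normalizes i386 to x86, which is why i386-mingw32 and x86-mingw32 name the same platform.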

In your particular case, these automated scripts might not serve you as 1.8.x is no longer supported. That can be solved by simply editing the provided scripts to your needs.

As said before, the VM follows the standards and existing conventions, but each project might have its differences. Also, the VM is just for compilation: it expects that the code can be compiled using Bundler and default commands, without manual interaction.

But for all these tools to work properly, all the previous points need to be covered.

Present to developers and the tools a standard/conventional gem structure, so rake-compiler can work without complicated hidden dependencies.

Deal with the external dependencies either as modules of your project, pre-compiled, or compiled as a requirement of your compilation task. On this subject, you can take a look at how this is done by projects like sqlite3-ruby (vendor_sqlite3.rake), rugged, or Nokogiri itself (as linked before).

I would suggest focusing on being able to generate a native gem (not even a cross-compiled version) outside your own development machine (inside the VM, for example, using the native script). That will give you an idea of what needs improvement to make the entire process idempotent.

Of course, all this is my personal opinion on how to deal with these dependencies and the code in a more maintainable way.

Hope it helps.

rubys commented 11 years ago

First apologies for the lack of response, but I normally work on Open Source projects during the weekends.

Not a problem. Thanks for taking the time to respond.

Depending on Nokogiri internals also pushes a dependency on libxml2 onto you

Actually, my code depends heavily on libxml2 and gumbo (calling a number of public interfaces on each), and when done makes a single call to nokogiri.

The last part of the dependencies is the availability of gumbo parser source code,

Gumbo is new and probably won't be installed on most people's machines. However, it is straight C code, and extconf builds a Makefile that will compile it just fine, so my Rakefile will do a git clone and copy the necessary files into the ext/nokogumboc directory if it isn't already present.

Note that this will result in a single .so (or .dll) file which embeds the parser.

I wonder if it wouldn't be better to have that gem structure from the beginning and avoid copying files around?

Since I'm (optionally) copying files into the gem structure, that makes rake targets like clean and clobber more complicated if those directories contain both nokogumbo source files and other files.

Cross-compiled Ruby cannot be used to install or compile gems

This does not make sense to me, as installing a gem is merely a matter of putting the right files into the right places.

In your case, you're not dealing with that dependency at all, which will cause failures because the libxml2 headers and support libraries will be missing for the cross compiler.

I seem to be finding the libxml2 headers (I develop on Ubuntu, and most of the headaches that nokogiri's build process appears to be working around are due to Mac OSX issues). That being said, you are probably right when it comes to link time.

Present to developers and the tools a standard/conventional gem structure, so rake-compiler can work without complicated hidden dependencies.

This seems to be a sticking point for you, so let's start with that. Let's put aside libxml2 and nokogiri for a moment, and consider gumbo_parser. Everything I need is in the src directory (https://github.com/google/gumbo-parser/tree/master/src). All I need to add is an extconf.rb, and in that file specify $CFLAGS = " -std=c99".

Given that as the requirements, can I ask how you would recommend I structure my repository so that the gumbo parser will not only be compiled but also installed?

luislavena commented 11 years ago

Actually, my code depends heavily on libxml2 and gumbo (calling a number of public interfaces on each), and when done makes a single call to nokogiri.

But what I said remains true: you depend on nokogiri internals.

Also, you're ignoring the point I mentioned about libxml. While it might be present in some Linux installations, it is not available for cross compilation.

Gumbo is new and probably won't be installed on most people's machines. However, it is straight C code

Again, this goes back to the same point I mentioned: compilation of the library. By just including the source code, you're skipping the dependencies that gumbo might require for cross compilation.

I gave you some pointers on some projects that compile dependencies, rugged and sqlite3, have you looked at those links?

Since I'm (optionally) copying files into the gem structure, that makes rake targets like clean and clobber more complicated if those directories contain both nokogumbo source files and other files.

I tried to get your work running on my machine prior to making this comment. It is clear to me that while this approach works for you in your environment, it is not clear what needs to happen for someone to contribute.

On other projects where I have been asked to get involved, I freely and happily send pull requests with improvements and fixes; you can confirm this by looking at my contributions.

However, in this case, the way things are built isn't solid enough for me to go ahead and send you my improvements.

This does not make sense to me, as installing a gem is merely a matter of putting the right files into the right places.

But that covers just the gem, not the dependencies. Can you point me to where in your Rakefile or extconf you are cross compiling, obtaining packages or linking against libxml?

Again, please see the work done in nokogiri to cross compile libxml, which you depend on too.

The cross-compiled Ruby cannot be executed natively; that is why the gem that you depend on (nokogiri) is required to be installed.

Installing Ubuntu's libxml dependencies only solves the native compilation, not the cross compilation.

For cross compilation to work, Ruby and the dependencies of your library need to be compiled and available in the format of the target platform (in this case, nokogiri compiled for Windows, plus libxml and gumbo).

English is not my native language, so perhaps what I'm explaining does not translate properly. I would be grateful if you could take a look at what nokogiri is doing for cross compilation, and also at projects like rugged and sqlite3, the former building a dependency library locally while the latter depends on an external package.

This seems to be a sticking point for you, so let's start with that. Let's put aside libxml2 and nokogiri for a moment, and consider gumbo_parser. Everything I need is in the src directory. All I need to add is an extconf.rb, and in that file specify $CFLAGS = " -std=c99".

Please read my comments above: including the source might not be enough to properly detect the library's dependencies on different platforms. Take rugged, for example. The libgit2 source is available, but it is not simply included, as the build needs to determine which features the platform provides.

All my previous comments were made on the basis that I was not able to get a basic thing working with your code. Those are recommendations, and you can take them or ignore them.

Given that as the requirements, can I ask how you would recommend I structure my repository so that the gumbo parser will not only be compiled but also installed?

I believe my previous comment included links to the RubyGems guides and to the rake-compiler README's structure recommendations and examples; now it is your turn to extrapolate those to your project.

Something that I learned over the years is that instead of doing the changes myself, it has more value for project maintainers to understand the reasoning behind these modifications. rake-compiler was born because I got tired of doing this on every project.

Cross compilation is neither simple nor does it have a unique solution. I'm sharing with you my knowledge and experience dealing with Ruby and Windows; I have done this for several projects and it worked for me, but I might be wrong, and I would love and appreciate better solutions to improve it.

Regards.

Sorry for top posting. Sent from mobile. (In reply to Sam Ruby's comment of Aug 31, 2013, 2:10 PM.)

rubys commented 11 years ago

By just including the source code, you're skipping the dependencies that gumbo might require for cross compilation.

Gumbo is a pure C99 library with no outside dependencies.

I gave you some pointers on some projects that compile dependencies, rugged and sqlite3, have you looked at those links?

Those appear to presume that you have installed the necessary dependencies separately, and point to where they are installed. Perhaps I am attempting to be too clever, but as gumbo is a pure C library, I'm instead cloning the repository and copying what I need into the ext/nokogumboc directory prior to running extconf.rb.

Can you point me to where in your Rakefile or extconf you are cross compiling, obtaining packages or linking against libxml?

Again, I may be trying to be too clever, but I let nokogiri do that for me:

  require "#{nokogiri_ext}/extconf.rb"

English is not my native language

You are doing well! And my high school Spanish, while rusty, was good enough to follow this presentation: http://blog.mmediasys.com/2011/11/26/rubyconf-argentina-and-fenix/ :-)

I believe my previous comment included links to the RubyGems guides and to the rake-compiler README's structure recommendations and examples; now it is your turn to extrapolate those to your project.

I'm still not getting it :-( That's why I hoped that we could start with something simpler at first. Perhaps with an earlier revision of nokogumbo?

Note that this /nearly/ follows the recommended structure (I didn't know then to insert 'nokogumboc' into the ext directory structure). It consists of a single C file that only has compile time dependencies against Ruby and Gumbo -- and again the latter is a pure C99 library with no other dependencies.

Something that I learned over the years is that instead of doing the changes myself, it has more value for project maintainers to understand the reasoning behind these modifications

I can appreciate that. But hopefully we can find a happy medium between you doing all of the work and me trying clumsily to follow the examples I find and for you to tell me to once again look at those same examples.

Taken as a whole, gumbo + nokogumbo is 12 C99 source files and 14 header files that depend only on Ruby. What I will try to do (probably in a separate branch) is to get just that working before I attempt to tackle pulling in nokogiri and libxml2.

luislavena commented 11 years ago

Gumbo is a pure C99 library with no outside dependencies.

You're correct. I was actually talking about any possible special library that Gumbo might require when setting up on Windows, but it seems Gumbo doesn't use anything in particular (judging by autoconf and friends), so just including the code will be ok.

I am attempting to be too clever, but as gumbo is a pure C library, I'm instead cloning the repository and copying what I need into the ext/nokogumboc directory prior to running extconf.rb

Perhaps the gumbo repository can be used as a submodule and then added to the list of objects instead of being copied over?

I normally try to avoid doing that and instead produce a static library of the dependency (gumbo in this case) and link against it. That is what Rugged does with libgit2.

I would say that taking a step back and going with a simpler (non-nokogiri) approach first might be better than using Nokogiri internals.

I need to push some commits for the tiny_tds project first, and then I will take a look at your early structure approach.

Will send my comments later today.

Thank you.

rubys commented 11 years ago

Perhaps the gumbo repository can be used as a submodule and then added to the list of objects instead of being copied over?

I've added it as a submodule. I can't seem to find documentation for mkmf that covers all of the things you can do with global variables, but from what I can tell, mkmf assumes that everything is in one directory. (You can specify which directory you want to use, but you can't specify multiple). I could be wrong about this, as this is based on reviewing the source code.
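
One commonly seen workaround, offered here only as a hedged sketch: mkmf keeps its source list and make search path in the globals $srcs and $VPATH, and an extconf.rb can set them before calling create_makefile. These globals are sparsely documented, so treat their use as an assumption. The helper collect_sources and the gumbo-parser/src submodule path are hypothetical:

```ruby
# Hypothetical helper: gathers C sources that live in several directories.
# Returns the source basenames plus the existing directories make should search.
def collect_sources(dirs)
  existing = dirs.select { |dir| Dir.exist?(dir) }
  files = existing.flat_map { |dir| Dir.glob(File.join(dir, '*.c')).sort }
  { srcs: files.map { |file| File.basename(file) }, vpath: existing }
end

# Assumed wiring inside an extconf.rb, after require 'mkmf' and before
# create_makefile (the submodule location is illustrative only):
#   found = collect_sources([__dir__, File.join(__dir__, 'gumbo-parser', 'src')])
#   $srcs = found[:srcs]
#   $VPATH.concat(found[:vpath])
#   found[:vpath].each { |dir| $INCFLAGS << " -I#{dir}" }
```

Because only basenames are passed on, two files with the same name in different directories would collide; this sketch does not handle that case.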

I normally try to avoid doing that and instead produce a static library of the dependency (gumbo in this case) and link against it. That is what Rugged does with libgit2.

libgit2 provides Makefile.embed. gumbo-parser builds this using ./configure. Not being sure whether or not ./configure would play nice with cross compiling, at the moment I'm sticking with extconf and mkmf.

I would say that taking a step back and going with a simpler (non-nokogiri) approach first might be better than using Nokogiri internals.

Such an approach would come with a significant CPU and memory usage penalty, but I've added conditional compilation instructions falling back to such an approach should nokogiri headers not be found.
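
A minimal sketch of what such a fallback could look like, assuming the headers live under ext/nokogiri inside the installed gem. The helper names and the HAVE_NOKOGIRI_H define are hypothetical and are not taken from nokogumbo's actual extconf.rb:

```ruby
require 'rubygems'

# Returns the installed gem's root directory, or nil when the gem is absent.
def gem_dir_or_nil(name)
  Gem::Specification.find_by_name(name).gem_dir
rescue Gem::LoadError
  nil
end

# Builds extra compiler flags only when nokogiri.h can actually be found.
# An empty string means "compile the fallback path without Nokogiri".
def nokogiri_cflags(gem_name = 'nokogiri')
  dir = gem_dir_or_nil(gem_name)
  return '' unless dir

  header_dir = File.join(dir, 'ext', 'nokogiri')
  return '' unless File.exist?(File.join(header_dir, 'nokogiri.h'))

  "-DHAVE_NOKOGIRI_H -I#{header_dir}" # hypothetical define name
end
```

An extconf.rb could append the returned string to $CFLAGS, and the C source could guard the Nokogiri-specific call with the corresponding #ifdef.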

Current status

At this point in time, nokogumbo is effectively a "pure C" library with no required dependencies other than Ruby itself. As such, I would think that cross compiling should be easy peasy. Unfortunately, I'm still seeing this error, both on my machine and on rake-compiler-dev-box. The error indicates that winsock can't be linked into the library, something I don't explicitly require.

Building and testing on Ubuntu or Mac OSX is as simple as installing dependencies (via bundle install) and rake.

Running rake cross compile will reproduce the problem I've described. You can verify that the standard ext and lib directories are built prior to the cross compilation process.

rubys commented 11 years ago

Success?

Apparently, I was misunderstanding the error message produced... the problem was libxml2 not being found. But as libxml2 is not a hard requirement any more, I rearranged my extconf.rb file and am able to build nokogumboc.so. Despite the name ending in .so, I verified using strings that msvcrt.dll was in the file, as well as names of various Windows APIs such as GetProcAddress and EnterCriticalSection.
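
The same check can be approximated in Ruby. This is a rough sketch in the spirit of strings(1), not a reimplementation of it; the helper names are hypothetical:

```ruby
# Extracts runs of printable ASCII (at least min_len characters long)
# from binary data.
def printable_strings(data, min_len = 4)
  data.b.scan(/[ -~]{#{min_len},}/n)
end

# Reports whether every given name appears somewhere in the binary file.
def mentions_all?(path, names)
  found = printable_strings(File.binread(path))
  names.all? { |name| found.any? { |text| text.include?(name) } }
end
```

For example, mentions_all?('lib/nokogumboc.so', ['msvcrt.dll', 'GetProcAddress']) would be one way to confirm that the file was linked against the Windows runtime.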

Next up: figuring out how to build a Gem for Windows.

rubys commented 11 years ago

Not quite successful yet:

invalid ELF header - /home/rubys/git/nokogumbo/lib/nokogumboc.so (LoadError)

Also, it doesn't appear that mingw supports C99. :-(

So, current status is that now I have implemented the recommended directory structure, reference gumbo-parser as a submodule, have no required dependencies beyond Ruby, and while this appears to be sufficient to install the gem on Windows 8 with RailsInstaller, I still can't cross compile.

luislavena commented 10 years ago

Hello @rubys, sorry for the late response, but I was down the rabbithole for the past months at work.

Shared objects (.so) generated for Windows are not executable on Linux; they are not even ELF but PE or PE+.

Perhaps you tried to run nokogumbo after the cross compilation? If that was the case, you need to perform a native compile so the .so gets replaced by your local platform one.
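
A quick way to tell which kind of file is sitting in lib is to look at its first bytes: ELF files start with the byte 0x7F followed by "ELF", while PE files start with the "MZ" DOS header. A minimal sketch, with a hypothetical helper name:

```ruby
# Classifies a binary by its magic bytes: :elf, :pe, or :unknown.
def binary_format(path)
  magic = File.binread(path, 4).to_s # to_s covers the empty-file case (nil)
  if magic.start_with?("\x7FELF".b)
    :elf
  elsif magic.start_with?('MZ')
    :pe
  else
    :unknown
  end
end
```

If binary_format('lib/nokogumboc.so') returns :pe on a Linux machine, the file is the cross-compiled Windows build, which is consistent with the "invalid ELF header" LoadError above.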

rubys commented 10 years ago

Any chance you can try? As I stated above, the current state is that nokogumbo can now be compiled with no required dependencies beyond Ruby, and I have implemented the recommended directory structure, yet I can not get it to work. As such, it should be the perfect candidate project for rake-compiler; but I've tried everything and failed.

luislavena commented 10 years ago

@rubys will try again this weekend while I work fixing some issues with rake-compiler and rake-compiler-dev-box.