posterior / loom

A streaming cross-cat inference engine
BSD 3-Clause "New" or "Revised" License

Does Loom build on Ubuntu 16.04? #6

Closed fsaad closed 6 years ago

fsaad commented 7 years ago

Hi @fritzo,

We have recently finished writing an integration layer between loom and bayeslite, which we are quite excited to start exploring on a variety of real-world datasets. Our goal is to get to a stage where Loom is available as one of the default inference backends for bayeslite.

We have currently tested the integration on Ubuntu 14.04, and as far as we can tell the basic functionalities are all working. We have run python-based tests (via the bayeslite test suite), as well as end-user workflows that use Loom (via bayeslite) to replicate previous analyses from cgpm on (i) a satellites data set, and (ii) macroeconomic variables from the Gapminder Foundation. The results from the Loom backend are quite validating and encouraging.

The final stage of our integration is to build Loom on Ubuntu 16.04, which is the target platform for bayeslite/cgpm and other software on the probcomp stack, and I am wondering what it would take to complete this task. Successfully completing this task would allow us to install Loom as part of the probcomp stack, which is a dockerfile that bundles several software packages together.

I am at a stage where both distributions and loom do build on Ubuntu 16.04, by following the installation instructions from the .travis.yml.

However, while the software technically builds to completion, there are some runtime errors that arise from protobuf and the auto-generated file schema_pb2.py. Namely, importing loom.schema_pb2 fails:

probcomp-3:/scratch/fsaad/loom% ipython
In [1]: import loom.schema_pb2
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-2531daf40fda> in <module>()
----> 1 import loom.schema_pb2

/scratch/fsaad/loom/loom/schema_pb2.py in <module>()
      7 from google.protobuf import message as _message
      8 from google.protobuf import reflection as _reflection
----> 9 from google.protobuf import symbol_database as _symbol_database
     10 from google.protobuf import descriptor_pb2
     11 # @@protoc_insertion_point(imports)

ImportError: cannot import name symbol_database

Seeing that requirements.txt specifies protobuf<=2.5.0, I tried upgrading protobuf to the latest version on pip which is 3.4.0. This update resolved the import issue for import loom.schema_pb2 shown above.
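For anyone retracing this step, a small check like the one below can confirm whether the installed protobuf runtime provides the module that the generated schema_pb2.py imports. This is only a sketch: the helper names `has_module` and `installed_protobuf_version` are made up for illustration and are not part of loom or protobuf.

```python
import importlib


def has_module(name):
    """Return True if `name` imports cleanly, False if it raises ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if unavailable."""
    try:
        import google.protobuf as protobuf
    except ImportError:
        return None
    # Guard with getattr in case a runtime does not expose __version__.
    return getattr(protobuf, "__version__", None)


if __name__ == "__main__":
    print("protobuf version:", installed_protobuf_version())
    print("symbol_database importable:",
          has_module("google.protobuf.symbol_database"))
```

If the second line prints False, the generated code and the installed runtime are out of step, which matches the ImportError above.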

I next tried to validate the install by running make small-test, which unfortunately caused a runtime error (log output).

At this stage I thought I should take a step back and ask whether you have tried to build Loom on Ubuntu 16.04, and whether you could offer guidance on the steps we would need to take to build and run the software there.

fritzo commented 7 years ago

Hi @fsaad first try LOOM_THREADS=1 make small-test to get clearer error messages (this prevents loom from forking out with multiprocessing).

Re: protobuf version, my guess is that symbol_database was moved from proto2 to proto3. Long ago we made a decision to commit the protoc generated code into distributions (this made productionization easier at salesforce). However that generated code is proto2, so you'll need to make protobuf in the distributions package before make installing.

Also, let me know if you want owner status in the github.com/posterior org. I'd be happy to review PRs that remove the generated code from distributions or update the requirements.txt files.

fsaad commented 7 years ago

I made sure to run make protobuf for distributions as part of the installation process (I remember running into that issue some years ago).

I also ran LOOM_THREADS=1 make small-test (log output), it seems the runtime error is being captured as a relatively uninformative python subprocess.CalledProcessError with exit status -11.
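A note on reading that exit status: Python's subprocess module reports a negative return code -N when the child process was terminated by signal N, and on Linux signal 11 is SIGSEGV. So the CalledProcessError here most likely wraps a segmentation fault in a native binary rather than a Python-level failure. The helper below is a sketch with a hypothetical name, `describe_returncode`, and it assumes Python 3.5+ for `signal.Signals`.

```python
import signal


def describe_returncode(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode >= 0:
        return "exited with status %d" % returncode
    # A negative return code means the child was killed by that signal.
    signum = -returncode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return "killed by signal %d (%s)" % (signum, name)


if __name__ == "__main__":
    print(describe_returncode(-11))
```

A crash of this kind is usually easier to investigate by running the failing binary directly under a debugger than through the Python test harness.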

I'll be happy to provide the installation procedure for Ubuntu 16.04 and outline the requirements, scripts, etc once I succeed at getting it running. I will aim to submit a PR for removing the protoc artifacts from distributions as well as add make protobuf to make all.

fsaad commented 7 years ago

@fritzo Any idea where to go next regarding the install on 16.04? I put up the log output of LOOM_THREADS=1 make small-test in the previous entry, which gave exit status -11. Any further guidance on next steps would be helpful.

fritzo commented 7 years ago

Hi @fsaad I'm not sure what's wrong. Maybe make clean to ensure the loom generated test data was generated with the latest protobuf? Could you share a script to reproduce the error? Something like this:

```sh
git clone git@github.com:posterior/distributions
git clone git@github.com:posterior/loom
(cd distributions ; git checkout fsaad-1604)  # Checkout your branch
(cd loom ; git checkout fsaad-1604)  # Checkout your branch
pip freeze
protoc --version
g++ --version
(cd distributions ; make protobuf && make install)
(cd loom ; make test)
```

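The version commands in that script exist to record the toolchain alongside the failure. The same information can be collected from Python when a report is assembled programmatically. The sketch below is illustrative only: `tool_version` is a hypothetical helper, and it assumes the tools print their version on the first line of output.

```python
import shutil
import subprocess
import sys


def tool_version(command):
    """Return the first line a version command prints, or None if unavailable."""
    if shutil.which(command[0]) is None:
        return None  # tool is not on PATH
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print versions on stderr
        universal_newlines=True,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    for command in (["protoc", "--version"], ["g++", "--version"],
                    [sys.executable, "--version"]):
        print(" ".join(command), "->", tool_version(command))
```
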
fsaad commented 7 years ago

@fritzo Thanks for the follow-up. I have pushed two Dockerfiles that contain my exact workflow for building and testing Loom on Ubuntu 14.04 (successfully) and 16.04 (build OK, runtime test errors) to the branch 20171029-fsaad-docker on my fork https://github.com/fsaad/loom:

https://github.com/fsaad/loom/tree/20171029-fsaad-docker/docker

The README contains the docker commands to build and execute the images. Note that the two Dockerfiles are identical except for the first line (line 1, FROM) and the final line (line 50, CMD). Let me know if these Dockerfiles provide enough information to help you recover the state; I'm glad to elaborate/iterate.

fritzo commented 7 years ago

Okay, I am able to reproduce locally. Here are backtraces of loom_mix linked against tcmalloc

#0  0x00007ffa371988eb in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#1  0x00007ffa371989ab in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#2  0x00007ffa371a7282 in tc_free () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#3  0x0000563842c45e3d in loom::KindKernel::remove_featureless_kind(unsigned long) ()
#4  0x0000563842c48cd4 in loom::KindKernel::init_featureless_kinds(unsigned long, bool) ()
#5  0x0000563842c4af8e in loom::KindKernel::try_run() ()
#6  0x0000563842bcecdd in loom::Loom::mix(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, char const*) ()
#7  0x0000563842bc8590 in main ()

and standard malloc:

#0  malloc_consolidate (av=av@entry=0x7f3d650eab00 <main_arena>) at malloc.c:4204
#1  0x00007f3d64dad62f in _int_malloc (av=av@entry=0x7f3d650eab00 <main_arena>, bytes=bytes@entry=1728) at malloc.c:3487
#2  0x00007f3d64daf984 in __GI___libc_malloc (bytes=1728) at malloc.c:2927
#3  0x00007f3d658cdaf8 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x0000563973d4316f in std::vector<std::vector<unsigned int, std::allocator<unsigned int> >, std::allocator<std::vector<unsigned int, std::allocator<unsigned int> > > >::_M_default_append(unsigned long) ()
#5  0x0000563973d4021f in loom::ValueSplitter::init(loom::ValueSchema const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned long) ()
#6  0x0000563973da7d27 in loom::KindKernel::init_featureless_kinds(unsigned long, bool) ()
#7  0x0000563973da9f7e in loom::KindKernel::try_run() ()
#8  0x0000563973d2dccd in loom::Loom::mix(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, char const*) ()
#9  0x0000563973d27580 in main ()

suggesting a possible double free error in loom::KindKernel::init_featureless_kinds()...

fritzo commented 7 years ago

Ok @fsaad it looks like this is fixed by https://github.com/posterior/distributions/pull/12 . I'll see if travis-ci tests pass. I think they've been broken for a while, so I may merge anyway. Let me know if you want me to release to PyPI or anything.

fritzo commented 7 years ago

@fsaad After the fix to distributions, there are a couple remaining Python bugs (1) due to moved scipy.misc.imread(), and (2) long missing in loom's type dict (just needs to alias int I believe). I'm happy to review a PR for these, and I've also sent you an invite to posterior org so you should be able to merge a fix.

```
======================================================================
ERROR: Failure: AttributeError ('module' object has no attribute 'imread')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/fritz/miniconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/fritz/miniconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/fritz/miniconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/fritz/github/posterior/loom/examples/fox/test.py", line 29, in
    import main
  File "/home/fritz/github/posterior/loom/examples/fox/main.py", line 58, in
    IMAGE = scipy.misc.imread(os.path.join(ROOT, 'fox.png'))
AttributeError: 'module' object has no attribute 'imread'

======================================================================
ERROR: loom.test.test_query.test_batch_score('dpd-10-10-0.5',)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/fritz/miniconda2/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/fritz/github/posterior/loom/loom/test/util.py", line 62, in test_one
    fun(**kwargs)
  File "/home/fritz/github/posterior/loom/loom/test/test_query.py", line 181, in test_batch_score
    scores = list(server.batch_score(rows))
  File "/home/fritz/github/posterior/loom/loom/query.py", line 185, in batch_score
    self._send_score(row)
  File "/home/fritz/github/posterior/loom/loom/query.py", line 169, in _send_score
    data_row_to_protobuf(row, request.score.data)
  File "/home/fritz/github/posterior/loom/loom/query.py", line 89, in data_row_to_protobuf
    fields[type(val)].append(val)
KeyError:
```

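The second failure comes from a lookup keyed on the exact type of each value, `fields[type(val)]`, so any type without an entry raises KeyError. The sketch below illustrates the suggested remedy of registering `long` as an alias for the integer entry. It is not loom's actual code: only the lookup line appears in the traceback, and the list names `booleans`, `counts`, and `reals` and both function names are assumptions made for illustration.

```python
try:
    integer_types = (int, long)  # Python 2: int and long are distinct types
except NameError:
    integer_types = (int,)       # Python 3: long no longer exists


def make_fields():
    """Build a type-to-list mapping; hypothetical stand-in for loom's dict."""
    booleans, counts, reals = [], [], []
    fields = {bool: booleans, float: reals}
    # Every integer type shares one list, so long values behave like ints.
    for integer_type in integer_types:
        fields[integer_type] = counts
    return fields


def sort_values(values):
    """Append each value to the list registered for its exact type."""
    fields = make_fields()
    for val in values:
        fields[type(val)].append(val)  # KeyError if the type is unregistered
    return fields


if __name__ == "__main__":
    print(sort_values([True, 3, 2.5, 7])[int])
```

Because the lookup uses `type(val)` rather than `isinstance`, subclasses and unregistered numeric types are not matched implicitly, which is why each accepted type needs its own entry.
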
fsaad commented 7 years ago

@fritzo Thanks! I'll work on finishing up the remaining items. (I believe that scipy.misc.imread needs python-pil as a dependency.) Reopening the ticket to help with tracking.

fsaad commented 6 years ago

Build instructions on 16.04 are now given in https://github.com/posterior/loom/pull/9