Closed fsaad closed 6 years ago
Hi @fsaad, first try `LOOM_THREADS=1 make small-test` to get clearer error messages (this prevents loom from forking out with multiprocessing).
Re: protobuf version, my guess is that `symbol_database` was moved from proto2 to proto3. Long ago we made a decision to commit the protoc-generated code into distributions (this made productionization easier at Salesforce). However, that generated code is proto2, so you'll need to `make protobuf` in the distributions package before `make install`ing.
Also, let me know if you want owner status in the github.com/posterior org. I'd be happy to review PRs that remove the generated code from distributions or update the `requirements.txt` files.
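As a quick check of the protobuf guess above, a minimal Python sketch can report which protobuf version is installed and whether `google.protobuf.symbol_database` is importable. The helper name `protobuf_status` is hypothetical and not part of loom or distributions; the sketch only inspects the environment and does not regenerate any code.

```python
import importlib


def protobuf_status():
    """Return (version, has_symbol_database) for the installed protobuf.

    version is None when the protobuf package cannot be imported at all.
    """
    try:
        protobuf = importlib.import_module("google.protobuf")
    except ImportError:
        return None, False
    version = getattr(protobuf, "__version__", "unknown")
    try:
        # Generated *_pb2.py files may import this module at load time.
        importlib.import_module("google.protobuf.symbol_database")
    except ImportError:
        return version, False
    return version, True


if __name__ == "__main__":
    version, has_symbol_database = protobuf_status()
    print("protobuf version:", version)
    print("symbol_database importable:", has_symbol_database)
```

If the module is not importable, that would be consistent with a mismatch between the installed protobuf runtime and the committed generated code, though it does not prove it.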
I made sure to run `make protobuf` for distributions as part of the installation process (I remember running into that issue some years ago).
I also ran `LOOM_THREADS=1 make small-test` (log output); it seems the runtime error is being captured as a relatively uninformative Python `subprocess.CalledProcessError` with exit status -11.
I'll be happy to provide the installation procedure for Ubuntu 16.04 and outline the requirements, scripts, etc. once I succeed at getting it running. I will aim to submit a PR that removes the protoc artifacts from distributions and adds `make protobuf` to `make all`.
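For context on the exit status -11 reported above: Python's `subprocess` module reports a negative return code `-N` on POSIX when the child was terminated by signal `N`, and signal 11 on Linux is SIGSEGV. The sketch below decodes such a return code and runs a command with `LOOM_THREADS=1` set, as suggested earlier in the thread. It is written for Python 3 (it uses `signal.Signals`), and the function names `describe_returncode` and `run_single_threaded` are hypothetical helpers, not loom APIs.

```python
import os
import signal
import subprocess


def describe_returncode(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode >= 0:
        return "exited with status %d" % returncode
    # A negative return code -N means the child was killed by signal N.
    signum = -returncode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return "killed by signal %d (%s)" % (signum, name)


def run_single_threaded(command):
    """Run command with LOOM_THREADS=1 and describe how it ended."""
    env = dict(os.environ, LOOM_THREADS="1")
    try:
        subprocess.check_call(command, env=env)
    except subprocess.CalledProcessError as error:
        return describe_returncode(error.returncode)
    return describe_returncode(0)


if __name__ == "__main__":
    # Assumes the current directory is a loom checkout with a Makefile.
    print(run_single_threaded(["make", "small-test"]))
```

On Linux, `describe_returncode(-11)` reports SIGSEGV, which points to a crash in native code and not to a Python-level exception.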
@fritzo Any idea on where to go next regarding the install on 16.04? I put up the log output of `LOOM_THREADS=1 make small-test` in the previous entry, which gave exit status -11. Any further guidance on next steps would be helpful.
Hi @fsaad, I'm not sure what's wrong. Maybe `make clean` to ensure the loom generated test data was generated with the latest protobuf? Could you share a script to reproduce the error? Something like this:
@fritzo Thanks for the follow-up. I have pushed two Dockerfiles that contain my exact workflow for building and testing Loom on Ubuntu 14.04 (successful) and 16.04 (build OK, runtime test errors) to the branch `20171029-fsaad-docker` on my fork (https://github.com/fsaad/loom):
https://github.com/fsaad/loom/tree/20171029-fsaad-docker/docker
The README contains the docker commands to build and execute the images. Note that the two Dockerfiles are identical, except for the first line (line 1, `FROM`) and the final line (line 50, `CMD`). Let me know if these Dockerfiles provide enough information to help you recover the state; I'm glad to elaborate/iterate.
Okay, I am able to reproduce locally. Here are backtraces of `loom_mix` linked against `tcmalloc`:
#0 0x00007ffa371988eb in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#1 0x00007ffa371989ab in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#2 0x00007ffa371a7282 in tc_free () from /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
#3 0x0000563842c45e3d in loom::KindKernel::remove_featureless_kind(unsigned long) ()
#4 0x0000563842c48cd4 in loom::KindKernel::init_featureless_kinds(unsigned long, bool) ()
#5 0x0000563842c4af8e in loom::KindKernel::try_run() ()
#6 0x0000563842bcecdd in loom::Loom::mix(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, char const*) ()
#7 0x0000563842bc8590 in main ()
and standard `malloc`:
#0 malloc_consolidate (av=av@entry=0x7f3d650eab00 <main_arena>) at malloc.c:4204
#1 0x00007f3d64dad62f in _int_malloc (av=av@entry=0x7f3d650eab00 <main_arena>, bytes=bytes@entry=1728) at malloc.c:3487
#2 0x00007f3d64daf984 in __GI___libc_malloc (bytes=1728) at malloc.c:2927
#3 0x00007f3d658cdaf8 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x0000563973d4316f in std::vector<std::vector<unsigned int, std::allocator<unsigned int> >, std::allocator<std::vector<unsigned int, std::allocator<unsigned int> > > >::_M_default_append(unsigned long) ()
#5 0x0000563973d4021f in loom::ValueSplitter::init(loom::ValueSchema const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned long) ()
#6 0x0000563973da7d27 in loom::KindKernel::init_featureless_kinds(unsigned long, bool) ()
#7 0x0000563973da9f7e in loom::KindKernel::try_run() ()
#8 0x0000563973d2dccd in loom::Loom::mix(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, char const*) ()
#9 0x0000563973d27580 in main ()
suggesting a possible double-free error in `loom::KindKernel::init_featureless_kinds()`...
Ok @fsaad it looks like this is fixed by https://github.com/posterior/distributions/pull/12 . I'll see if travis-ci tests pass. I think they've been broken for a while, so I may merge anyway. Let me know if you want me to release to PyPI or anything.
@fsaad After the fix to distributions, there are a couple of remaining Python bugs: (1) one due to the moved `scipy.misc.imread()`, and (2) `long` missing in loom's type dict (it just needs to alias `int`, I believe). I'm happy to review a PR for these, and I've also sent you an invite to the `posterior` org, so you should be able to merge a fix.
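To illustrate the second bug, here is a minimal sketch of aliasing `long` to `int` in a type-keyed dictionary. The dictionary `TYPE_TO_FIELD`, its keys and values, and the helper `field_for` are hypothetical stand-ins; loom's actual type dict may be named and structured differently. The alias only applies on Python 2, where `long` exists as a separate type.

```python
import sys

# Hypothetical stand-in for loom's type dict; the real entries may differ.
TYPE_TO_FIELD = {
    bool: "boolean",
    int: "count",
    float: "real",
    str: "categorical",
}

if sys.version_info[0] == 2:
    # Python 2 only: large integers have type long, so reuse the int entry.
    TYPE_TO_FIELD[long] = TYPE_TO_FIELD[int]  # noqa: F821


def field_for(value):
    """Look up the field kind for a value by its exact type."""
    return TYPE_TO_FIELD[type(value)]


if __name__ == "__main__":
    # On Python 2 this value is a long; on Python 3 it is a plain int.
    print(field_for(2 ** 70))
```

Without the alias, a lookup keyed on `type(value)` raises `KeyError` on Python 2 whenever a value arrives as a `long`.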
@fritzo Thanks! I'll work on finishing up the remaining items. (I believe that `scipy.misc.imread` needs python-pil as a dependency.) Reopening the ticket to help with tracking.
Build instructions on 16.04 are now given in https://github.com/posterior/loom/pull/9
Hi @fritzo,
We have recently finished writing an integration layer between loom and bayeslite, which we are quite excited to start exploring on a variety of real-world datasets. Our goal is to get to a stage where Loom is available as one of the default inference backends for bayeslite.
We have currently tested the integration on Ubuntu 14.04, and as far as we can tell the basic functionalities are all working. We have run python-based tests (via the bayeslite test suite), as well as end-user workflows that use Loom (via bayeslite) to replicate previous analyses from cgpm on (i) a satellites data set, and (ii) macroeconomic variables from the Gapminder Foundation. The results from the Loom backend are quite validating and encouraging.
The final stage of our integration is to build Loom on Ubuntu 16.04, which is the target platform for bayeslite/cgpm and other software on the probcomp stack, and I am wondering what it would take to complete this task. Successfully completing this task would allow us to install Loom as part of the probcomp stack, which is a dockerfile that bundles several software packages together.
I am at a stage where both distributions and loom do build on Ubuntu 16.04, by following the installation instructions from the .travis.yml.
However, while the software technically builds to completion, there are some runtime errors that arise from protobuf and the auto-generated file `schema_pb2.py`. Namely, importing `loom.schema_pb2` fails:
Seeing that `requirements.txt` specifies `protobuf<=2.5.0`, I tried upgrading protobuf to the latest version on pip, which is 3.4.0. This update resolved the import issue for `import loom.schema_pb2` shown above.
I next tried to validate the install by running `make small-test`, which unfortunately caused a runtime error (log output).
At this stage I thought to take a step back and get in touch with you to ask whether you have tried to build Loom on Ubuntu 16.04, and/or whether you could offer guidance on the steps we would need to take to build and run the software.