Open versar opened 6 years ago
@versar Loom is designed to store auxiliary data files for each generator on disk; these files are separate from the .bdb file and are stored in the loom_store_path that you specify when registering the backend:
https://github.com/probcomp/bayeslite/blob/0d80bf21b803030ae7de8ab84e377b84b6db1868/src/metamodels/loom_metamodel.py#L125-L130
The stack traces you pasted, in particular the error
IOError: [Errno 2] No such file or directory: u'/scratch/versar/glove-loom/resources/20171203-050719_glove_loom/ingest/encoding.json.gz'
suggest that Loom could not find the data directory.
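To see which part of such a path is actually missing, a small check along these lines may help. This is only a sketch using the Python standard library; the helper name first_missing_component is hypothetical and is not part of bayeslite or Loom.

```python
import os


def first_missing_component(path):
    """Return the shortest prefix of `path` that does not exist on disk.

    Returns None when the full path exists. Hypothetical helper for
    debugging; it only inspects the filesystem.
    """
    current = os.path.abspath(path)
    missing = None
    # Walk upwards until we reach a directory that exists.
    while not os.path.exists(current):
        missing = current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return missing


# Example call with the path from the IOError above:
# first_missing_component(
#     '/scratch/versar/glove-loom/resources/'
#     '20171203-050719_glove_loom/ingest/encoding.json.gz')
```

If the result is the store directory itself, the whole store is missing or was moved; if it is a deeper directory, only the files for that generator are missing.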
It is possible that your client code (e.g. an IPython notebook) has a relative path for loom_store, which will break if the notebook and .bdb files are moved to another directory. If this is indeed what happened, then there are candidate solutions to this problem:
- Use an absolute path for loom_store_path (when working on a single filesystem).
- Update the loom_store_path in the notebook client (necessary when migrating across file systems).
The reason that ESTIMATE DEPENDENCE and ESTIMATE SIMILARITY do not raise an error is that the data structures needed to answer these queries live in the .bdb file, so we do not need to resort to the Loom-specific directories (which are needed for ANALYZE and SIMULATE).
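One way to guard against the relative-path problem is to resolve the store path to an absolute directory, and fail early if it does not exist, before handing it to the backend. The sketch below uses only the standard library; resolve_loom_store_path and its base_dir parameter are hypothetical names, and the registration call in the trailing comment mirrors the one quoted later in this thread.

```python
import os


def resolve_loom_store_path(path, base_dir=None):
    """Return an absolute path to an existing Loom store directory.

    `base_dir` is the directory a relative `path` is interpreted against
    (defaults to the current working directory). Hypothetical helper.
    """
    path = os.path.expanduser(path)
    if not os.path.isabs(path):
        path = os.path.join(base_dir or os.getcwd(), path)
    path = os.path.normpath(path)
    if not os.path.isdir(path):
        # Fail at registration time rather than deep inside a later query.
        raise IOError('Loom store directory not found: %s' % (path,))
    return path


# Intended use in the notebook (assumes bdb, LoomBackend and
# bayesdb_register_backend are already imported and set up):
# store = resolve_loom_store_path('loom_files')
# bayesdb_register_backend(bdb, LoomBackend(store))
```

Failing at registration makes a moved or missing store visible immediately, instead of surfacing later as an IOError inside ANALYZE or SIMULATE.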
If you have a reproducible state and continue to experience issues, then I can help you debug further.
@fsaad Thanks for the suggestions. If I understand correctly, loom_store_path is set as an argument to LoomBackend like this:
bayesdb_register_backend(bdb, LoomBackend('/scratch/versar/projects/prostab_accuracy_testing/loom_files/'))
In that case, I am using an absolute path for loom_store_path. I experienced the error again last week, so next time I will try to capture a reproducible state.
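A further check that may be worth running when the error recurs is whether the file named in the IOError lies under the store path passed to LoomBackend. The sketch below is standard-library only, and is_under is a hypothetical helper. The thread does not confirm that the two example paths come from the same session, so treat the comparison as illustrative.

```python
import os


def is_under(store_path, file_path):
    """Return True if `file_path` is the store directory or lies inside it.

    Purely lexical comparison of normalized absolute paths; symlinks are
    not resolved. Hypothetical helper for debugging.
    """
    store = os.path.normpath(os.path.abspath(store_path))
    target = os.path.normpath(os.path.abspath(file_path))
    # os.path.join(store, '') appends a trailing separator when needed.
    return target == store or target.startswith(os.path.join(store, ''))


# Paths quoted in this thread (possibly from different sessions):
store = '/scratch/versar/projects/prostab_accuracy_testing/loom_files/'
missing = ('/scratch/versar/glove-loom/resources/'
           '20171203-050719_glove_loom/ingest/encoding.json.gz')
print(is_under(store, missing))
```

If the missing file is not under the registered store, the generator was probably analyzed while a different store path was registered, which would be consistent with Loom being unable to find its data directory.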
On multiple occasions, I have built a CrossCat model from some data using the Loom backend, then successfully performed a variety of queries. Suddenly, I experience an error where bayeslite seems to forget that the model has been trained using ANALYZE. Errors occur when I attempt to perform some commands (like SIMULATE and more ANALYZE) but not others (like ESTIMATE DEPENDENCE and ESTIMATE SIMILARITY). This is quite frustrating, especially after ANALYZE was already run for a long time and the CrossCat model looked fine on multiple queries. The only effective solution when I have encountered this bug has been to delete the .bdb file and start from scratch.

Error with ANALYZE

To demonstrate that the generator exists for the purpose of this issue report, I tried
%mml INITIALIZE 16 MODELS FOR <generator>
In the notebook I was using, <generator> = 'glove_loom'. As expected when the generator exists, I get the error:
Generator 'glove_loom' already has models: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
I had already been working with this .bdb file for multiple days using a number of different queries, after training the generator on a large data set using ANALYZE.

When I tried ANALYZE after verifying that the generator exists as above, I got an error. The command I tried was:
%mml ANALYZE "glove_loom" FOR 30 ITERATION WAIT;
It failed. The stack trace output to the notebook cell is attached here: stack_trace_ANALZYE.txt

Error with SIMULATE

I tried the command
%bql SIMULATE <variable> FROM <population> LIMIT 1
This returned the error message:
'NoneType' object has no attribute '_predict'
Note: this SIMULATE query is where I first noticed this set of errors in my workflow. I attached here a stack trace from performing the same SIMULATE query through the bayeslite API as
bdb.execute('SIMULATE <variable> FROM <population> LIMIT 1').fetchall()
In this case, <variable> = 'glove_vector_90' and <population> = 'glove'. This is in the same notebook producing the ANALYZE error above. Attachment: stack_trace_simulate.txt

No error with ESTIMATE DEPENDENCE

In the same notebook producing the errors above, the following works as expected:
ESTIMATE DEPENDENCE PROBABILITY FROM PAIRWISE VARIABLES OF glove MODELED BY glove_loom

No error with ESTIMATE SIMILARITY

In the same notebook producing the errors above, the following works as expected:
ESTIMATE SIMILARITY IN THE CONTEXT OF "glove_vector_247" FROM PAIRWISE glove MODELED BY glove_loom