Open marcianito opened 2 years ago
Here is an update:
I tested the same lines of code on a different machine: they run fast and finish, and everything looks fine. This is great news, and I can do some testing on that machine. Still, I am wondering why it is not running on my personal machine (which would make things way more complicated).
Here are some facts, in case anyone has an idea:
OS: Windows Server 2008, Python version: 3.8.6
OS: Linux Manjaro (Arch), Python version: 3.10.5
Does it have to do with the Python version? I do not get any errors thrown!
On the other hand, on the Windows machine I get a warning that g++ is not properly configured. But it runs without problems, only at lower speed (still quick).
Any help or suggestions for testing? Thanks!
hi,
thank you very much for noting these issues! It's relatively difficult to tell why the different functions (even the pandas function) take so long.
I haven't encountered these issues with the following versions: python = 3.7.10, pymc3 = 3.11.2, pandas = 1.2.3, numpy = 1.20.2, aesara = 2.0.5, arviz = 0.11.2, theano-pymc3 = 1.1.2
Do you think it could be due to different pandas versions used?
Do you know where dt_model = dt(series,settings=settings,name='G103') gets stuck?
Thanks a lot for checking this!
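One way to find out where a long-running call is stuck is to have Python periodically dump the current stack while the call runs. The following is a minimal sketch using the standard-library faulthandler module; the helper name run_with_stack_dumps and the default log file location are assumptions made for illustration and are not part of discotimes.

```python
import faulthandler
import os
import tempfile


def run_with_stack_dumps(func, *args, interval=60.0, log_path=None, **kwargs):
    """Call func(*args, **kwargs) and, while it runs, write the stacks of all
    threads to a log file every `interval` seconds."""
    if log_path is None:
        # Hypothetical default location; any writable path works.
        log_path = os.path.join(tempfile.gettempdir(), "hang_traceback.log")
    with open(log_path, "w") as log:
        # faulthandler needs a real file descriptor, hence a file on disk.
        faulthandler.dump_traceback_later(interval, repeat=True, file=log)
        try:
            return func(*args, **kwargs)
        finally:
            # Stop the periodic dumps before the log file is closed.
            faulthandler.cancel_dump_traceback_later()
```

With the names from this thread, the call would look like dt_model = run_with_stack_dumps(dt, series, settings=settings, name='G103'). If the same frames keep appearing in the log, that is most likely where the call hangs.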
Thanks for your answer!
I will check a setup with the specs you mentioned. I suspect it could be due to the Python version itself: I have had a couple of software products that were buggy with 3.10.x. Within the next few days I will get to testing and give you feedback.
Regarding your question: I actually stopped investigating this function in detail. Since I already had issues with file_reader() (which I solved), I was afraid of running from one problem into the next. So I decided to ask here first ;)
As part of the package-version testing, I can also check that and provide feedback!
So, I was busy with other things but finally got to testing today.
I set up a virtual environment with exactly the Python and package versions you named above.
Output of pip list:
Package Version
-------------------- --------
aesara 2.0.5
argcomplete 2.0.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arviz 0.11.2
attrs 22.1.0
backcall 0.2.0
bleach 5.0.1
cachetools 5.2.0
cffi 1.15.1
cftime 1.6.2
cons 0.4.5
cycler 0.11.0
debugpy 1.6.3
decorator 5.1.1
defusedxml 0.7.1
deprecat 2.1.1
dill 0.3.5.1
discotimes 0.0.1
entrypoints 0.4
etuples 0.3.8
fastjsonschema 2.16.2
fastprogress 1.0.3
filelock 3.8.0
fonttools 4.37.3
greenlet 1.1.3
importlib-metadata 4.12.0
importlib-resources 5.9.0
ipykernel 6.6.1
ipython 7.31.0
ipython-genutils 0.2.0
ipywidgets 7.6.5
jedi 0.18.1
Jinja2 3.1.2
jsonschema 4.16.0
jupyter-client 6.1.0
jupyter-core 4.11.1
jupyterlab-widgets 3.0.3
kiwisolver 1.4.4
logical-unification 0.4.5
MarkupSafe 2.1.1
matplotlib 3.5.3
matplotlib-inline 0.1.6
miniKanren 1.0.3
mistune 0.8.4
msgpack 1.0.4
multipledispatch 0.6.0
nbconvert 5.6.1
nbformat 5.6.0
nest-asyncio 1.5.5
netCDF4 1.6.1
notebook 6.4.12
numpy 1.20.2
packaging 21.3
pandas 1.2.3
pandocfilters 1.5.0
parso 0.8.3
patsy 0.5.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.2.0
pip 22.2.2
pkgutil_resolve_name 1.3.10
prometheus-client 0.14.1
prompt-toolkit 3.0.31
ptyprocess 0.7.0
pycparser 2.21
Pygments 2.13.0
pymc3 3.11.2
pynvim 0.4.3
pyparsing 3.0.9
PyQt5 5.15.7
PyQt5-Qt5 5.15.2
PyQt5-sip 12.11.0
pyrsistent 0.18.1
python-dateutil 2.8.2
pytz 2022.2.1
pyzmq 24.0.1
qtconsole 5.2.2
QtPy 2.0.0
scipy 1.7.3
semver 2.13.0
Send2Trash 1.8.0
setuptools 65.3.0
six 1.16.0
terminado 0.15.0
testpath 0.6.0
Theano-PyMC 1.1.2
toolz 0.12.0
tornado 6.2
traitlets 5.4.0
typing-extensions 3.10.0.2
wcwidth 0.2.5
webencodings 0.5.1
widgetsnbextension 3.5.2
wrapt 1.14.1
xarray 0.20.2
xarray-einstats 0.2.2
zipp 3.8.1
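To compare only the packages named by the maintainer against the installed environment, instead of reading through the full list, a short sketch like the following could help. The helper name compare_versions is hypothetical, and the distribution name Theano-PyMC is taken from the pip list output above (the maintainer wrote theano-pymc3).

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, needed on Python 3.7

# Versions the maintainer reported as working.
REFERENCE = {
    "pymc3": "3.11.2",
    "pandas": "1.2.3",
    "numpy": "1.20.2",
    "aesara": "2.0.5",
    "arviz": "0.11.2",
    "Theano-PyMC": "1.1.2",
}


def compare_versions(reference):
    """Return {name: (wanted, installed, matches)}; installed is None if missing."""
    report = {}
    for name, wanted in reference.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in compare_versions(REFERENCE).items():
    print(f"{name}: wanted {wanted}, installed {installed}, match={ok}")
```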
The error I get when executing dt_model = dt(series,settings=settings,name='G103') on the sample data is the following:
In[11]: dt_model = dt(series,settings=settings,name='G103')
Normalize data
Frequency: W
first: 2003-04-06 00:00:00 last: 2020-11-01 00:00:00
/home/mreich/.pyenv/versions/distro/lib/python3.7/site-packages/discotimes/discotimes.py:924: FutureWarning: Support for multi-dimensional indexing (e.g. `obj[:, None]`) is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.
X_mat=(months == np.repeat(month[np.newaxis,:], 12, axis=0).T)*1
You can find the C code in this temporary file: /tmp/theano_compilation_error_ykks7u26
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/tmp/ipykernel_327972/1300890406.py in <module>
----> 1 dt_model = dt(series,settings=settings,name='G103')
~/.pyenv/versions/distro/lib/python3.7/site-packages/discotimes/discotimes.py in __init__(self, timser, settings, sample_specs, name, testing, test_model)
97 # ! U may not be able to use all of the functions (like plotting) with this option
98 else:
---> 99 self.model = discotimes_model(observed=self.obs,**self.specs)
100 self.trace = None
101 self.compressed=False
~/.pyenv/versions/distro/lib/python3.7/site-packages/pymc3/model.py in __call__(cls, *args, **kwargs)
385 instance = cls.__new__(cls, *args, **kwargs)
386 with instance: # appends context
--> 387 instance.__init__(*args, **kwargs)
388 return instance
389
~/.pyenv/versions/distro/lib/python3.7/site-packages/discotimes/models.py in __init__(self, observed, name, change_trend, n_changepoints, offsets_std, p_, sigma_noise, trend_inc_sigma, annual_cycle, change_offsets, estimate_offset_sigma, estimate_trend_inc_sigma, post_seismic, AR1, distribute_offsets, robust_reg, initial_values, **kwargs)
111
112 # Priors for model parameters
--> 113 offset = pm.Normal('offset', mu=0, sigma=1)
114 trend = pm.Normal('trend', mu=0, sigma=1)
115 sigma = pm.HalfNormal('sigma', sigma=sigma_noise)
~/.pyenv/versions/distro/lib/python3.7/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
122 dist = cls.dist(*args, **kwargs, shape=shape)
123 else:
--> 124 dist = cls.dist(*args, **kwargs)
125 return model.Var(name, dist, data, total_size, dims=dims)
126
~/.pyenv/versions/distro/lib/python3.7/site-packages/pymc3/distributions/distribution.py in dist(cls, *args, **kwargs)
131 def dist(cls, *args, **kwargs):
132 dist = object.__new__(cls)
--> 133 dist.__init__(*args, **kwargs)
134 return dist
135
~/.pyenv/versions/distro/lib/python3.7/site-packages/pymc3/distributions/continuous.py in __init__(self, mu, sigma, tau, sd, **kwargs)
486
487 self.mean = self.median = self.mode = self.mu = mu = tt.as_tensor_variable(floatX(mu))
--> 488 self.variance = 1.0 / self.tau
489
490 assert_negative_support(sigma, "sigma", "Normal")
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/tensor/var.py in __rtruediv__(self, other)
174
175 def __rtruediv__(self, other):
--> 176 return theano.tensor.basic.true_div(other, self)
177
178 def __rfloordiv__(self, other):
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/graph/op.py in __call__(self, *inputs, **kwargs)
251
252 if config.compute_test_value != "off":
--> 253 compute_test_value(node)
254
255 if self.default_output is not None:
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/graph/op.py in compute_test_value(node)
124
125 # Create a thunk that performs the computation
--> 126 thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
127 thunk.inputs = [storage_map[v] for v in node.inputs]
128 thunk.outputs = [storage_map[v] for v in node.outputs]
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/graph/op.py in make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
632 )
633 try:
--> 634 return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
635 except (NotImplementedError, MethodNotDefined):
636 # We requested the c code, so don't catch the error.
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/graph/op.py in make_c_thunk(self, node, storage_map, compute_map, no_recycling)
599 raise NotImplementedError("float16")
600 outputs = cl.make_thunk(
--> 601 input_storage=node_input_storage, output_storage=node_output_storage
602 )
603 thunk, node_input_filters, node_output_filters = outputs
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/link/c/basic.py in make_thunk(self, input_storage, output_storage, storage_map)
1202 init_tasks, tasks = self.get_init_tasks()
1203 cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
-> 1204 input_storage, output_storage, storage_map
1205 )
1206
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/link/c/basic.py in __compile__(self, input_storage, output_storage, storage_map)
1140 input_storage,
1141 output_storage,
-> 1142 storage_map,
1143 )
1144 return (
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/link/c/basic.py in cthunk_factory(self, error_storage, in_storage, out_storage, storage_map)
1632 for node in self.node_order:
1633 node.op.prepare_node(node, storage_map, None, "c")
-> 1634 module = get_module_cache().module_from_key(key=key, lnk=self)
1635
1636 vars = self.inputs + self.outputs + self.orphans
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/link/c/cmodule.py in module_from_key(self, key, lnk)
1189 try:
1190 location = dlimport_workdir(self.dirname)
-> 1191 module = lnk.compile_cmodule(location)
1192 name = module.__file__
1193 assert name.startswith(location)
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/link/c/basic.py in compile_cmodule(self, location)
1548 lib_dirs=self.lib_dirs(),
1549 libs=libs,
-> 1550 preargs=preargs,
1551 )
1552 except Exception as e:
~/.pyenv/versions/distro/lib/python3.7/site-packages/theano/link/c/cmodule.py in compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, py_module, hide_symbols)
2545 compile_stderr = compile_stderr.replace("\n", ". ")
2546 raise Exception(
-> 2547 f"Compilation failed (return status={status}): {compile_stderr}"
2548 )
2549 elif config.cmodule__compilation_warning and compile_stderr:
Exception: ("Compilation failed (return status=1): /usr/bin/ld: /home/mreich/.pyenv/versions/3.7.10/lib/libpython3.7m.a(classobject.o): warning: relocation against `PyInstanceMethod_Type' in read-only section `.text'. /usr/bin/ld: /home/mreich/.pyenv/versions/3.7.10/lib/libpython3.7m.a(longobject.o): relocation R_X86_64_PC32 against symbol `PyExc_OverflowError' can not be used when making a shared object; recompile with -fPIC. /usr/bin/ld: final link failed: bad value. collect2: error: ld returned 1 exit status. ", 'FunctionGraph(Elemwise{true_div,no_inplace}(TensorConstant{1.0}, TensorConstant{1.0}))')
Hopefully this helps you to find a solution.
Cheers
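The linker message above (libpython3.7m.a, "recompile with -fPIC") typically appears when the Python interpreter provides only a static libpython, so that compiled extension modules cannot be linked against it as shared objects. The following is a minimal sketch for checking how the interpreter was built, using only the standard-library sysconfig module; the helper name python_build_info is made up for illustration.

```python
import sysconfig


def python_build_info():
    """Collect build settings relevant for linking compiled extensions."""
    return {
        # True if Python was configured with --enable-shared.
        "enable_shared": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        # Directory and file name of the libpython the build refers to
        # (these can be None on platforms that do not define them).
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


print(python_build_info())
```

If enable_shared is False for a pyenv-built interpreter, a commonly suggested remedy is to rebuild it with PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.7.10. Whether that resolves this particular case has not been tested here.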
Hi,
ok, it could be an issue related to installing theano and the GCC compiler on Windows. I'm probably not facing these issues because I'm using a Linux machine.
There are some threads about this: https://discourse.pymc.io/t/exception-compilation-failed-return-status-1-error-after-installing-pymc3-for-the-first-time/6643 https://stackoverflow.com/questions/38536788/g-error-on-import-of-theano-on-windows-7
You might need to follow these steps to install pymc on Windows: https://github.com/pymc-devs/pymc/wiki/Installation-Guide-(Windows)
However, that's a newer version of pymc, which might not be fully compatible with the package. Unfortunately, the code isn't tested on Windows yet. Once I've done that, I'll adjust the installation guidelines.
Another option would be to try it again on the Linux system. What errors did you get on Linux?
Hopefully we can make it work somehow. Thanks for reporting these issues.
Cheers
Hi,
I think there was a misunderstanding. The recent comment and tests, including the errors, were done on a Linux machine (Manjaro). I only mentioned the Windows test at the very beginning as an alternative check, because I did not know which system you were developing on. I usually only use Linux.
So maybe you can have a look into this on your machine?
Cheers
Hi users and developers,
I have a question about computation time when running this software. So far I actually only wanted to work through the tutorial with the example dataset.
I have to admit that the step
series = file_reader(file,variable='auto',resample='W')
took ages (it actually never finished). Digging into the code, I found the step
data['Year']=pd.to_datetime(data['Year']-1970.,unit='Y')
to be the reason. This date conversion is super slow or "hangs". I have no explanation. After copy/pasting the code of the file_reader() function and manually executing it on the dataset with a workaround for the above-mentioned critical line, I managed to retrieve the pd.Series. But now, in the next step
dt_model = dt(series,settings=settings,name='G103')
it again takes forever without finishing. I checked with the program htop, and one core is constantly at 100% load. After several hours it still had not finished. The dataset is actually not very long, so I am puzzled about what could be the cause.
I would really love to test this code and idea, but I keep running into obstacle after obstacle.
Any advice, cross-check, questions and remarks are highly appreciated!
Thanks!
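For readers hitting the same slow date conversion: the post does not show the workaround that was used, so the following is only one possible sketch. It assumes the 'Year' column holds decimal years (for example 2003.26), as the subtraction of 1970. in the original line suggests. The function name decimal_year_to_datetime is hypothetical, and it uses only the standard library.

```python
from datetime import datetime
import math


def decimal_year_to_datetime(decimal_year):
    """Convert a decimal year such as 2000.5 into a datetime.

    The fractional part is scaled by the actual length of that calendar
    year, so leap years are handled.
    """
    year = math.floor(decimal_year)
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    return start + (end - start) * (decimal_year - year)
```

Under that assumption, the critical line could be replaced by something like data['Year'] = pd.to_datetime([decimal_year_to_datetime(y) for y in data['Year']]). The resulting timestamps may differ slightly from what pandas produces with unit='Y', because this sketch scales by the length of each calendar year.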