Thanks @warrickball - I think this all comes from the normalisation. We should put an if statement around the light curve prep that skips normalisation when `LC.flux.mean() < tol`.
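Something like this minimal sketch, where `TOL` and the wrapper function are hypothetical placeholders rather than PBjam's actual code:

```python
import numpy as np

TOL = 1e-3  # hypothetical threshold below which flux is treated as zero-centred

def prep_lightcurve(lc):
    # Skip normalisation for zero-centred flux, where dividing by the
    # near-zero median would produce non-finite values.
    if np.abs(np.nanmean(lc.flux)) > TOL:
        lc = lc.normalize()
    return lc
```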
This particular aspect of lightkurve is being overhauled at the moment. Should be fixed by requiring that PBjam depend on the latest version of lightkurve (in the near future).
This should be fixed now with v1.2.0. We should try it out; I'm not sure whether the behaviour will be consistent.
Okay, so having played around with it: to retain periodogram units of ppm^2/microHz, we need to call `lc.normalize(unit='ppm').to_periodogram(normalization='psd')`, whereas before we simply called `lc.normalize().to_periodogram(normalization='psd')`.
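For example (the target here is just an illustrative choice; any Kepler light curve should behave the same way):

```python
import lightkurve as lk

# Download a light curve for an illustrative target.
lc = lk.search_lightcurvefile('KIC 4448777').download().PDCSAP_FLUX.remove_nans()

# With lightkurve v1.2.0, request ppm explicitly to keep the PSD in ppm^2/microHz.
pg = lc.normalize(unit='ppm').to_periodogram(normalization='psd')
```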
I think this should work for zero-centered data as well, but I haven't tried that yet.
Thanks @ojhall94 - we should update the dependency to be `lightkurve==1.2.0` and then pass the updated unit to `normalize`. @alexlyttle - do you want to have a swing at this one?
I could give that a go; I will update lightkurve and try this out on my fork.
@alexlyttle - do you know about the dependencies in setup.py? If no, let's talk tomorrow.
@grd349 I think so. I believe it's just a case of changing `lightkurve>=1.0.1` to `lightkurve>=1.2.0` in setup.py, to make sure that the installed version of lightkurve is greater than or equal to 1.2.0.
If you guys are going to be playing with the dependencies in setup.py, can you make it pull them from requirements.txt? Then we only need to update the dependencies in one place.
It could maybe be done with a simple `open('requirements.txt', 'r').read()` call in setup.py, but I'm not sure.
The requirements.txt is what sphinx/readthedocs uses to compile everything for the docs page.
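A rough sketch of what that could look like (assuming a flat requirements.txt with one pin per line; untested against PBjam's actual setup.py):

```python
from setuptools import setup

# Read the pinned dependencies from requirements.txt so that, e.g.,
# lightkurve>=1.2.0 only needs updating in one place.
with open('requirements.txt') as f:
    requirements = f.read().splitlines()

setup(
    name='pbjam',
    install_requires=requirements,
    # ... remaining metadata unchanged ...
)
```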
@nielsenmb that makes sense, I can also try that on my fork and see what happens. Will chat with @grd349 tomorrow about it too.
There appears to be a bug in the lightkurve normalize function, so you get the wrong scale of offset on the flux when using normalize('ppm'). Basically it looks like they've forgotten to subtract 1 when converting from flux to ppm, so you wind up with a mean of 1e6, rather than 0, after converting.
It might still work for our purposes though, since to_periodogram should handle offsets in the input time series.
Just so you're aware.
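A toy illustration of the suspected off-by-one (just the arithmetic, not lightkurve's actual source):

```python
import numpy as np

flux = np.random.normal(1.0, 1e-4, 1000)  # toy flux with mean ~1

# What normalize('ppm') appears to do: the mean ends up near 1e6.
ppm_buggy = flux / np.median(flux) * 1e6

# What we'd expect for ppm: subtract 1 first, so the mean is near 0.
ppm_expected = (flux / np.median(flux) - 1) * 1e6
```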
@nielsenmb thanks for the heads-up!
@nielsenmb I modified the `normalize()` function in lightkurve so that it subtracts 1 from the flux before converting to percent, ppt and ppm. I intend to create a pull request/issue on lightkurve about this, but want to mention some of the side-effects of the change first.
For example, I noticed that flattening the light curve before vs. after normalisation massively changes the appearance of the light curve. This could be because of the behaviour of `lightkurve.flatten()` near zero. See the attached pdf for plots (using the example star in the PBjam docs):
lightkurve-test.pdf
This might be more appropriate in a separate Lightkurve issue but I thought it would be useful here too.
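My guess at the mechanism, as a toy sketch (a moving average stands in here for the trend that `flatten()` actually fits):

```python
import numpy as np

# Zero-centred flux, e.g. after a normalize() that subtracts 1.
flux = np.random.normal(0.0, 1e-4, 1000)

# flatten() divides the flux by a fitted trend; for zero-centred data the
# trend hovers around zero, so the ratio blows up near the zero-crossings.
trend = np.convolve(flux, np.ones(101) / 101, mode='same')
flattened = flux / trend
```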
I tried running the script in the first comment of this thread and it throws an error when trying to initialise `session`. @warrickball, do you know which version of PBjam you ran this on? I was looking back to see if a change in the code somewhere is causing this, but it is pretty difficult to pinpoint, so I will leave it here for now.
Here is the error which arises after running the script, using PBjam version 0.1.6 on macOS 10.15:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'timeseries'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/internals/managers.py in set(self, item, value)
1068 try:
-> 1069 loc = self.items.get_loc(item)
1070 except KeyError:
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'timeseries'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-2-ebe582da4175> in <module>
1 s = pb.session(ID='0', numax=(80.0, 5.0), dnu=(6.3, 0.2),
2 teff=(5000, 50), bp_rp=(1.0, 0.01),
----> 3 timeseries=timeseries)
~/Projects/PBjam/pbjam/session.py in __init__(self, ID, numax, dnu, teff, bp_rp, timeseries, psd, dictlike, use_cached, cadence, campaign, sector, month, quarter, path, download_dir)
620 campaign=campaign, sector=sector,
621 month=month, quarter=quarter)
--> 622 format_col(vardf, timeseries, 'timeseries')
623 format_col(vardf, psd, 'psd')
624
~/Projects/PBjam/pbjam/session.py in format_col(vardf, col, key)
357 x = np.array(col[0, :], dtype=float)
358 y = np.array(col[1, :], dtype=float)
--> 359 vardf[key] = np.array([arr_to_lk(x, y, vardf['ID'][0], key)])
360
361 # If dim = 3, it's a list of arrays or tuples
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/frame.py in __setitem__(self, key, value)
3470 else:
3471 # set column
-> 3472 self._set_item(key, value)
3473
3474 def _setitem_slice(self, key, value):
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/frame.py in _set_item(self, key, value)
3548 self._ensure_valid_index(value)
3549 value = self._sanitize_column(key, value)
-> 3550 NDFrame._set_item(self, key, value)
3551
3552 # check if we are modifying a copy
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/generic.py in _set_item(self, key, value)
3379
3380 def _set_item(self, key, value):
-> 3381 self._data.set(key, value)
3382 self._clear_item_cache()
3383
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/internals/managers.py in set(self, item, value)
1070 except KeyError:
1071 # This item wasn't present, just insert at end
-> 1072 self.insert(len(self.items), item, value)
1073 return
1074
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/internals/managers.py in insert(self, loc, item, value, allow_duplicates)
1179 new_axis = self.items.insert(loc, item)
1180
-> 1181 block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
1182
1183 for blkno, count in _fast_count_smallints(self._blknos[loc:]):
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype, fastpath)
3265 values = DatetimeArray._simple_new(values, dtype=dtype)
3266
-> 3267 return klass(values, ndim=ndim, placement=placement)
3268
3269
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
2773 values = np.array(values, dtype=object)
2774
-> 2775 super().__init__(values, ndim=ndim, placement=placement)
2776
2777 @property
~/.virtualenvs/pbjam/lib/python3.7/site-packages/pandas-0.25.1-py3.7-macosx-10.14-x86_64.egg/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
126 raise ValueError(
127 "Wrong number of items passed {val}, placement implies "
--> 128 "{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
129 )
130
ValueError: Wrong number of items passed 1000, placement implies 1
Getting the same error as Alex, investigating...
Seems like the offending line is line 359 in session.py:
vardf[key] = np.array([arr_to_lk(x, y, vardf['ID'][0], key)])
For some obscure reason, when the output from arr_to_lk is wrapped in an array, it messes up creating a new column in a pandas DataFrame. Replacing it with
vardf[key] = [arr_to_lk(x, y, vardf['ID'][0], key)]
seems to have solved this for me...
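A toy reproduction of the quirk, with a plain list standing in for the LightCurve returned by arr_to_lk (the failing line is commented out so the snippet runs; exact behaviour depends on the pandas version):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['0']})
lc_like = list(range(1000))  # stand-in for an array-like LightCurve

# np.array([lc_like]) expands the 1000 elements into the array instead of
# keeping a single object, so pandas sees 1000 items for a one-row column:
# df['timeseries'] = np.array([lc_like])  # ValueError in pandas 0.25

# A plain one-element list keeps the whole object in a single cell:
df['timeseries'] = [lc_like]
```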
@alexlyttle can you try changing this and seeing if it works on your end? Maybe test it on a real time series too. Feel free to include it in pull request #171 if it does. Then I'll merge it.
That's odd, but if that fixes it then it should be fine. I'll give that a go and add it to the pull request if it works.
I don't think I've updated anything since opening this issue and I still get the same error. I seem to have PBjam version 0.1.0:
$ pip3 show pbjam
Name: pbjam
Version: 0.1.0
...
Specifically, I've got commit 404d2edb1023a33bdeaa2ceb4543414e3a8bbace:
$ git log | head
commit 404d2edb1023a33bdeaa2ceb4543414e3a8bbace
Author: grd349 <grd349@gmail.com>
Date: Thu Aug 22 17:03:47 2019 +0100
Various updates to prior and stage2 checking
...
I won't try upgrading yet in case it makes it hard to reproduce the error.
Cheers @warrickball. In the latest version of PBjam on git and with lightkurve v1.2.0, after adding to my fork the fix suggested by @nielsenmb, I don't get the lightkurve error you had, or the pandas KeyError I quoted.
Now I just get this warning upon initialising session:
UserWarning: Input numax is greater than Nyquist frequeny for 0
but otherwise it initialises fine. Then, when I call session it complains that the prior is very large and may be slow - presumably because the custom timeseries is random so it's going to struggle?
Yup, the numax warning is probably because you are using a fake time series. You can try using a real timeseries.
The warning about the slow KDE occurs when the prior sample it's being trained on contains more than 1000 targets. This probably happens because of some combination of large errors on numax and dnu, and because you are in a particularly dense part of the prior sample.
It works fine using a real time series, no warnings and no issues after implementing your suggested fix @nielsenmb.
Cool, feel free to commit the changes to your pull request, and I'll merge it into the master branch.
I upgraded yesterday afternoon so now have version 0.1.6 (and lightkurve 1.2.0). The same script seems to run to completion with the following output, which may or may not be worrying:
/home/wball/pypi/PBjam/pbjam/session.py:645: UserWarning: Input numax is greater than Nyquist frequeny for 0
warnings.warn("Input numax is greater than Nyquist frequeny for %s" % (st.ID))
/home/wball/pypi/PBjam/pbjam/priors.py:66: UserWarning: You have lots data points in your prior - estimating the KDE band width will be slow!
warnings.warn('You have lots data points in your prior - estimating' +
72%|████████████████████████████████████████████▏ | 14496/20000 [06:08<02:15, 40.61it/s]Converged after 14500 iterations.
72%|████████████████████████████████████████████▏ | 14500/20000 [06:08<02:19, 39.32it/s]
100%|█████████████████████████████████████████████████████████████| 20000/20000 [11:19<00:00, 29.42it/s]
Star 0 produced an exception of type ValueError occurred. Arguments:
('min() arg is an empty sequence',)
I've run it twice and got the same result so it probably isn't random.
Hi,
The following snippet causes this error in lightkurve:
This seems to be because lightkurve insists on normalising the light curve by dividing by the median, which presumably leads to non-finite values that lightkurve's error describes as NaNs (even though I'm pretty sure the problem is Infs, not NaNs). I noticed that, when the timeseries is provided as a filename, PBjam handles this (in `jar.lc_to_lk()`) by adding a small offset, but this doesn't seem to apply if the timeseries is provided as an array.
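Until the two paths are unified, a possible workaround on the user side is to shift the array away from zero before passing it in, mirroring what `jar.lc_to_lk()` does for file input (the offset value here is an assumption, not PBjam's):

```python
import numpy as np
import pbjam as pb

# Zero-centred toy timeseries like the one in the original script.
t = np.arange(0, 100, 0.02)
flux = np.random.normal(0.0, 1.0, len(t))

# Shift the flux away from zero so lightkurve's median division is safe.
flux += 1.0

s = pb.session(ID='0', numax=(80.0, 5.0), dnu=(6.3, 0.2),
               teff=(5000, 50), bp_rp=(1.0, 0.01),
               timeseries=np.array([t, flux]))
```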