hippke / tls

Transit Least Squares: An optimized transit-fitting algorithm to search for periodic transits of small planets
MIT License
48 stars 24 forks source link

Occasional buggy spectrum if array of equal errors given #110

Open leigh2 opened 6 months ago

leigh2 commented 6 months ago

I found that if TLS is provided an array of uniform errors then it will sometimes/often cause issues with the power spectrum, and hence (automatic) period selection. I ran a few hundred tests of white noise models with transits and found that in most cases the short period end of the spectrum had erroneous massive peaks and troughs. Peaks naturally caused an incorrect selection of the period. I've managed to reproduce this from your quick start tutorial notebook and pushed it to my fork here https://github.com/leigh2/tls/blob/error_array_bug_example/tutorials/01%20Quick%20start%20with%20synthetic%20data.ipynb (Note that seeds 0 and 2 are fine, indicating that it only occasionally breaks!)

I've included examples of a couple of other power spectrum plots below at different levels of broken-ness.

This was using the current main branch version, on linux, python 3.11.5 fwiw.

34ppmhr 160ppmhr

talensgj commented 4 weeks ago

I believe I've found the cause of this issue. If I'm right it has nothing to do with using an array of uniform errors, instead it is caused by the chi2 value used when no models are fit.

The function lowest_residuals_in_this_duration uses the number of datapoints in the lightcurve as the chi2 value when no models are fit. This is of course the expected value of the weighted sum of residuals, assuming the errors are Gaussian and the provided values are correct, but in reality the actual value of sum (y - 1)^2/dy^2 will not be exactly equal to the number of points. A factor that greatly exacerbates this is the re-scaling of user provided uncertainties to have a mean of 1, which completely breaks this assumption.

Most of the time this is not an issue since the starting value will be replaced anyway, but especially at shorter periods it is possible that none of the trial depths exceed transit_depth_min = 10 ppm and this wildy incorrect default value is returned.

I tested this assumption by replacing the assumed value of len(y) with the true value of sum (y - 1)^2/dy^2 in my fork, and was able to make the spurious peaks and throughs go away in the lightcurve I was testing. I'm attaching to figures below comparing TLS to the astropy implementation of BLS. In the first figure I'm using the current version of TLS, the other uses my fork with a fix for this issue.

TLS 1.32: TLS_comparison_1

My fix: TLS_comparison_2

hippke commented 4 weeks ago

Hey, thank you for the bug report @leigh2 and the fix @talensgj ! I have come across this issue in the past myself but never had the time to dive deeper. Can you submit a pull request @talensgj or alternatively copy-paste the lines of code you changed? I will then publish a new version. Thank you so much!

talensgj commented 3 weeks ago

Hello, I've made the pull request #112 . Glad I could help!