Closed ryanvarley closed 11 years ago
I've been thinking about this quite a bit. I think your solution is actually pretty close to what I've been wanting to do. A couple of remarks:
I'm trying to use the 'most correct' or most accurate version for the catalogue version.
Upper and lower limits are good, but they mean more effort for whoever enters the data, who has to calculate the values above and below rather than copy them directly from the paper, unless a simple input tool were developed to take this sort of input and output XML.
I agree the first solution is problematic (and it is the main reason for this issue). I've mainly been thinking about the entry from a code point of view (I have a Python package that's nearly ready, which loads the catalogue into classes and includes some more advanced calculations).
Another suggestion is keeping the format as I suggested but using < > to designate the limits in the error, or even an upperlimit='true'.
Should we use uncertainty as the main tag instead of error or err?
One problem with using something like error='0.1, -0.05' is that it is not trivial to parse with most libraries, because it mixes XML with a comma-separated list. I think whatever format we choose, it should allow one to query the uncertainty of a value using XML alone.
I wouldn't worry too much about how much work it is to enter the data. As you mention, one can enter the data in the most convenient format and then run a script that converts it to the "standardized" format.
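To make the parsing point concrete, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The `<planet>` and `<mass>` elements and all the numbers are made up for illustration; only the `error='0.1, -0.05'` style and the `errorplus` / `errorminus` attribute names come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical entries: tag names and values are illustrative only.
combined = ET.fromstring('<planet><mass error="0.1, -0.05">1.0</mass></planet>')
separate = ET.fromstring(
    '<planet><mass errorplus="0.1" errorminus="0.05">1.0</mass></planet>'
)

# Combined attribute: XML only hands back the raw string,
# so the comma-separated list has to be split and cleaned by hand.
raw = combined.find("mass").get("error")
plus, minus = (abs(float(part)) for part in raw.split(","))

# Separate attributes: each uncertainty is addressable by name.
mass = separate.find("mass")
plus_attr = float(mass.get("errorplus"))
minus_attr = float(mass.get("errorminus"))

# The limited XPath support in ElementTree can also select on the attribute.
with_upper_error = separate.findall(".//*[@errorplus]")
```

With the combined form, every consumer of the catalogue has to agree on the splitting and sign convention; with separate attributes, that convention lives in the XML itself.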
So let's see what the options are:
It's clearly a matter of taste, but I prefer 3. I think it is very easy to read (both by humans and by a machine) and it's unambiguous. I agree that number 1 is easier to enter, but you could still enter it that way and just write a 5-line Python script to convert it to format 3. What do you think?
Or:
4
I like the idea of 3; data entry is harder but it's certainly cleaner. I don't, however, like the difference in syntax between symmetric and non-symmetric uncertainties. I also think turning non-symmetric uncertainties into upper and lower limits makes it harder to spot data entry errors, as they are never immediately obvious.
Another possibility:
but it wouldn't have good handling of upperlimits.
We could also do 3 in a similar fashion eliminating another tag
Whilst I don't like the idea of multiple tags, having both uncertainty and upperlimit does solve the problems of data entry and validation for many values. I still think I'd prefer fewer tags though; I'd much rather pull them in with one tag than with (upperlimit and lowerlimit) or uncertainty.
My previously mentioned code could solve some of these problems on the code end, but I think having more tags makes things less universally accessible.
Overall I'm still unsure.
With regard to your first comment. I agree, it's not nice to have this additional layer of complexity in the syntax by having three attributes error, errorplus and errorminus. In fact I had just made a test entry of KOI-200 and thought the same. Again, one could simply write a script that takes error="0.1" as an input and outputs errorplus="0.1" errorminus="0.1" to make everything consistent while keeping the entry as simple as possible. Or the other way around, going back to the error="0.1" format if the two error bars are found to be equal. Hm. I'm unsure. Let me ask another colleague of mine for his feelings (I think it's mainly down to feelings rather than anything else now).
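A minimal sketch of what such a conversion script could look like in both directions, using the standard `xml.etree.ElementTree`. The function names and the sample `<planet>` entry are hypothetical; only the `error`, `errorplus` and `errorminus` attribute names are taken from the discussion above.

```python
import xml.etree.ElementTree as ET


def expand_symmetric(root):
    """Rewrite error="x" as errorplus="x" errorminus="x" on every element."""
    for element in root.iter():
        error = element.attrib.pop("error", None)
        if error is not None:
            element.set("errorplus", error)
            element.set("errorminus", error)
    return root


def collapse_symmetric(root):
    """Go back to error="x" where the two error bars are numerically equal."""
    for element in root.iter():
        plus, minus = element.get("errorplus"), element.get("errorminus")
        if plus is not None and minus is not None and float(plus) == float(minus):
            del element.attrib["errorplus"]
            del element.attrib["errorminus"]
            element.set("error", plus)
    return root


# Hypothetical entry: one symmetric and one asymmetric uncertainty.
planet = ET.fromstring(
    '<planet><mass error="0.1">1.0</mass>'
    '<radius errorplus="0.2" errorminus="0.1">1.3</radius></planet>'
)
expand_symmetric(planet)
expanded_mass = dict(planet.find("mass").attrib)
collapse_symmetric(planet)
collapsed_mass = dict(planet.find("mass").attrib)
```

Either direction leaves the asymmetric `<radius>` entry untouched, so the choice between the two stored forms is mostly a question of which one the catalogue treats as canonical.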
I guess if one wants to be really precise, one should distinguish somehow between errors and limits. Because they are not really the same. One is a detection, one is a non-detection. A tag such as
I like the second one as well; this way we have a new tag, but it distinguishes between errors and limits, which is useful. I'll also pull in a colleague for more input.
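On the code side, the distinction between a detection and a non-detection could be carried through roughly like this. This is only a sketch under assumptions: it supposes limits are stored as `upperlimit` / `lowerlimit` attributes holding the limiting value, which is just one of the layouts floated in this thread, and the `Quantity` class and `read_quantity` function are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quantity:
    kind: str  # "measurement", "upper limit" or "lower limit"
    value: float
    errorplus: Optional[float] = None
    errorminus: Optional[float] = None


def read_quantity(element):
    """Classify an element as a detection with error bars or as a limit."""
    # Assumed layout: a limit is stored in an attribute, not in the element text.
    if element.get("upperlimit") is not None:
        return Quantity("upper limit", float(element.get("upperlimit")))
    if element.get("lowerlimit") is not None:
        return Quantity("lower limit", float(element.get("lowerlimit")))
    plus, minus = element.get("errorplus"), element.get("errorminus")
    return Quantity(
        "measurement",
        float(element.text),
        float(plus) if plus is not None else None,
        float(minus) if minus is not None else None,
    )


# Hypothetical entries: one detection, one non-detection.
detected = read_quantity(
    ET.fromstring('<mass errorplus="0.1" errorminus="0.05">1.0</mass>')
)
undetected = read_quantity(ET.fromstring('<mass upperlimit="2.5" />'))
```

Keeping the kind explicit means downstream code never has to guess whether a number is a measured value or a bound.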
Hi,
I think 5 is the easiest way to get the error bars in a program. You
always have two variables, instead of three or more if you use
"uncertainty" etc.
Unfortunately it is a heavy approach, but whichever way we choose, there
will always be a difficult or heavy part somewhere in the chain (XML, code, usage).
Cheers,
Marc-Antoine Martinod
I talked to Dave Spiegel about it. He brought up another issue: different people define the error bars in different ways (half-width of a Gaussian distribution, dispersion of the posterior distribution, etc.). Ideally there would be a flag that indicates which one was used in the paper. But maybe this is going too far for the moment.
The good thing is that we seem to agree upon the basic syntax:
For normal error bars:
I added 5 lines of code to the simple cleanup script (currently on a separate branch) that allows you to enter the error as error="1.0" if it is symmetric. It's then converted to errorminus and errorplus attributes.
Let me know if you agree and if you think that's it or if there's more to talk about!
Yes, I think for now the way people define errors isn't for the catalogue to judge, and it should just report the best values with the errors given in that paper.
I'll clean up this branch soon with our new standard and edit the wiki.
Should we keep working on this in this repo or on your branch? We are also adding transittime and logg to planets in this branch.
I'm very interested in having the transittime and logg data in my repository too. So, yes, please continue sending me pull requests!
Two questions:
And this issue is closed :-)
The uncertainties quoted in the literature can be in several formats
The issue is how to handle these
Currently I'm treating them as follows
The method used should be simple but most importantly unambiguous.
Any discussion is welcome