Writing option in vpr.py

JanaKoehler1985 commented 1 year ago

I think it could be a good idea to have a option in vpr.py to store RV values after using -ocen, -oset, and -cen. For a fast look vpr.py is fine. But for the people who would want to do science with the data, they would need again an own script to subtract themself the mean of the orders or using a weighting, like added in the last update. Therefore, it would be easier if the user can directly save the "updated" RV values and use them for further science.

mzechmeister commented 1 year ago

I fully agree.

The option needs a name (out, ofile, ...). Default output name could be vpr.rvo.dat or vpr.<tag>.rvo.dat. I'm open for better proposal.

JanaKoehler1985 commented 1 year ago

Hmm, possible options for a name may would be: write, writeRV, save, store, saveRV, update

In case of the output name, we first would have to think about what to write in the file. Do we just want to write the final RVs (of course together with BJD, berv, errors)? In this case also tag.RV.dat would be an option. Or do we just want to update the combined RVs and leaving the rest? Or do we want update the combined RVs and the RVs of the single orders (in case of using -ocen). I think, I would prefer something like tag.vpr.rvo.dat or tag.vpr.dat over vpr.tag.rvo.dat.

Another thing that I am thinking about, and this is also one of the reasons I came up with this idea, is to also have a return routine/option so it can be called from another script. The reason is, if we store and plot the final RVs at the end of viper.py, the error bars are pretty large. I was just asking myself, if one should consider to use at least the -ocen option in the end before plotting/storing the RVs to have a better idea of the errors.

mzechmeister commented 1 year ago

Ok, I like save, similar to np.savetxt and matplotlib.savefig. I now also tend towards tag.vpr.dat or maybe just tag.dat and not to store again rvo (less space, simplier, less confusing, less repeatitions), just ready for publishing. -ocen should be default for this product. This implies we have now normally three output rvo.dat, .dat and par.dat.

JanaKoehler1985 commented 1 year ago

If we would store just BJD, RV , e_RV it also could be directly used with your gls.py script.

-ocen should be default for this product.

Meaning it also would be called this way at the end of viper.py?

This implies we have now normally three output rvo.dat, .dat and par.dat.

Depends. I would say viper should just create rvo.dat and par.dat (+fits files for CRIRES). And .dat (or how ever we call it) will be just generated by the user if wanted/needed. But than we have three outputs, yes.

mzechmeister commented 1 year ago

Oh, indeed, gls.py cannot handle rvo.dat, the problem is not the number of columns, but the header row. Thus, it should be decided, whether .dat needs a header row. For gls.py, I should consider a skiprows and delimiter option.

I thought, it would be nice, if ocen is called at the end of viper.py by default. It is anyway recommended. An alternative to avoid a new file, is to include those as new columns in rvo.dat:

1 BJD
2 RV   # order centered
3 e_RV
4 RVs  # simple average
5 e_RVs
6 BERV
7 rv1

Drawback: With the current version new spectra could be just appended to rvo.dat, while ocen in rvo.dat would require to recompute second and third column.

I would create just one fits file; if par is wanted, it should be a fits extension.

JanaKoehler1985 commented 1 year ago

I think a header row is always useful. I mean for us as programmers it is clear, but other people are sometimes confused and so we are on the save side.

An alternative to avoid a new file, is to include those as new columns in rvo.dat:

I had the same idea in mind but forgot to mention it. So I would be totally fine with it, even if I don't like your current name suggest (it's is always the same, isn't it ;-) ). I think RV is fine for the center corrected RVs, but RVs is confusing. So it should be at least RV_simp or maybe RV_orig?

Drawback: With the current version new spectra could be just appended to rvo.dat, while ocen in rvo.dat would require to recomputed second and third column.

Yes, the ocen corrected values always have to be computed at the end and than have to be overwritten.

I would create just one fits file; if par is wanted, it should be a fits extension.

Not sure it would work in a nice way, as the structure is different in the par.dat and rvo.dat files in regard of saving values from each order and saving combined RVs. But I still have to think about. Anyhow, as it will be done "outside" the normal viper, it shouldn't matter anyhow.

mzechmeister commented 1 year ago

So then let's go new columns in rvo.dat?

About the name RV_orig and RV_simp also do not fit well. Similar, with other ideas that came to my mind RV_plain, RV_avg, RV_biased, RV_u (uncorrected), RV_noocen.

Another radial approach would be to make ocen completely the default and to waive storing the RV_noocen. Benefits: No duplicated column values in rvo.dat, when ocen is not specified. And we don't have to find a column name and document it, thus reducing confusion. Well, instead, then the option ocen should mutate to its complement, e.g. -noocen or probably best -no-ocen. (Well, since the option is expected to degrade the RV uncertainties, it doesn't a fancy name). However, ocen is a kind of data driven, i.e. it is expected to be derived for many data. So in case there is only one spectrum, the order scatter will be zero after ocen, implying zero error for the epoch. This reminds that the -ocen theoretically underestimates the uncertainty; the bias vanishes for more data (similar, one has to consider debiasing e.g. in np.std via ddof).

Going in cycles, it somehow also argues for a new file to separate the different product "levels" (ocen is a kind of post processing).

JanaKoehler1985 commented 1 year ago

It is hard to say, what is better- to store it in the same file or a separated one. It both has its pro and cons. Though, we are discussing here mainly about RV changes caused by -ocen. Nevertheless, I think it would be good if the user also can use the -oset as well to exclude bad orders. In this case it would make more sense, to store the updated RVs in a new file.

Another thing that came into my mind while thinking about putting it in the same file- Shouldn't in theory -ocen doesn't change the final RVs (or maybe just add an offset) but should just have an influence on the errors? Or am I totally wrong here? Well when usingvpr.py it gives different results, even so I think it shouldn't. This is even the case when using -avg mean for both cases.

mzechmeister commented 1 year ago

Right, I forgot -oset. So let's go for a new file, where -ocen is default (?).

-ocen does not change the RVs, if it is done without uncertainties. When including uncertainties, more thinking is needed. Currently, just a simple order mean is subtracted https://github.com/mzechmeister/viper/blob/d79230a4c4d3daa5cb1bb0662c75fd5e065139a3/vpr.py#L126 Changing to a weighted mean is logically, and probably maintains the property of not changing the final RV.

BTW, this is only the beginning of post-processing echelle RV. Reweighting/clipping and/or robust statistic can enhancements (e.g. Eq. 15 in https://ui.adsabs.harvard.edu/abs/2012ApJS..200...15A, Eq. 10 in https://ui.adsabs.harvard.edu/abs/2019AJ....158..164B). -oset is basically a manual clipping (zero weight).

JanaKoehler1985 commented 1 year ago

So let's go for a new file, where -ocen is default (?).

Well, that is the question (not so much the creation of a new file). There are at least two options: 1) Creation of a new RV file, just if the user runs vpr.py with selected option (and ocen not as default). Otherwise just store normal rvo.dat file at the end fo viper. 2) Use ocen option at the end of viper.py and overwrite e_RV in rvo.dat. Beside a new RV file can be generated using vpr.py with selected options. (I would like that, but I guess you are against it) or 3) Automatically run vpr.py at the end of viper to create an additional RV file to rvo.dat using ocen as default.

-ocen does not change the RVs, if it is done without uncertainties.

That is true but not the case for the plotted RVs in our current version of vpr.py - at least not if inf appears in the errors of some orders. This is due to the fact that vpr is using the RVs directly from the rvo.dat input file if no -ocen, while in case of -ocen the RVs are newly calculated:

https://github.com/mzechmeister/viper/blob/d79230a4c4d3daa5cb1bb0662c75fd5e065139a3/vpr.py#L260-L265

And as in viper.py the RVs are calculated by excluding all RVs with e_RV == inf, while this is not done invpr.py the plotted results can vary.

https://github.com/mzechmeister/viper/blob/d79230a4c4d3daa5cb1bb0662c75fd5e065139a3/viper.py#L811-L817

Or in other words, the RV calculation in viper.py and vpr.py is different.

mzechmeister commented 1 year ago

So, I vote for

Automatically run vpr.py at the end of viper to create an additional RV file to rvo.dat using ocen as default.

Or in other words, the RV calculation in viper.py and vpr.py is different.

Ok, that is because of the egde case inf. The line in viper.py goes back to: https://github.com/mzechmeister/viper/blob/25ec38aed7b1c5b13a0b313173d4c8606045fecb/viper.py#L171 where viper.py was shorter than 200 lines and vpr.py didn't exist. So the computation could be should be made the same, ideally in one place, which would be vpr.py. But vpr.py requires already a file. Sound like Chicken-and-egg dilemma and refactoring vpr.py to also take directly rv an e_rv but this is then another issue.

JanaKoehler1985 commented 1 year ago

Yeah, it definitely should give the same results in both cases. Otherwise it can be confusing for the users.

refactoring vpr.py to also take directly rv an e_rv but this is then another issue.

Or we write out the rvo.dat in viper as we are already doing it and then overwriting the RV column with the "right" RVs as estimated in vpr.py. This anyhow would also be a suitable option- that we offer both options - overwriting the RV column OR creating a new file with re-calculated RVs. Something like -save new and -save update (just thinking loud).

mzechmeister commented 1 year ago

I would not overwrite the columns anymore. save should take a filename (possibly autocomplete extension ".dat").

mzechmeister commented 1 year ago

I needed it now urgently. So I went ahead. I guess, it is not final yet.

The gui needs the option, too. The header starts with a hash (# BJD RV e_RV); it's numpy default. Discussable, too. E.g. gnuplot ignores it; pl ... w e just works, while named columns (e.g. pl ... us 'RV') are not recognised.

JanaKoehler1985 commented 1 year ago

It's ok. And good for testing to see if more changes are needed.

I will update the GUI when I am back.

I don't understand the last part.

JanaKoehler1985 commented 1 year ago

I am just thinking about where to put the saving option in the GUI best. It doesn't fit into the plotting options, but also not really together with the read in. Maybe a new additional window?

mzechmeister commented 1 year ago

Indeed, not so easy. I would expect it close to Plot o-rv or on the same height as EXIT or in a menu File>Save as. But the GUI maybe need also some general reorganisation, tbd.

JanaKoehler1985 commented 1 year ago

Maybe it is also a good idea to continue this discussion next week. I totally agree that the layout is not perfect yet. It was just a first version to start somewhere. But I guess it is easier to discuss possible changes in person.

mzechmeister / viper

Writing option in vpr.py #39