natalieallen / stis_pipeline

MIT License

Missing ")" at line 537 #1

Closed: andreok closed this issue 3 months ago

andreok commented 3 months ago

Hello,

Thank you very much for sharing this very interesting package of code.

I believe there's a missing ")" at the end of line 537. I also believe you forgot to mention the ray package (and maybe the zeus sampler?) in the instructions.

But I'm getting the following error message after those adjustments:

[/content/stis_pipeline/stis_pipeline/STIS_pipeline_functions.py](https://localhost:8080/#) in get_data(files, dq, jit, keep_first_orbit)
    155         # sort jitter vectors
    156         jitter_dict = {}
--> 157         for i in range(len(jitter_vector_list)):
    158             jitter_dict[jitter_vector_list[i]] = [item[i] for item in jitter_hold]
    159 

UnboundLocalError: local variable 'jitter_vector_list' referenced before assignment

Do I need to pass both FLT and JIT files to get_data()?

Thanks

natalieallen commented 3 months ago

Hello!

Thanks so much for pointing that out -- I did indeed accidentally delete that ")" while I was cleaning up the code; I will update that. ray is required by the transitspectroscopy package, I believe, but it is not installed by default when pip installing transitspectroscopy. zeus is a potential sampler in juliet, but it is not needed (and I have not implemented it as a sampling option in my light curve fitting function).

For the get_data function, you do not need to pass both the .flt and .jit files, but it does assume that the .jit file corresponding to each .flt file is in the same location as the .flt file. Can you check to see if this is true for you?
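
As a quick sanity check, something like this (just a sketch; the path is a placeholder, and I'm assuming the .jit files share the .flt rootnames) will show whether each .flt file has a .jit file next to it:

import glob, os

data_dir = "/path/to/your/data"   # placeholder path
flt_files = sorted(glob.glob(os.path.join(data_dir, "*_flt.fits")))
jit_files = sorted(glob.glob(os.path.join(data_dir, "*_jit.fits")))
print(len(flt_files), "flt files and", len(jit_files), "jit files found")
for flt in flt_files:
    # assumes a matching rootname; adjust if your jitter files are named differently
    jit = flt.replace("_flt.fits", "_jit.fits")
    print(os.path.basename(flt), "->", "jit found" if os.path.exists(jit) else "jit missing")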

andreok commented 3 months ago

Hello Natalie,

Thank you very much for the information.

I believe my issue with get_data was related to an observation where there was only one .jit (and .flt) file available. I have now switched to another dataset with multiple observations and .jit files, and that seemed to work just fine.

Unfortunately, I am facing difficulties trying to do the step where you break the orbits manually to clean the data. Could you provide further details on how you do it?

Thanks, Andre

natalieallen commented 3 months ago

The step where the total set of observations is broken up into smaller groups of exposures (typically 3 or 4) is done for the "difference image" step of the reduction. You can group up to an entire orbit together in this step (though I would make sure not to include exposures from different orbits, to be safe), but this will increase the time spent on the difference image step, since the difference is taken between each exposure and every other exposure in the set. I've found through testing that in some datasets the systematics change so much even over the course of a single orbit that smaller subsets work better for this step, but that may be something you want to test with your own data.
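
If it helps, here is a rough sketch (not a function from the pipeline itself) of how you could build those index groups automatically from the number of exposures in each orbit -- the orbit lengths below are just an example:

# split exposure indices into per-orbit chunks of at most group_size exposures
def chunk_orbits(orbit_lengths, group_size=4):
    groups, start = [], 0
    for length in orbit_lengths:
        for i in range(start, start + length, group_size):
            groups.append((i, min(i + group_size, start + length)))
        start += length
    return groups

# e.g. chunk_orbits([8, 10, 10, 10]) -> [(0, 4), (4, 8), (8, 12), ...]
# each (start, stop) pair can then be used to slice the inputs, e.g.
# clean_data(data[0][start:stop], dqs = data[3][start:stop], traces = trace_fit[start:stop])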

Does that help with what you were unsure about, or is there something else?

andreok commented 3 months ago

Hello Natalie,

Thank you very much for the explanation.

The idea is to use the STIS pipeline to reduce the observations from David Sing for WASP-52b. I believe there are three visits in this dataset (two for G430L and one for G750L).

As I understand it, I would have to have three different large groups (3x transit_cleaned), one for each visit in this dataset (two for G430L and one for G750L). And then clean the data in sub-groups per orbit, where the first orbit contains two cleaned segments and the others three cleaned segments, with the indices incrementing by 3 or 4 up to the length of each orbit? So it should look like this for the orbit length array [0, 8, 0, 10, 0, 10, 0, 10]?

# orbit 1 (8 exposures)
cleaned_1 = clean_data(data[0][:4], dqs = data[3][:4], traces = trace_fit[:4])
cleaned_2 = clean_data(data[0][4:8], dqs = data[3][4:8], traces = trace_fit[4:8])

# orbit 2 (10 exposures)
cleaned_3 = clean_data(data[0][8:11], dqs = data[3][8:11], traces = trace_fit[8:11])
cleaned_4 = clean_data(data[0][11:15], dqs = data[3][11:15], traces = trace_fit[11:15])
cleaned_5 = clean_data(data[0][15:18], dqs = data[3][15:18], traces = trace_fit[15:18])

# orbit 3 (10 exposures)
cleaned_6 = clean_data(data[0][18:21], dqs = data[3][18:21], traces = trace_fit[18:21])
cleaned_7 = clean_data(data[0][21:25], dqs = data[3][21:25], traces = trace_fit[21:25])
cleaned_8 = clean_data(data[0][25:28], dqs = data[3][25:28], traces = trace_fit[25:28])

# orbit 4 (10 exposures)
cleaned_9 = clean_data(data[0][28:31], dqs = data[3][28:31], traces = trace_fit[28:31])
cleaned_10 = clean_data(data[0][31:35], dqs = data[3][31:35], traces = trace_fit[31:35])
cleaned_11 = clean_data(data[0][35:38], dqs = data[3][35:38], traces = trace_fit[35:38])

# combine all cleaned groups back into one list
transit_cleaned = [*cleaned_1, *cleaned_2, *cleaned_3, *cleaned_4, *cleaned_5, *cleaned_6, *cleaned_7, *cleaned_8, *cleaned_9, *cleaned_10, *cleaned_11]

PS: The original issue was for the visit from Kevin France, also for WASP-52b, which has only one .flt file for G430L.

Thanks, Andre

andreok commented 3 months ago

Hello Natalie,

I tried breaking the orbits down into sub-groups as per my previous message, but I'm getting the following error message:

Starting dq_correct.

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

[<ipython-input-21-23709410fde3>](https://localhost:8080/#) in <cell line: 8>()
      6 # the spline function is a little broken at the moment, but if we use optimal extraction it doesn't matter
      7 # if we don't use it, so we turn it off here
----> 8 cleaned_1 = clean_data(data[0][:4], dqs = data[3][:4], traces = trace_fit[:4])
      9 cleaned_2 = clean_data(data[0][4:8], dqs = data[3][4:8], traces = trace_fit[4:8])
     10 

1 frames

[/content/stis_pipeline/stis_pipeline/STIS_pipeline_functions.py](https://localhost:8080/#) in clean_data(files, dq_correct, dqs, flags, difference_correct, wind_size, wind_sigma, hc_correct, hc_sigma, hc_wind_size, spline_correct, traces, spline_sigma, s, manual_badcolumn, inner_factor, outer_factor, return_marked)
    474             return
    475         else:
--> 476             marked_1 = dq_clean(files, dqs, flags)
    477             print("dq_correct complete.")
    478     else:

[/content/stis_pipeline/stis_pipeline/STIS_pipeline_functions.py](https://localhost:8080/#) in dq_clean(files, dqs, flags)
    234             bad_indices = list(zip(bad[0], bad[1]))
    235             bads.append(bad_indices)
--> 236     bads = np.array(bads)
    237 
    238     # make a copy of the original fed in files array shape

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.

I suspect there was something wrong with the dqs arrays, so I checked the lengths of each of the three arrays used here:

data[0]: 38
data[3]: 42
trace_fit: 38

Could this error be related to the fact that the data[3] array is longer than the other two in my dataset?

Thanks, Andre

natalieallen commented 3 months ago

I believe the WASP-121b dataset has the same orbit breakdown as your observation, so you should be able to match the indices that I use in that analysis if you're concerned about how to split up the exposures.

For your error, I don't think that would be a problem, since you're only passing in the first four files from each of those three arrays, so their shape should be the same (4x the image array), though I'm not sure why they don't match up in length. A quick Google search shows that this error may be a problem with the numpy version; can you let me know which versions of Python and numpy you're using?
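
One quick check (just a sketch) would be to print the lengths and per-exposure shapes of the slices you are actually passing in, to see where they stop lining up:

import numpy as np

# compare the slices fed to the first clean_data call
print(len(data[0][:4]), len(data[3][:4]), len(trace_fit[:4]))
for img, dq in zip(data[0][:4], data[3][:4]):
    print(np.shape(img), np.shape(dq))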

andreok commented 3 months ago

Hello Natalie,

Indeed, my dataset seems very similar to the WASP-17b G750L example.

Here are my Python and numpy versions:

Python 3.10.12
numpy 1.24.0

Thanks, Andre

natalieallen commented 3 months ago

From similar cases of this error on stackoverflow, it looks like it may be caused by the numpy version (see e.g. https://stackoverflow.com/questions/67183501/setting-an-array-element-with-a-sequence-requested-array-has-an-inhomogeneous-sh), though downgrading numpy can get messy. Have you tried printing out the bads array and seeing what the inhomogeneous shape part is?
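
For example (just a temporary debug suggestion, not something already in the pipeline), a print right above the np.array(bads) call at line 236 would show how many flagged pixel indices each exposure contributes, which is where the inhomogeneous shape would come from:

# temporary debug print just above bads = np.array(bads)
for i, b in enumerate(bads):
    print("exposure", i, "has", len(b), "flagged pixel indices")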

andreok commented 3 months ago

Hello Natalie,

As suggested at the bottom of the stackoverflow page you indicated, I had to make the following change to line 236 of STIS_pipeline_functions.py to make dq_clean() work:

bads = np.array(bads, dtype=np.ndarray)
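
For what it's worth, dtype=object seems to behave the same way here; either spelling just tells numpy to build a ragged object array instead of trying to stack the per-exposure index lists:

bads = np.array(bads, dtype=object)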

Thanks, Andre

natalieallen commented 3 months ago

Ah interesting, so it does seem to be tied to numpy version stuff. Glad you have it working now!