solo-spice / sospice

Python data analysis tools for the SPICE extreme-UV spectrometer on Solar Orbiter
BSD 3-Clause "New" or "Revised" License
12 stars 5 forks source link

Add `mid_time()` to FileMetadata #53

Closed ebuchlin closed 2 months ago

ebuchlin commented 3 months ago

Catalog.mid_time() computes the mid-time of the set all observations in Catalog. There could be an equivalent function at FileMetadata level. However, for performance issues, at least for method='midrange' the code in Catalog.mid_time() should probably not be a wrapper on FileMetadata.mid_time().

safimuhammad commented 3 months ago

Hello, I would like to work on this issue.

ebuchlin commented 3 months ago

Hello, you can start working on this issue if you like, by implementing a mid_time() method to FileMetadata; there is no need to change Catalog at the moment, we will see that later. Please start from the current develop branch and follow the contribution guide (which might be not clear enough on some points).

safimuhammad commented 3 months ago

Alright, I'll start on this, will reach you out if incase i need any help.

safimuhammad commented 3 months ago

@ebuchlin need some clarification on this, FileMetaData requires pd.Series which could be a series of file's metadata (path or url), In order to get a mid_time of it, we'd need a pd.dataframe which is obtained by catalog.csv to calculate out of set of dates, this is already done by Catalog.mid_time() , how exactly do we need to implement it on FileMetaData level ? considering if it is dealing with multiple files in pd.Series, Should we extract dates and Telapse out the Series of file metadatas in series ?

Am I going in the right direction? ps: just starting out with this project, need some guidance.

ebuchlin commented 3 months ago

In the case of FileMetadata object fm, the metadata fm.metadata (which are a pandas.Series object) are about one SPICE observation (the metadata are the different parameters for this observation). Such observation has a duration of TELAPSE seconds. If fm is a FileMetadata object, the duration is from fm.metadata['DATE-BEG'] to fm.metadata['DATE-BEG'] + pd.Timedelta(seconds=fm.metadata['TELAPSE']). fm.mid_time() would then simply be fm.metadata['DATE-BEG'] + pd.Timedelta(seconds=fm.metadata['TELAPSE'] / 2).

safimuhammad commented 3 months ago

Thanks for the clarification.

safimuhammad commented 3 months ago

Also, does that mean we don't need the options of 'mean' and 'barycenter' methods in fm.mid_time() since we're dealing with one file.

ebuchlin commented 3 months ago

Indeed, it wasn't clear to me when I first wrote the issue, but for FileMetadata only one method (the formula I gave in my comment above) is needed.

Then we can include it in tests, and determine whether the mean and barycenter methods for Catalog.mid_time() should be rewritten to call FileMetadata.mid_time() or not (the choice could depend on performance).

safimuhammad commented 3 months ago

Thanks Already implemented this, need to know about the tests, should they follow the same convention as the Catalog mid_time() test? The test files will be different for the TestFileMetaData?

safimuhammad commented 3 months ago

if so then i believe this expression should also be updated based on the file in FileMetaData. assert ( abs(cat.mid_time() - pd.Timestamp("2022-04-12 08:26:46")).total_seconds() < 1 # noqa: W503 )

ebuchlin commented 3 months ago

You can include the test in the TestFileMetadata class in catalogs/tests/test_file_metadata.py and use the file_metadata fixture, and get its mid_time() using your new function. At the moment this seems to me to be independent from tests for Catalog.

safimuhammad commented 3 months ago

pd.Timestamp("2022-04-12 08:26:46")).total_seconds() should I change the timestamps on this to the newer one as its failing at this.

ebuchlin commented 3 months ago

Yes of course, the value that is compared to the returned value should be adapted to the actual object used for the test.