sunpy / sunpy-soar

A sunpy plugin for accessing data in the Solar Orbiter Archive (SOAR).
https://docs.sunpy.org/projects/soar/
BSD 2-Clause "Simplified" License

Query SOAR Orbit metadata and Query by Solar Distance using AU #134

Open JonCook opened 1 month ago

JonCook commented 1 month ago

Describe the feature

Recently the Solar Orbiter CDF orbit file was ingested into the SOAR archive and is available as a standard file for download. Information can be found at:

However, perhaps more interestingly, the contents of the file have also been ingested into a timeseries database we have, and are available via the TAP interface.

For example, to fetch hcentric distance information:

To fetch everything for vector-type variables with shape 3, e.g. hci_pos, hci_vel, hee_pos:

This is the raw data contained within the orbit CDF. This opens the possibility to search for SOAR metadata directly using Solar Distance and also the possibility to download files using solar distance. Internally we derive the time intervals from the AU distances supplied.

For example: TAP Metadata request for MAG Level 2 files measured below 0.5 AU

For direct data download: TAP request for MAG Level 2 files measured below 0.5 AU (CAREFUL: returns an 18/19 GB tar for an anonymous user)

Asynchronous versions of these requests also work e.g

It could be nice to expose some of this functionality from the sunpy-soar package.

Proposed solution

Consider exposing some of this functionality via the sunpy-soar package.

ebuchlin commented 1 month ago

Thanks for opening this issue. As far as I understand, from the sunpy-soar point of view this could be two functionalities:

Am I correct?

Also, for searching by Sun distance, can't we just search by distance_sun_obsv (from the FITS tables) instead of using doQueryFilteredByDistance? Is this column indexed? Maybe the issue is that the information is split over several tables for the different instruments, but we are already doing separate queries for other metadata involving these tables.

JonCook commented 1 month ago

Hi @ebuchlin - thanks for the comment. Sorry I could not meet you in person at the SOWG. I was connected for a while today.

Yes, those could be two possible functionalities. I guess there may be other things you might want to use the data in the orbit file for as well. I'm not an expert on that ;-)

For the column distance_sun_obsv: it is only present in L2+ FITS files, not all instruments have L2 files (stx), and using it would require several joins across many different tables, which could lead to very large and cumbersome queries. It is not currently indexed, so it could also perform quite slowly. I know it can be done, but not all users are experts with ADQL.

Also, this column is in meters, not AU, although I suppose it can be converted. Finally, can we really trust that these values are accurate? As we have seen, for example with SOOP information and other metadata, the information is quite often incorrect and there is no easy way to check it at our end.
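For reference, the unit conversion mentioned above is a single division. The sketch below is only an illustration: the helper names are hypothetical, and the one hard fact it relies on is the IAU 2012 definition of the astronomical unit as exactly 149 597 870 700 m.

```python
# IAU 2012 Resolution B2: 1 au is defined as exactly 149 597 870 700 m.
METRES_PER_AU = 149_597_870_700.0


def metres_to_au(distance_m: float) -> float:
    """Convert a distance in metres (e.g. a distance_sun_obsv value) to AU."""
    return distance_m / METRES_PER_AU


def au_to_metres(distance_au: float) -> float:
    """Convert a distance in AU to metres."""
    return distance_au * METRES_PER_AU


# Hypothetical usage: express the 0.5 AU threshold in metres so it could be
# compared against a column stored in metres.
threshold_m = au_to_metres(0.5)
print(f"0.5 AU corresponds to {threshold_m:.0f} m")
```

The conversion itself is not the obstacle; the joins, missing index and metadata reliability described above are.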

Using doQueryFilteredByDistance hides all this complexity and lets you search for any type/level of file by solar distance accurately, as internally it uses time ranges derived from the orbit file. So there is no dependency on metadata inside the FITS files (apart from the start/end time, which is in the filename). The begin/end time is also indexed.
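To illustrate the idea of deriving time ranges from a distance range, here is a minimal sketch. It is not the SOAR implementation, which is not shown in this thread; the function name, the sample data and the choice of closed intervals between consecutive samples are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Interval = Tuple[datetime, datetime]


def intervals_within_distance(
    times: Sequence[datetime],
    distances_au: Sequence[float],
    d_min: float,
    d_max: float,
) -> List[Interval]:
    """Return (start, end) pairs covering runs of consecutive samples
    whose distance satisfies d_min <= distance <= d_max."""
    intervals: List[Interval] = []
    start = None      # first sample of the run currently being tracked
    previous = None   # sample seen in the previous iteration
    for t, d in zip(times, distances_au):
        inside = d_min <= d <= d_max
        if inside and start is None:
            start = t                             # a new run begins
        elif not inside and start is not None:
            intervals.append((start, previous))   # run ended at previous sample
            start = None
        previous = t
    if start is not None:                         # run still open at the end
        intervals.append((start, previous))
    return intervals


# Hypothetical daily orbit samples (distances in AU), not real SOAR data.
t0 = datetime(2022, 3, 20)
sample_times = [t0 + timedelta(days=i) for i in range(7)]
sample_distances = [0.60, 0.48, 0.30, 0.45, 0.70, 0.49, 0.90]

for begin, end in intervals_within_distance(sample_times, sample_distances, 0.28, 0.49):
    print(begin.date(), "to", end.date())
```

Files could then be selected by checking whether their begin/end times overlap any of the returned intervals, which only needs the indexed time columns.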

It is almost exactly the same endpoint as doQuery, just with REQUEST set to doQueryFilteredByDistance and an additional DISTANCE parameter, so hopefully it is quite intuitive to use if you have already used the other one.

I hope that helps answer any doubts

Many Thanks

ebuchlin commented 1 month ago

Thanks for the explanation. Still a question: can such API endpoints be used directly with astroquery or PyVO? It seems that these packages assume REQUEST to be doQuery for TAP queries.

I am not sure I fully understand TAP 1.0 but it says "All requests to execute (/async or /sync) a query using a query language must include REQUEST=doQuery and must include the LANG parameter."

On the other hand, TAP 1.1 says that "REQUEST=doQuery" is "obsolete", so I am a bit lost.

Is the TAP version of the ESA service specified somewhere, and is REQUEST=doQueryFilteredByDistance allowed by one of these standards?

JonCook commented 1 month ago

Hi @ebuchlin - let me have a look into it a bit. It's a good question.

JonCook commented 4 weeks ago

Hi @ebuchlin - sorry for the delay. I'm afraid I can't tell you about the subtle differences between the different versions of TAP. I know that ESA TAP services are fully TAP compliant and that SOAR is implementing TAP 1.1.

What I can also say is that our TAP services such as SOAR are using TAP+ which is the ESAC Space Data Centre (ESDC: https://www.cosmos.esa.int/web/esdc/) extension of the Table Access Protocol.

TAP+ is fully compatible with the TAP specification. TAP+ adds more capabilities, like authenticated access and a persistent user storage area, and in this case you could consider REQUEST=doQueryFilteredByDistance as an extension to standard TAP.

I was playing a bit:

import pyvo as vo

service = vo.dal.TAPService("https://soar.esac.esa.int/soar-sl-tap/tap")
query = service.create_query("SELECT * FROM soar.v_sc_data_item WHERE instrument='MAG' AND level='L2'")
# Override the standard REQUEST value with the SOAR-specific one
query['REQUEST'] = 'doQueryFilteredByDistance'
# Distance range in AU
query['DISTANCE'] = '(0.28,0.49)'
print(query)
print(str(query.queryurl))

result = query.execute()
print(result)

It seems possible to override the REQUEST parameter. What did not work was adding a new DISTANCE parameter, but maybe one of the Python gurus can help out there.
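One possible way to experiment without going through PyVO's parameter handling is to build the synchronous request URL by hand with the standard library. This is only a sketch under assumptions: the helper name is hypothetical, the /sync path is the standard TAP synchronous endpoint, and the LANG and FORMAT values are assumed rather than taken from this thread. The base URL, the REQUEST value, the ADQL query and the DISTANCE string come from the snippet above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the PyVO snippet above.
TAP_BASE = "https://soar.esac.esa.int/soar-sl-tap/tap"


def build_distance_query_url(adql: str, d_min_au: float, d_max_au: float,
                             fmt: str = "csv") -> str:
    """Build a synchronous TAP URL that filters by solar distance (AU).

    Hypothetical helper: LANG and FORMAT values are assumptions.
    """
    params = {
        "REQUEST": "doQueryFilteredByDistance",
        "LANG": "ADQL",                            # assumed, as for doQuery
        "FORMAT": fmt,                             # assumed output format name
        "QUERY": adql,
        "DISTANCE": f"({d_min_au},{d_max_au})",    # same form as '(0.28,0.49)'
    }
    # urlencode percent-encodes spaces, quotes and parentheses for us.
    return f"{TAP_BASE}/sync?{urlencode(params)}"


url = build_distance_query_url(
    "SELECT * FROM soar.v_sc_data_item WHERE instrument='MAG' AND level='L2'",
    0.28,
    0.49,
)
print(url)

# Round-trip check that the parameters survive encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["DISTANCE"])
```

The resulting URL could then be opened with urllib.request.urlopen or any HTTP client to check whether the server accepts DISTANCE in this form; that step is left out here because it has not been verified against the live service.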

I hope that helps. Perhaps in the future we could consider making this feature available somehow via the standard doQuery metadata request.

Thanks Jonathan