sunpy / sunpy-soar

A sunpy plugin for accessing data in the Solar Orbiter Archive (SOAR).
https://docs.sunpy.org/projects/soar/
BSD 2-Clause "Simplified" License
18 stars 12 forks source link

Use astroquery.utils.tap rather than making our own URL calls #32

Open Cadair opened 2 years ago

Cadair commented 2 years ago

Provide a general description of the issue or problem.

https://astroquery.readthedocs.io/en/latest/utils/tap.html

dstansby commented 2 years ago

I just had a quick look into this, and the following code errors, and I'm not sure what the solution is. Thought I'd leave this here as a paper trail though.

from astroquery.utils.tap.core import TapPlus

soar = TapPlus('http://soar.esac.esa.int/soar-sl-tap/tap')
tables = soar.load_tables(only_names=True)
for table in (tables):
    print(table.get_qualified_name())

table = soar.load_table('time_series.time_series.v_solo_ll02_swapasmom_2')
print(table)

with

Retrieving table 'time_series.time_series.v_solo_ll02_swapasmom_2'
500 Error 500:
esavo.tap.TAPException: esavo.tap.TAPException: Schema cannot be null
Traceback (most recent call last):
  File "/Users/dstansby/github/sunpy-soar/test.py", line 8, in <module>
    table = soar.load_table('time_series.time_series.v_solo_ll02_swapasmom_2')
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/astroquery/utils/tap/core.py", line 177, in load_table
    self.__connHandler.check_launch_response_status(response,
  File "/Users/dstansby/mambaforge/envs/sunpy/lib/python3.10/site-packages/astroquery/utils/tap/conn/tapconn.py", line 683, in check_launch_response_status
    raise requests.exceptions.HTTPError(errMsg)
requests.exceptions.HTTPError: Error 500:
esavo.tap.TAPException: esavo.tap.TAPException: Schema cannot be null
Cadair commented 2 years ago

I asked Helen and she said to drop the part before the first . which apparently is the schema.

Cadair commented 2 years ago
>>> table = soar.load_table('time_series.v_solo_ll02_swapasmom_2')
>>> print(table)
Retrieving table 'time_series.v_solo_ll02_swapasmom_2'
TAP Table name: time_series.time_series.v_solo_ll02_swapasmom_2
Description: 
Num. columns: 3
herroyalmaj commented 2 years ago

Hey there :) herroyalmaj

Cadair commented 2 years ago

I have had some success:

In [51]: job = soar.launch_job(f"SELECT * FROM v_sc_data_item WHERE (descriptor='{descriptor}') AND (begin_time>='{cme_start}') AND (end_time<='{cme_end}')")

In [52]: job.results
Out[52]: 
<Table length=36>
       begin_time                       data_item_id                data_item_oid data_type    descriptor    ... postcard_item_oid        prop_end       sc_begin_time sc_end_time sensor
         object                            object                       int64       object       object      ...       int64               object            str10        str10    object
----------------------- ------------------------------------------- ------------- --------- ---------------- ... ----------------- --------------------- ------------- ----------- ------
2022-03-28T13:50:15.226 solo_L2_eui-fsi304-image_20220328T135015226        460311       SCI EUI-FSI304-IMAGE ...            224221 1900-01-01T00:00:00.0                                 
2022-03-28T13:40:15.226 solo_L2_eui-fsi304-image_20220328T134015226        460310       SCI EUI-FSI304-IMAGE ...            224220 1900-01-01T00:00:00.0                                 
2022-03-28T13:30:15.225 solo_L2_eui-fsi304-image_20220328T133015225        460309       SCI EUI-FSI304-IMAGE ...            224219 1900-01-01T00:00:00.0                                 
2022-03-28T13:20:15.224 solo_L2_eui-fsi304-image_20220328T132015224        460308       SCI EUI-FSI304-IMAGE ...            224218 1900-01-01T00:00:00.0                                 
2022-03-28T13:10:15.222 solo_L2_eui-fsi304-image_20220328T131015222        460307       SCI EUI-FSI304-IMAGE ...            224216 1900-01-01T00:00:00.0                                 
2022-03-28T13:00:15.219 solo_L2_eui-fsi304-image_20220328T130015219        460306       SCI EUI-FSI304-IMAGE ...            224217 1900-01-01T00:00:00.0                                 
 2022-03-28T12:50:15.22 solo_L2_eui-fsi304-image_20220328T125015220        460305       SCI EUI-FSI304-IMAGE ...            224215 1900-01-01T00:00:00.0                                 
 2022-03-28T12:40:15.22 solo_L2_eui-fsi304-image_20220328T124015220        460304       SCI EUI-FSI304-IMAGE ...            224211 1900-01-01T00:00:00.0                                 
2022-03-28T12:30:15.218 solo_L2_eui-fsi304-image_20220328T123015218        460303       SCI EUI-FSI304-IMAGE ...            224213 1900-01-01T00:00:00.0                                 
                    ...                                         ...           ...       ...              ... ...               ...                   ...           ...         ...    ...
2022-03-28T12:20:15.217 solo_L1_eui-fsi304-image_20220328T122015217        329158       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
2022-03-28T12:10:15.216 solo_L1_eui-fsi304-image_20220328T121015216        329157       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
2022-03-28T12:00:15.213 solo_L1_eui-fsi304-image_20220328T120015213        329156       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
2022-03-28T11:50:15.214 solo_L1_eui-fsi304-image_20220328T115015214        329155       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
2022-03-28T11:40:15.213 solo_L1_eui-fsi304-image_20220328T114015213        329154       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
2022-03-28T11:30:15.212 solo_L1_eui-fsi304-image_20220328T113015212        329153       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
2022-03-28T11:20:15.209 solo_L1_eui-fsi304-image_20220328T112015209        329152       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
 2022-03-28T11:10:15.21 solo_L1_eui-fsi304-image_20220328T111015210        329151       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
2022-03-28T11:00:15.209 solo_L1_eui-fsi304-image_20220328T110015209        329150       SCI EUI-FSI304-IMAGE ...                -- 1900-01-01T00:00:00.0                                 
Cadair commented 2 years ago

OK, so I think this has some advantages, mainly that constructing a SQL like query is nicer than url parameters and that it automatically makes us an astropy table (which we can clean up a little using the built in query response table stuff).

I think that if I was going to fall all the way down this pit, I would rewrite the attr walker to return some kind of structured thing representing the parts of the WHERE like (colname, operator, value) or something and then do the formatting in the search method.

Also, I think this could end up being one of the few clients where we can pass the ORs up to the web service which would be a fun exercise in breaking Fido.

Cadair commented 2 years ago

I also found this: https://pypika.readthedocs.io/en/latest/ which might be useful to stop us having to write a lot of SQL by string formatting, it might be a bit overkill though.

ebuchlin commented 2 years ago

Maybe irrelevant now that use of astroquery.utils.tap is considered: a first step (with no additional dependency) could be to build the ADQL query without building the full URL explicitly (as currently done in _construct_url()), then use something like

        tap_end_point = 'http://soar.esac.esa.int/soar-sl-tap/tap/'
        payload = {
            'REQUEST': 'doQuery',
            'LANG': 'ADQL',
            'FORMAT': 'json',
            'QUERY': adql_query
        }
        r = requests.get(f'{tap_end_point}/sync', params=payload)

This is more clear and maintainable, the ADQL query is automatically URL-escaped, and you can still get the actual full URL with r.url.

ebuchlin commented 3 months ago

I just learned from @herroyalmaj that TapPlus now recommends to use PyVO instead.

I would say that it doesn't change much to PR #129, as most of line changes in that PR were to make the ADQL queries non-URL-escaped. The transition to PyVO should then not be too difficult as a next step.

nabobalis commented 3 months ago

In that case we should update the current PR to use that.