justincely / lightcurve_pipeline

pipeline for high level science products of HST 13902
BSD 3-Clause "New" or "Revised" License
2 stars 0 forks source link

Determine members needed for download, not associations #8

Open bourque opened 9 years ago

bourque commented 9 years ago

Our current downloading query (pasted below) returns association names, not individual association members:

    # Build query
    query = ("SELECT sci_data_set_name "
             "FROM science "
             "WHERE (sci_instrume = 'COS' OR sci_instrume = 'STIS') "
             "AND sci_operating_mode = 'TIME-TAG' "
             "AND sci_release_date < '{0}' "
             "AND sci_targname NOT IN ('DARK', 'BIAS', 'DEUTERIUM', 'WAVE', 'ANY', 'NONE') "
             "\ngo\n".format(today))

This is problematic since there could be a situation where we are missing a member of an association in our filesystem, but would not be aware of it because our query would compare a list of association names to a list of member names. We need to find a way to generate a list of all member names that are available to download that are not already in our filesystem.

bourque commented 9 years ago

We believe the solution to this is to take the list of associations returned by the downloading query and use them to query the assoc_members table in MAST to generate a list of member names.

bourque commented 9 years ago

This seems to work! Here is an example of the type of additional query we will need to perform:

SELECT * FROM assoc_member WHERE asm_asn_id = 'lbz453010' AND asm_member_type = 'EXP-FP'
bourque commented 9 years ago

I just want to document this epic query that works:

SELECT TOP(10) asm_member_name 
FROM assoc_member 
WHERE asm_member_type = 'EXP-FP' 
AND asm_asn_id IN (SELECT sci_data_set_name 
                   FROM science 
                   WHERE (sci_instrume = 'COS' OR sci_instrume = 'STIS')
                   AND sci_operating_mode = 'TIME-TAG'
                   AND sci_targname NOT IN ('DARK', 'BIAS', 'DEUTERIUM', 'WAVE', 'ANY', 'NONE') 
                   AND sci_release_date < '2015-04-29')
bourque commented 9 years ago

This query seems to break for STIS data. For STIS data, the asm_member_type has the following possibilities:

2> SELECT DISTINCT(asm_member_type) FROM assoc_member WHERE asm_data_set_name LIKE 'O%'
3> go
asm_member_type
AUTO-WAVECAL
CRSPLIT
GO-WAVECAL
PRODUCT
REPEATOBS
SCIENCE

The PRODUCT asm_member_type appears to return associations while SCIENCE appears to return members.

Here are the possibilities for COS data:

1> SELECT DISTINCT(asm_member_type) FROM assoc_member WHERE asm_data_set_name LIKE 'L%'
2> go
asm_member_type
EXP-AWAVE
EXP-FP
EXP-GWAVE
EXP-IWAVE
PROD-FP

Here, PROD-FP appears to return associations while everything else appears to return members.

bourque commented 9 years ago

It seems that using asm_member_type = EXP-FP is the way to go for COS data.

Digging into the STIS data a bit more, it appears that all asm_member_types besides PRODUCT return members, and that PRODUCT only returns associations. This means that the STIS data that we currently have in our filesystem is a mix of PRODUCTs and other asm_member_types.

Note that there are no files with asm_member_type = CR-SPLIT under this query.

The question remains: which STIS asm_member_types are valid for our needs?

bourque commented 9 years ago

I have complied lists of rootnames for each STIS asm_member_type:

[kramer:hst/hstlc/stis_download_test] bourque% wc *.dat
    1924    1924   36556 on_filesystem.dat
    1288    1288   12880 stis_autowavecal_downloads.dat
      51      51     510 stis_gowavecal_downloads.dat
    1359    1359   13590 stis_product_downloads.dat
      16      16     160 stis_repeatobs_downloads.dat
    1343    1343   13430 stis_science_downloads.dat
    5981    5981   77126 total

From this, it is apparent that the number of REPEATOBS and SCIENCE rootnames adds up to the number of PRODUCTs.

bourque commented 9 years ago

I tried downloading the STIS dataset ocht1b020 to see if the PRODUCT or one or all of its members would be retrieved. Only the PRODUCT was received (i.e. ocht1b020_tag.fits). However, the asn table does indeed list the individual members:

--> tprint ocht1b020_asn.fits
#  Table ocht1b020_asn.fits[1]  Fri 09:30:25 08-May-2015

# row MEMNAME        MEMTYPE        MEMPRSNT
#

    1 OCHT1BO6Q      AUTO-WAVECAL   yes
    2 OCHT1BOBQ      SCIENCE        yes
    3 OCHT1BOFQ      AUTO-WAVECAL   yes
    4 OCHT1B020      PRODUCT        yes

I'm not sure why these didn't come down with the rest of the downloads.

bourque commented 9 years ago

The initial query that @justincely used to download datasets (the datasets that currently reside in ingest) apparently retrieved observations that were PRODUCTS in the assoc_member table in MAST as well as individual members that are not in the assoc_member at all. This was determined by comparing datasets generated by MAST queries for varying asm_member_types and the dataset that exists in the ingest directory. See the dataset_in_ingest notebook for details.

The problem to now solve is to somehow get the datasets that exists independently and that are not part of any association (and thus do not exist in the assoc_member table. For the data that are part of an association, the asm_member_type = PRODUCT query should suffice.

bourque commented 8 years ago

@justincely came up with a downloading query that seems to work well:

"SELECT assoc_member.asm_member_name,science.sci_data_set_name  "
             "FROM assoc_member "
             "JOIN science "
             "ON assoc_member.asm_asn_id = science.sci_data_set_name "
             "WHERE (science.sci_instrume = 'COS' OR science.sci_instrume = 'STIS') "
             "AND science.sci_operating_mode = 'TIME-TAG' "
             "AND science.sci_release_date < '{0}' "
             "AND science.sci_targname NOT IN ('DARK', 'BIAS', 'DEUTERIUM', 'WAVE', 'ANY', 'NONE') "
             "AND assoc_member.asm_member_type != 'PROD-FP' "
             "AND assoc_member.asm_member_type != 'PRODUCT' "
             "AND assoc_member.asm_member_type != 'EXP-GWAVE' "
             "AND assoc_member.asm_member_type != 'EXP-IWAVE' "
             "AND assoc_member.asm_member_type != 'EXP-AWAVE' "
             "AND assoc_member.asm_member_type != 'GO-WAVECAL' "
             "AND assoc_member.asm_member_type != 'AUTO-WAVECAL' "