Open bourque opened 9 years ago
We believe the solution to this is to take the list of associations returned by the downloading query and use them to query the assoc_members
table in MAST to generate a list of member names.
This seems to work! Here is an example of the type of additional query we will need to perform:
SELECT * FROM assoc_member WHERE asm_asn_id = 'lbz453010' AND asm_member_type = 'EXP-FP'
I just want to document this epic query that works:
SELECT TOP(10) asm_member_name
FROM assoc_member
WHERE asm_member_type = 'EXP-FP'
AND asm_asn_id IN (SELECT sci_data_set_name
FROM science
WHERE (sci_instrume = 'COS' OR sci_instrume = 'STIS')
AND sci_operating_mode = 'TIME-TAG'
AND sci_targname NOT IN ('DARK', 'BIAS', 'DEUTERIUM', 'WAVE', 'ANY', 'NONE')
AND sci_release_date < '2015-04-29')
This query seems to break for STIS data. For STIS data, the asm_member_type
has the following possibilities:
2> SELECT DISTINCT(asm_member_type) FROM assoc_member WHERE asm_data_set_name LIKE 'O%'
3> go
asm_member_type
AUTO-WAVECAL
CRSPLIT
GO-WAVECAL
PRODUCT
REPEATOBS
SCIENCE
The PRODUCT
asm_member_type appears to return associations while SCIENCE
appears to return members.
Here are the possibilities for COS data:
1> SELECT DISTINCT(asm_member_type) FROM assoc_member WHERE asm_data_set_name LIKE 'L%'
2> go
asm_member_type
EXP-AWAVE
EXP-FP
EXP-GWAVE
EXP-IWAVE
PROD-FP
Here, PROD-FP
appears to return associations while everything else appears to return members.
It seems that using asm_member_type = EXP-FP
is the way to go for COS data.
Digging into the STIS data a bit more, it appears that all asm_member_type
s besides PRODUCT
return members, and that PRODUCT
only returns associations. This means that the STIS data that we currently have in our filesystem is a mix of PRODUCT
s and other asm_member_type
s.
Note that there are no files with asm_member_type = CR-SPLIT
under this query.
The question remains: which STIS asm_member_type
s are valid for our needs?
I have complied lists of rootnames for each STIS asm_member_type
:
[kramer:hst/hstlc/stis_download_test] bourque% wc *.dat
1924 1924 36556 on_filesystem.dat
1288 1288 12880 stis_autowavecal_downloads.dat
51 51 510 stis_gowavecal_downloads.dat
1359 1359 13590 stis_product_downloads.dat
16 16 160 stis_repeatobs_downloads.dat
1343 1343 13430 stis_science_downloads.dat
5981 5981 77126 total
From this, it is apparent that the number of REPEATOBS
and SCIENCE
rootnames adds up to the number of PRODUCT
s.
I tried downloading the STIS dataset ocht1b020
to see if the PRODUCT
or one or all of its members would be retrieved. Only the PRODUCT was received (i.e. ocht1b020_tag.fits
). However, the asn table does indeed list the individual members:
--> tprint ocht1b020_asn.fits
# Table ocht1b020_asn.fits[1] Fri 09:30:25 08-May-2015
# row MEMNAME MEMTYPE MEMPRSNT
#
1 OCHT1BO6Q AUTO-WAVECAL yes
2 OCHT1BOBQ SCIENCE yes
3 OCHT1BOFQ AUTO-WAVECAL yes
4 OCHT1B020 PRODUCT yes
I'm not sure why these didn't come down with the rest of the downloads.
The initial query that @justincely used to download datasets (the datasets that currently reside in ingest
) apparently retrieved observations that were PRODUCTS
in the assoc_member
table in MAST as well as individual members that are not in the assoc_member
at all. This was determined by comparing datasets generated by MAST queries for varying asm_member_type
s and the dataset that exists in the ingest
directory. See the dataset_in_ingest notebook for details.
The problem to now solve is to somehow get the datasets that exists independently and that are not part of any association (and thus do not exist in the assoc_member
table. For the data that are part of an association, the asm_member_type = PRODUCT
query should suffice.
@justincely came up with a downloading query that seems to work well:
"SELECT assoc_member.asm_member_name,science.sci_data_set_name "
"FROM assoc_member "
"JOIN science "
"ON assoc_member.asm_asn_id = science.sci_data_set_name "
"WHERE (science.sci_instrume = 'COS' OR science.sci_instrume = 'STIS') "
"AND science.sci_operating_mode = 'TIME-TAG' "
"AND science.sci_release_date < '{0}' "
"AND science.sci_targname NOT IN ('DARK', 'BIAS', 'DEUTERIUM', 'WAVE', 'ANY', 'NONE') "
"AND assoc_member.asm_member_type != 'PROD-FP' "
"AND assoc_member.asm_member_type != 'PRODUCT' "
"AND assoc_member.asm_member_type != 'EXP-GWAVE' "
"AND assoc_member.asm_member_type != 'EXP-IWAVE' "
"AND assoc_member.asm_member_type != 'EXP-AWAVE' "
"AND assoc_member.asm_member_type != 'GO-WAVECAL' "
"AND assoc_member.asm_member_type != 'AUTO-WAVECAL' "
Our current downloading query (pasted below) returns association names, not individual association members:
This is problematic since there could be a situation where we are missing a member of an association in our filesystem, but would not be aware of it because our query would compare a list of association names to a list of member names. We need to find a way to generate a list of all member names that are available to download that are not already in our filesystem.