I generated STAC items for every unique state/year combination available in the NAIP bucket, for a total of 228 runs.
Issues I found during testing:
1) If the code can't find the resource description and date in the metadata file, it attempts to extract them from the COG href, using a regex to get the date. Most of the COGs have names in the format `m_3510264_ne_13_060_20200905.tif`, but some, such as `m_4209601_ne_14_060_20180912_20181211.tif`, have an extra 8-digit sequence at the end of the name. In these cases the actual scene date is always the first 8-digit sequence (here, `20180912`), not the second. The regex was modified to account for this with an optional clause at the end.
2) Some of the 2020 XML metadata files do not contain the xpath `gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString` for the resource description field, but instead contain `idinfo/citation/citeinfo/title`. The shorter xpath was added as a fallback for when the longer one is not found. Most of the metadata files contain the longer xpath; only a handful contain the shorter one.
3) For scenes prior to 2020, when no resource description and date are found in the associated metadata file, the extraction logic now uses the same common method (`maybe_extract_id_and_date`) as the other cases.
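The first two fixes can be illustrated with a minimal Python sketch. This is not the code in this PR: the helper names (`id_and_date_from_href`, `title_from_metadata`), the exact regex, and the assumption that the resource description is the COG file stem are all hypothetical. The regex assumes names follow the two patterns quoted above. The `gmd`/`gco` namespace URIs are the standard ISO 19139 ones.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical pattern: m_<quad>_<quadrant>_<utm zone>_<resolution>_<YYYYMMDD>[_<YYYYMMDD>].tif
# The optional trailing group absorbs the extra 8-digit sequence, so only the
# FIRST date is captured as the scene date.
COG_NAME_RE = re.compile(
    r"(?P<stem>m_\d{7}_(?:ne|nw|se|sw)_\d{2}_\d+_(?P<date>\d{8})(?:_\d{8})?)\.tif"
)

# Standard ISO 19139 namespace URIs for the gmd/gco prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
ISO_TITLE = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
    "gmd:CI_Citation/gmd:title/gco:CharacterString"
)
FGDC_TITLE = "idinfo/citation/citeinfo/title"  # fallback seen in some 2020 files


def id_and_date_from_href(cog_href: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (file stem, YYYYMMDD) parsed from a COG href, or (None, None)."""
    name = cog_href.rsplit("/", 1)[-1]
    match = COG_NAME_RE.fullmatch(name)
    if match is None:
        return None, None
    # Assumption: the file stem stands in for the resource description.
    return match.group("stem"), match.group("date")


def title_from_metadata(xml_text: str) -> Optional[str]:
    """Try the ISO xpath first, then fall back to the shorter FGDC-style xpath."""
    root = ET.fromstring(xml_text)
    for path in (ISO_TITLE, FGDC_TITLE):
        node = root.find(path, NS)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None
```

For example, `id_and_date_from_href("m_4209601_ne_14_060_20180912_20181211.tif")` yields the date `"20180912"`. Because the regex uses `fullmatch` on the file name, the date group can only bind to the sequence immediately after the resolution field, never to the trailing one.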