vascobnunes / fetchLandsatSentinelFromGoogleCloud

Find and download Landsat and Sentinel-2 data from the public Google Cloud
https://vascobnunes.github.io/fetchLandsatSentinelFromGoogleCloud/
MIT License

Sentinel 2: directories predating the short filename change should be renamed #37

Closed GreatEmerald closed 6 years ago

GreatEmerald commented 7 years ago

Google Cloud stores all of the data in directories named with the short format, as specified by ESA in its naming convention (https://earth.esa.int/web/sentinel/user-guides/sentinel-2-msi/naming-convention). However, as that document says, the short format should only apply to products generated after 2016-12-06, whereas on Google Cloud all directories have these names, including those of older products. As a result, sen2cor refuses to process the older data and gives no output whatsoever. My test case was:

python fetchFromGoogleCloud.py 21NUE S2 2015-12-20 2015-12-24

You can tell that the internal format is still the old one, because the directory contains the files S2A_OPER_BWI_MSIL1C_PDMC_20151223T201036_R053_V20151223T143139_20151223T143139.png and S2A_OPER_MTD_SAFL1C_PDMC_20151223T201036_R053_V20151223T143139_20151223T143139.xml, both of which use the long format.

The actual name that the directory should have is S2A_OPER_PRD_MSIL1C_PDMC_20151223T201036_R053_V20151223T143139_20151223T143139.SAFE (verified by downloading this tile from the Sentinel Hub). This is almost the same as the filename of the PNG file, but not quite: it has PRD instead of BWI. sen2cor will refuse to process the directory if its name contains BWI, saying:

L1C user product directory must match the following mask: S2A_MSIL1C*.SAFE

(which is actually misleading, because it does accept S2A_OPER_PRD_MSIL1C_PDMC_20151223T201036_R053_V20151223T143139_20151223T143139.SAFE fine).
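To show why the message is misleading, here is a minimal check of the printed mask against the long-format name, treating the mask as a plain shell-style glob. This only tests the pattern as it appears in the error text; how sen2cor actually validates directory names is not shown in this thread, so the comparison is an illustration rather than a description of its internals.

```python
from fnmatch import fnmatchcase

# The mask exactly as printed in the sen2cor error message.
MASK = "S2A_MSIL1C*.SAFE"

# The long-format name that sen2cor reportedly accepts.
long_name = (
    "S2A_OPER_PRD_MSIL1C_PDMC_20151223T201036_R053_"
    "V20151223T143139_20151223T143139.SAFE"
)

# Interpreted as a glob, the mask does not match the accepted name,
# so the message does not describe the full set of valid names.
matches_long_name = fnmatchcase(long_name, MASK)
print(matches_long_name)
```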

Looks like the correct name can be parsed from the file INSPIRE.xml:

                    <gmd:title>
                        <gco:CharacterString>S2A_OPER_PRD_MSIL1C_PDMC_20151223T201036_R053_V20151223T143139_20151223T143139.SAFE</gco:CharacterString>
                    </gmd:title>

vascobnunes commented 7 years ago

Thank you. If you change the name manually can you use sen2cor?

GreatEmerald commented 7 years ago

Yes.

GreatEmerald commented 7 years ago

Just tried with another four products, two granules each (all predating the switch to the compact notation), and indeed the same applies to those. I renamed the .SAFE directories to what is referenced in INSPIRE.xml, manually downloaded the ozone data as per issue #36 for the tiles and ran sen2cor, which finished processing with no errors or warnings.

One more thing to consider is that the directory names are identical when downloading multiple granules that belong to one product. All of the top-level files are the same; the only difference is in the data inside the relevant GRANULE directory. That makes sense, since the Sentinel Hub data actually comes with all the granules in one product, whereas Google stores individual granules only. So either the data from these granules needs to be merged after download, or only the files inside the GRANULE directory should be downloaded. The former is probably easier, as the OS should be able to merge the directories itself (but check whether it needs to be told explicitly to overwrite, or not to overwrite, existing files).
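The merge option could be sketched roughly as follows. The function name `merge_safe_dirs` and its `overwrite` flag are hypothetical and not part of fetchFromGoogleCloud.py; the sketch assumes each granule was downloaded into its own copy of the product directory and that the identical top-level files can safely be kept from whichever copy arrived first.

```python
import os
import shutil

def merge_safe_dirs(src, dst, overwrite=False):
    """Copy every file under src into dst, keeping the directory layout.

    Hypothetical helper: files that already exist in dst are kept unless
    overwrite is True, so shared top-level files are copied only once.
    """
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_root = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            target = os.path.join(target_root, name)
            if overwrite or not os.path.exists(target):
                shutil.copy2(os.path.join(root, name), target)
```

Calling `merge_safe_dirs(second_download, first_download)` would add the second granule's GRANULE subdirectory to the first product directory and leave the files already present there untouched.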

vascobnunes commented 6 years ago

What changes do you suggest we make to deal with the issue you mention?

GreatEmerald commented 6 years ago

Parse INSPIRE.xml for the gmd:title tag, and rename the top-level downloaded directory to the value of gco:CharacterString. For new files that will not matter as the name of the .SAFE directory matches the metadata, but it does matter for the old style ones (until they get reprocessed to also use the new notation).
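A minimal sketch of that suggestion, using only the standard library. The function names are hypothetical and this is not the code that was eventually committed. It assumes that INSPIRE.xml sits at the top level of the downloaded directory, that it declares the standard ISO 19139 `gmd` and `gco` namespaces, and that the first `gmd:title` holds the product name, as in the excerpt quoted earlier.

```python
import os
import xml.etree.ElementTree as ET

# ISO 19139 namespace URIs (assumed to be the ones INSPIRE.xml declares).
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

def product_name_from_inspire(inspire_path):
    """Return the text of the first gmd:title/gco:CharacterString, or None."""
    root = ET.parse(inspire_path).getroot()
    node = root.find(".//gmd:title/gco:CharacterString", NS)
    if node is None or not node.text:
        return None
    return node.text.strip()

def rename_safe_dir(safe_dir):
    """Rename safe_dir to the name in its INSPIRE.xml; return the final path."""
    safe_dir = os.path.normpath(safe_dir)
    name = product_name_from_inspire(os.path.join(safe_dir, "INSPIRE.xml"))
    if name is None:
        return safe_dir  # no usable title found, leave the directory alone
    target = os.path.join(os.path.dirname(safe_dir), name)
    if target == safe_dir or os.path.exists(target):
        # Already correctly named, or another granule of the same product
        # is already there; in the latter case a merge is needed instead.
        return safe_dir
    os.rename(safe_dir, target)
    return target
```

For new-style products the function returns the path unchanged, because the directory name already matches the metadata.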

vascobnunes commented 6 years ago

OK, this is now implemented and committed. Hope this helps.