stac-utils / stactools

Command line utility and Python library for STAC
https://stactools.readthedocs.io/

stac copy with SELF_CONTAINED still has absolute links for assets, redux #124

Closed cholmes closed 3 years ago

cholmes commented 3 years ago

I seem to have a regression, using the pystac-v1.0 branch. I try a copy with -t SELF_CONTAINED and I don't get relative links for assets. But if I use a previously installed stactools - not sure which version (see #123), but it produces 1.0.0-beta.2 stac_version - I get the proper relative asset links.

The command I'm using is: stac copy -t SELF_CONTAINED planet-stacs/peru-an/collection.json skysat-sample

With pystac-v1.0 branch:

...

    "assets": {
        "visual:ortho_visual": {
            "href": "/Users/cholmes/Repos/planet-orders/planet-stacs/peru/20201230_151832_ssc13_u0001/20201230_151832_ssc13_u0001_visual.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized"
        },
        "metadata": {
            "href": "/Users/cholmes/Repos/planet-orders/planet-stacs/peru/20201230_151832_ssc13_u0001/20201230_151832_ssc13_u0001_metadata.json",
            "type": "application/json"
        }
    },

with installed one:

    "assets": {
        "analytic:ortho_analytic": {
            "href": "../../Repos/planet-orders/planet-stacs/peru-an/20201230_151906_ssc13_u0001/20201230_151906_ssc13_u0001_analytic_file_format.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized"
        },
        "analytic:ortho_analytic_udm": {
            "href": "../../Repos/planet-orders/planet-stacs/peru-an/20201230_151906_ssc13_u0001/20201230_151906_ssc13_u0001_analytic_udm.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized"
        },
        "metadata": {
            "href": "../../Repos/planet-orders/planet-stacs/peru-an/20201230_151906_ssc13_u0001/20201230_151906_ssc13_u0001_metadata.json",
            "type": "application/json"
        }
    },

It seems to be about the same as #31, so maybe it's some sort of regression?

volaya commented 3 years ago

I tried with the feature/pystac-v1.0 branch and the current main branch of PySTAC, and I am not able to reproduce it. I get correct relative links. Maybe it is something related to your particular catalog? If you can share it (at least the JSON files), that would help.

cholmes commented 3 years ago

Cool, thanks for trying to reproduce - I thought it could just be me or something weird. I'll dig in again with a more 'pure' start to see if that helps, and if I still have issues I'll send you stuff.

gadomski commented 3 years ago

@cholmes did the fresh install we were working on this week fix this issue?

cholmes commented 3 years ago

@gadomski - unfortunately it looks like it didn't.

...

Just spent close to an hour trying to come up with easily reproducible test cases. It happens every time on mine, but it's weirdly hard to get things set up without much data to see it easily. I think I've got a couple now though. I'm using the latest release out of the box, with the planet plugin from source I believe (the call you gave me Pete).

So it seems to not work 'either way' for me. If I have an absolute link in the asset then it doesn't convert to a relative one, and if I have a relative one it doesn't convert to an absolute.

The second way you can try with this catalog https://storage.googleapis.com/open-cogs/test-stac/catalog.json - do a copy with moving assets and then you should have it local. Then do a copy with -t ABSOLUTE_PUBLISHED, and for me it stays relative.

Then you can hand enter your absolute URL on your file system (all the non-asset links all update fine, can just use it from there), and try it with -t SELF_CONTAINED

I also put up a Planet order to try out, and I think it should show the same problem. Download https://storage.googleapis.com/open-cogs/PlanetScope_Disaster_Data_Scene_psscene4band_analytic_sr.zip and unzip it, and then do a convert-order:

stac planet convert-order ~/Downloads/PlanetScope_Disaster_Data_Scene_psscene4band_analytic_sr/manifest.json test1 disaster "cool stuff"

You'll get absolute links, and then a:

stac copy -t SELF_CONTAINED test1/collection.json test2

for me stays as all absolute links in the assets.

gadomski commented 3 years ago

@cholmes I cannot reproduce. I haven't tried the Planet data, but using https://storage.googleapis.com/open-cogs/test-stac/catalog.json I tested with this script: https://gist.github.com/gadomski/0b5a81af88e08627ace2c54d9cb99bb8. Results below, and tl;dr all three types appear to be correct to my eye. To make sure we're playing in the same sandbox, can you provide your environment by running the following:

pip install pipdeptree
pipdeptree --packages stactools

and pasting the output here?

Script output

+ stac copy https://storage.googleapis.com/open-cogs/test-stac/catalog.json test-stac
+ stac copy -t ABSOLUTE_PUBLISHED test-stac/catalog.json test-stac-absolute-published
+ print_links test-stac-absolute-published
+ jq '.links[]' test-stac-absolute-published/catalog.json
{
  "rel": "root",
  "href": "/Users/gadomski/Desktop/test-stac-absolute-published/catalog.json",
  "type": "application/json"
}
{
  "rel": "item",
  "href": "/Users/gadomski/Desktop/test-stac-absolute-published/20170831_172754_101c_3b_Visual/20170831_172754_101c_3b_Visual.json",
  "type": "application/json"
}
{
  "rel": "self",
  "href": "/Users/gadomski/Desktop/test-stac-absolute-published/catalog.json",
  "type": "application/json"
}
+ jq '.links[]' test-stac-absolute-published/20170831_172754_101c_3b_Visual/20170831_172754_101c_3b_Visual.json
{
  "rel": "root",
  "href": "/Users/gadomski/Desktop/test-stac-absolute-published/catalog.json",
  "type": "application/json"
}
{
  "rel": "parent",
  "href": "/Users/gadomski/Desktop/test-stac-absolute-published/catalog.json",
  "type": "application/json"
}
{
  "rel": "self",
  "href": "/Users/gadomski/Desktop/test-stac-absolute-published/20170831_172754_101c_3b_Visual/20170831_172754_101c_3b_Visual.json",
  "type": "application/json"
}
+ stac copy -t RELATIVE_PUBLISHED test-stac/catalog.json test-stac-relative-published
+ print_links test-stac-relative-published
+ jq '.links[]' test-stac-relative-published/catalog.json
{
  "rel": "root",
  "href": "./catalog.json",
  "type": "application/json"
}
{
  "rel": "item",
  "href": "./20170831_172754_101c_3b_Visual/20170831_172754_101c_3b_Visual.json",
  "type": "application/json"
}
{
  "rel": "self",
  "href": "/Users/gadomski/Desktop/test-stac-relative-published/catalog.json",
  "type": "application/json"
}
+ jq '.links[]' test-stac-relative-published/20170831_172754_101c_3b_Visual/20170831_172754_101c_3b_Visual.json
{
  "rel": "root",
  "href": "../catalog.json",
  "type": "application/json"
}
{
  "rel": "parent",
  "href": "../catalog.json",
  "type": "application/json"
}
+ stac copy -t SELF_CONTAINED test-stac/catalog.json test-stac-self-contained
+ print_links test-stac-self-contained
+ jq '.links[]' test-stac-self-contained/catalog.json
{
  "rel": "root",
  "href": "./catalog.json",
  "type": "application/json"
}
{
  "rel": "item",
  "href": "./20170831_172754_101c_3b_Visual/20170831_172754_101c_3b_Visual.json",
  "type": "application/json"
}
+ jq '.links[]' test-stac-self-contained/20170831_172754_101c_3b_Visual/20170831_172754_101c_3b_Visual.json
{
  "rel": "root",
  "href": "../catalog.json",
  "type": "application/json"
}
{
  "rel": "parent",
  "href": "../catalog.json",
  "type": "application/json"
}
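
The three outputs above differ in two ways: whether a `self` link is present, and whether the remaining hrefs are relative. A minimal sketch of that check, using only the standard library, is below. The function name `summarize_links` is hypothetical and not part of stactools or PySTAC, and it only looks at `links`, so it says nothing about asset hrefs.

```python
from urllib.parse import urlparse


def is_relative(href):
    """An href is relative if it has no URL scheme and does not start at the filesystem root."""
    return not urlparse(href).scheme and not href.startswith("/")


def summarize_links(links):
    """Summarize a STAC object's `links` list (hypothetical helper, for illustration only)."""
    # Every href except the one on the `self` link.
    others = [link["href"] for link in links if link["rel"] != "self"]
    return {
        "has_self": any(link["rel"] == "self" for link in links),
        "others_relative": all(is_relative(href) for href in others),
    }
```

Passing it the parsed `links` array of a catalog or item file (for example via `json.load`) gives a quick summary of what the copy did to the links.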

Environment

My environment looks like this (via pipdeptree --packages stactools):

stactools==0.2.1a1
  - aiohttp [required: ~=3.7, installed: 3.7.4.post0]
    - async-timeout [required: >=3.0,<4.0, installed: 3.0.1]
    - attrs [required: >=17.3.0, installed: 21.2.0]
    - chardet [required: >=2.0,<5.0, installed: 4.0.0]
    - multidict [required: >=4.5,<7.0, installed: 5.1.0]
    - typing-extensions [required: >=3.6.5, installed: 3.10.0.0]
    - yarl [required: >=1.0,<2.0, installed: 1.6.3]
      - idna [required: >=2.0, installed: 2.10]
      - multidict [required: >=4.0, installed: 5.1.0]
  - click [required: ~=7.1, installed: 7.1.2]
  - fsspec [required: ~=2021.6.0, installed: 2021.6.0]
  - lxml [required: ~=4.6, installed: 4.6.3]
  - pyproj [required: ~=3.0, installed: 3.1.0]
    - certifi [required: Any, installed: 2021.5.30]
  - pystac [required: ~=1.0.0rc1, installed: 1.0.0rc1]
    - python-dateutil [required: >=2.7.0, installed: 2.8.1]
      - six [required: >=1.5, installed: 1.16.0]
  - rasterio [required: ~=1.2, installed: 1.2.4]
    - affine [required: Any, installed: 2.3.0]
    - attrs [required: Any, installed: 21.2.0]
    - certifi [required: Any, installed: 2021.5.30]
    - click [required: >=4.0, installed: 7.1.2]
    - click-plugins [required: Any, installed: 1.1.1]
      - click [required: >=4.0, installed: 7.1.2]
    - cligj [required: >=0.5, installed: 0.7.2]
      - click [required: >=4.0, installed: 7.1.2]
    - numpy [required: Any, installed: 1.20.3]
    - setuptools [required: Any, installed: 56.2.0]
    - snuggs [required: >=1.4.1, installed: 1.4.7]
      - numpy [required: Any, installed: 1.20.3]
      - pyparsing [required: >=2.1.6, installed: 2.4.7]
  - requests [required: ~=2.25, installed: 2.25.1]
    - certifi [required: >=2017.4.17, installed: 2021.5.30]
    - chardet [required: >=3.0.2,<5, installed: 4.0.0]
    - idna [required: >=2.5,<3, installed: 2.10]
    - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.5]
  - Shapely [required: ~=1.7, installed: 1.7.1]

cholmes commented 3 years ago

It's just the assets that are off. So you need to look at the item, as the catalog doesn't have asset links.
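
One way to check the assets rather than the links is to walk the copied directory and list every asset href that is still absolute. This is a rough sketch using only the standard library. The names `is_absolute_href` and `find_absolute_asset_hrefs` are hypothetical, and it assumes that any `*.json` file with an `assets` object is worth inspecting.

```python
from __future__ import annotations

import json
from pathlib import Path
from urllib.parse import urlparse


def is_absolute_href(href: str) -> bool:
    """Treat URLs with a scheme and paths starting at the filesystem root as absolute."""
    return bool(urlparse(href).scheme) or href.startswith("/")


def find_absolute_asset_hrefs(stac_dir: str) -> list[tuple[str, str, str]]:
    """Return (file, asset key, href) for every absolute asset href under stac_dir."""
    found = []
    for path in sorted(Path(stac_dir).rglob("*.json")):
        try:
            doc = json.loads(path.read_text())
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid JSON
        assets = doc.get("assets") if isinstance(doc, dict) else None
        if not isinstance(assets, dict):
            continue  # catalogs usually have no assets
        for key, asset in assets.items():
            href = asset.get("href", "") if isinstance(asset, dict) else ""
            if is_absolute_href(href):
                found.append((str(path), key, href))
    return found
```

Running `find_absolute_asset_hrefs("test2")` after the SELF_CONTAINED copy would list the assets that were not converted. An empty list means every asset href is relative.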

lossyrob commented 3 years ago

I was able to reproduce with the steps Chris gave above.

Come to think of it, I think this is correct behavior - changing the type of the catalog to SELF_CONTAINED shouldn't modify the asset HREFs.

This behavior changed in PySTAC as part of https://github.com/stac-utils/pystac/pull/290 (see this comment and on for discussion, here is removal). Previous to this, asset HREFs were changed as part of the catalog save according to the type.

I think the solution here is to support converting asset HREFs to relative during the copy. The method to do this still exists on Catalog, and we can add a flag for copy that would allow users to specify they want relative assets. Perhaps this also deserves its own command, so that users can run a command over an existing STAC and modify all the asset hrefs to be relative.
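
To illustrate what converting asset hrefs to relative means for a single item, here is a minimal sketch that uses only the standard library and does not call PySTAC. The function name `make_asset_hrefs_relative` and its arguments are assumptions for illustration, not the stactools or PySTAC API. It rewrites local absolute paths relative to the directory holding the item file and leaves URLs and already-relative hrefs alone.

```python
import os
from urllib.parse import urlparse


def make_asset_hrefs_relative(item: dict, item_path: str) -> dict:
    """Return a copy of an item dict with local absolute asset hrefs made
    relative to the directory that contains the item file.

    Hypothetical helper for illustration; hrefs with a URL scheme and hrefs
    that are already relative are left unchanged.
    """
    item_dir = os.path.dirname(os.path.abspath(item_path))
    new_assets = {}
    for key, asset in item.get("assets", {}).items():
        href = asset["href"]
        if not urlparse(href).scheme and os.path.isabs(href):
            href = os.path.relpath(href, item_dir)
        # Copy the asset so the input item is not modified.
        new_assets[key] = {**asset, "href": href}
    return {**item, "assets": new_assets}
```

Returning a new dict instead of editing in place keeps the original item available for comparison, which helps when checking what a copy would change.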

gadomski commented 3 years ago

Thanks Rob, and sorry Chris, yeah I missed that we were talking about assets, my bad. 🤹🏽

add a flag for copy that would allow users to specify they want relative assets

👍🏽

Perhaps this also deserves its own command, so that users can run a command over an existing STAC and modify all the asset hrefs to be relative.

I'll open a separate issue for this, since this is a different use-case.

cholmes commented 3 years ago

Cool. The change makes sense, but yeah, I still need a way to convert from absolute to relative to be able to publish. And/or need #12. Having a more specific asset conversion command makes sense, will comment more on the new issue.

gadomski commented 3 years ago

@cholmes will https://github.com/stac-utils/stactools/pull/175 (which will hopefully be merged in the next day or two) solve this issue for you?