opendatacube / odc-geo

GeoBox and geometry utilities extracted from datacube-core
https://odc-geo.readthedocs.io/en/latest/
Apache License 2.0
83 stars 15 forks source link

Delay epsg determination as much as possible #72

Closed Kirill888 closed 2 years ago

Kirill888 commented 2 years ago

Introduce special state _epsg=0, that implies "epsg code is not known yet", _epsg=None means "epsg code is known to be absent".

Force all CRS construction code paths through cache.

github-actions[bot] commented 2 years ago

🚀 Deployed on https://63567cf43b1cf12452d8e8c1--odc-geo-docs.netlify.app

Kirill888 commented 2 years ago

@snowman2 how about this change? Do you think that will fix performance issues you have observed with non-epsg SRSs?

github-actions[bot] commented 2 years ago

🚀 Deployed on https://6356801a375cb12ba573386b--odc-geo-docs.netlify.app

codecov[bot] commented 2 years ago

Codecov Report

Base: 98.19% // Head: 98.14% // Decreases project coverage by -0.05% :warning:

Coverage data is based on head (01de66e) compared to base (85c90a2). Patch coverage: 94.44% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #72 +/- ## =========================================== - Coverage 98.19% 98.14% -0.06% =========================================== Files 23 23 Lines 3772 3777 +5 =========================================== + Hits 3704 3707 +3 - Misses 68 70 +2 ``` | [Impacted Files](https://codecov.io/gh/opendatacube/odc-geo/pull/72?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opendatacube) | Coverage Δ | | |---|---|---| | [odc/geo/crs.py](https://codecov.io/gh/opendatacube/odc-geo/pull/72/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opendatacube#diff-b2RjL2dlby9jcnMucHk=) | `99.18% <94.28%> (-0.82%)` | :arrow_down: | | [odc/geo/\_version.py](https://codecov.io/gh/opendatacube/odc-geo/pull/72/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opendatacube#diff-b2RjL2dlby9fdmVyc2lvbi5weQ==) | `100.00% <100.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opendatacube). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opendatacube)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

snowman2 commented 2 years ago

Part of the reason is False was used in #70 was because it skips the equality check as it is not necessary. Using == instead of is invokes the __eq__ method of the CRS class which attempts to create a CRS. This can cause a performance hit as PROJ has to go through a lot of logic before determining that it isn't a valid CRS.

snowman2 commented 2 years ago
In [1]: from pyproj import CRS

In [2]: cc = CRS(4326)

In [4]: %%timeit
   ...: 
   ...: cc == -1
   ...: 
   ...: 
7.54 ms ± 252 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %%timeit
   ...: 
   ...: cc is False
   ...: 
   ...: 
59.1 ns ± 5.09 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
snowman2 commented 2 years ago

Do you think that will fix performance issues you have observed with non-epsg SRSs?

Yes, I believe it will help. Thanks :+1:

Kirill888 commented 2 years ago

Part of the reason is False was used in #70 was because it skips the equality check as it is not necessary. Using == instead of is invokes the __eq__ method of the CRS class which attempts to create a CRS. This can cause a performance hit as PROJ has to go through a lot of logic before determining that it isn't a valid CRS.

My main issue with using False|None|<int> is that it complicates type a bit, and it's not really obvious which is which between None and False. The CRS.__eq__ should not trigger with the current implementation as comparison is done with None|int type

snowman2 commented 2 years ago

I gave it a try:

In [1]: from odc.geo import CRS

In [2]: cc = CRS(4326)

In [3]: %%timeit
   ...: 
   ...: cc == -1
   ...: 
   ...: 
5.07 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %%timeit
   ...: 
   ...: cc is False
   ...: 
   ...: 
31.4 ns ± 3.22 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
snowman2 commented 2 years ago

Looks like it attempts to construct a CRS if it is not a CRS each time: https://github.com/opendatacube/odc-geo/blob/85c90a232d0888408dd1e53e1a913239d6480cf9/odc/geo/crs.py#L238-L243

snowman2 commented 2 years ago

Actually, need to update to use this PR ...

snowman2 commented 2 years ago

You are correct, this seems okay:

In [5]: %%timeit
   ...: 
   ...: cc.to_epsg()
   ...: 
   ...: 
141 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
snowman2 commented 2 years ago

And this looks good too:

In [7]: %%timeit
   ...: 
   ...: CRS('PROJCS["unnamed",GEOGCS["Unknown datum based upon the custom spher
   ...: oid",DATUM["Not specified (based on custom spheroid)",SPHEROID["Custom 
   ...: spheroid",6371007.181,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532
   ...: 925199433]],PROJECTION["Sinusoidal"],PARAMETER["longitude_of_center",0]
   ...: ,PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["Meter
   ...: ",1],AXIS["Easting",EAST],AXIS["Northing",NORTH]]')
   ...: 
   ...: 
1.06 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
github-actions[bot] commented 2 years ago

🚀 Deployed on https://635709f53bbf7434d6c7ffdc--odc-geo-docs.netlify.app

Kirill888 commented 2 years ago

Looks like it attempts to construct a CRS if it is not a CRS each time:

https://github.com/opendatacube/odc-geo/blob/85c90a232d0888408dd1e53e1a913239d6480cf9/odc/geo/crs.py#L238-L243

I guess we could avoid that when rhs is int, and maybe avoid it when rhs is string.

Kirill888 commented 2 years ago

Although, with every input into CRS constructor going through the cache path, it's less of an issue.

github-actions[bot] commented 2 years ago

🚀 Deployed on https://635710c58aebe93469dc5426--odc-geo-docs.netlify.app