locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io

Optimize CRS reading #2890

metasim closed this issue 5 years ago

metasim commented 5 years ago

According to a recent profile I generated, processing CRSs from a GeoTiff header accounts for half of the read time, including network overhead. This needs further investigation, but I thought there was code that was supposed to cache the proj4 database, and I'm wondering if something's wrong with it; org.locationtech.proj4j.io.Proj4FileReader#readFile shouldn't be called every single time a CRS is decoded.
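
The kind of caching described above could look roughly like the following. This is a minimal sketch in Java (proj4j is a Java library), not the actual proj4j or GeoTrellis code: `CrsCache`, `loader`, and `loadCount` are hypothetical names, and the loader stands in for whatever expensive lookup (for example, reading the EPSG definition file) sits behind decoding a CRS from its code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical memoizing wrapper around an expensive CRS lookup. */
final class CrsCache<V> {
    // Maps a CRS code such as "EPSG:4326" to its already-decoded value.
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    // Stand-in for the expensive decode step (assumption: it is keyed by code).
    private final Function<String, V> loader;
    // Counts how often the expensive path actually runs.
    private final AtomicInteger loadCount = new AtomicInteger();

    CrsCache(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only on a cache miss. */
    V get(String code) {
        return cache.computeIfAbsent(code, k -> {
            loadCount.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loadCount.get();
    }
}
```

With a wrapper like this, decoding the same code a thousand times would hit the expensive loader once, and `loadCount()` gives a cheap way to check in a unit test whether the cache is actually being used.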

[Screenshot: Screen Shot 2019-04-02 at 11 09 28 AM]

pomadchin commented 5 years ago

This doesn't look like a GeoTrellis bug? We depend on https://github.com/locationtech/proj4j now.

metasim commented 5 years ago

@pomadchin Who did the work of extracting Proj4j into a separate project? I want to ask whether any caching that was there previously has been removed due to library dependencies, etc. One of our tests went from 2 minutes to 10 minutes after upgrading GT and I'm trying to track it down.

metasim commented 5 years ago

@pomadchin Whether or not it's a GT bug is debatable; it comes down to who should be responsible for caching the EPSG->Proj4j database. Probably Proj4j, but I'm not entirely convinced.

pomadchin commented 5 years ago

I think it was @echeipesh, but it was just a matter of removing code. If you have a small unit test, just post it here. I think there are some differences in proj4j itself, so a unit test is required.

pomadchin commented 5 years ago

@metasim do you remember which test went from 2 minutes to 10 minutes?

metasim commented 5 years ago

@pomadchin Unfortunately it was in an internal integration test that really isn't reproducible without a bunch of infrastructure. That said, this benchmark reproduces the various call paths that I was having problems with. If you replace LazyCRS with just CRS you should trigger them all.
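
To illustrate why swapping LazyCRS for CRS would expose the slow paths, here is a rough sketch. It assumes that LazyCRS defers parsing until the CRS is first needed, which its name suggests but this thread doesn't confirm. `LazyValue` is a hypothetical stand-in written in Java, not the real LazyCRS.

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder: runs the supplier once, on first access. */
final class LazyValue<T> {
    private Supplier<T> init;   // the deferred (possibly expensive) computation
    private T value;            // result, populated on first get()
    private boolean resolved = false;

    LazyValue(Supplier<T> init) {
        this.init = init;
    }

    /** Evaluates the supplier on the first call and reuses the result afterwards. */
    synchronized T get() {
        if (!resolved) {
            value = init.get();
            resolved = true;
            init = null; // release the supplier once it has been evaluated
        }
        return value;
    }

    synchronized boolean isResolved() {
        return resolved;
    }
}
```

Under that assumption, constructing the lazy wrapper costs nothing until `get()` is called, whereas an eager CRS pays the parsing cost at construction time, so a benchmark using the eager form would exercise the decode path on every iteration.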