opera-adt / COMPASS

COregistered Multi-temPorAl Sar Slc
Apache License 2.0
38 stars 17 forks source link

Preparation for final delivery #205

Closed seongsujeong closed 10 months ago

seongsujeong commented 10 months ago

This PR is to prepare for the final delivery of CSLC-S1 SAS.

seongsujeong commented 10 months ago

@vbrancat Thanks for the quick turnaround. I'll merge this after making sure that it passes the CI test.

scottstanie commented 10 months ago

two things-

  1. If I make a lockfile from the environment here, and I compare it to the one by deleting the Testing and Raider sections, it goes from 261 lines to 201 lines (so 60 fewer dependencies). The biggest one is that geopandas is used in the QA script, imported at the top level, so it's unavoidable right now if you want to run s1_geocode_slc.py. it would be nice to not have to install all of geopandas (which has a huge dependency tree, e.g. scitkit-learn) just to read the shapefile. Here's a demo of using shapely/fiona instead: https://github.com/seongsujeong/COMPASS/compare/docker_final_delivery...scottstanie:COMPASS:docker_final_delivery_shapely?expand=1

  2. What if we put the raider depencies in a separate file and called --file ... twice on the other one to keep it separate? We could then also delete the testing section of the environment.yml file because we already have a separate testing requirements file so we could just reference that (though I dont think we need to install testing dependencies into the main docker anyway?)

vbrancat commented 10 months ago

2. he other one to keep i

Totally agree on the dependency of geopandas. I expressed my concerns on it in private conversations with @seongsujeong. I have provided Python code on how to open a shapefile with ogr. @seongsujeong let's work on avoiding adding this dependency.

seongsujeong commented 10 months ago

two things-

  1. If I make a lockfile from the environment here, and I compare it to the one by deleting the Testing and Raider sections, it goes from 261 lines to 201 lines (so 60 fewer dependencies).

That is expected. And I hope you have made sure that you have made sure that CI passed with the lockfile (i.e. not the YAML file) you have without Testing and RaiDER, especially without Testing. In the PR #196, you have updated environment.yml. I appreciate your effort to make COMPASS as lightweight as possible, but we need an environment that makes COMPASS works, including the unit test. You said it's fine, and I initially agreed that it should be okay, because it passed the CI, but please note that the dependency during the CI gets installed by using specfile.txt. So CI pass on that PR does not mean environments'yaml is okay, unless you update the specfile based on the new yaml file.

The biggest one is that geopandas is used in the QA script, imported at the top level, so it's unavoidable right now if you want to run s1_geocode_slc.py. it would be nice to not have to install all of geopandas (which has a huge dependency tree, e.g. scitkit-learn) just to read the shapefile. Here's a demo of using shapely/fiona instead: https://github.com/seongsujeong/COMPASS/compare/docker_final_delivery...scottstanie:COMPASS:docker_final_delivery_shapely?expand=1 After the discussion with @vbrancat , I decided to write shapely-based anyhow. Doing that will get rid of the necessity of geopandas. But please note that geopandas is a dependency of RaiDER's dependency, and we need RAiDER for the final SAS delivery. I cannot guarantee that RAiDER works without geopandas.

  1. What if we put the raider depencies in a separate file and called --file ... twice on the other one to keep it separate? We could then also delete the testing section of the environment.yml file because we already have a separate testing requirements file so we could just reference that (though I dont think we need to install testing dependencies into the main docker anyway?)

Not sure if you mean to have a separate .yml file or lockfile, but in either case, I am not sure if conda will attempt to resolve the dependency, which I don't want. The .yml for sure, The lockfile I'm not sure.

Please, note that we need to focus on two things as ADT: to deliver the SAS in the spec we promised, and to make CI run correctly to deliver SAS that is robust enough. Optimizing the Docker size is a great thing we can do before the delivery, and I again thank you for your efforts, suggestions, and tips. They are really helpful. However, I am going to focus more on the first two prongs rather than the latter one.

About the implementations I've recently put in s1_cslc_qa.py, I will spend some time to re-write it using shapely. I'll either submit another PR, or include that in this one.

seongsujeong commented 10 months ago

@LiangJYu I've changed the implementation of the land mask computation to use osgeo modules (i.e. ogr, osr, and gdal). Hope you don't mint taking a look at the new code. Than you :)