pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/
92 stars 36 forks source link

PyGMTSAR (Python GMTSAR) #60

Closed AlexeyPechnikov closed 1 year ago

AlexeyPechnikov commented 2 years ago

Submitting Author: Alexey Pechnikov (informally) or Aleksei Pechnikov (by passport) (pechnikov@mobigroup.ru) All current maintainers: (pechnikov@mobigroup.ru) Package Name: PyGMTSAR One-Line Description of Package: PyGMTSAR (Python GMTSAR) is an open source project and Python package that provides easy and fast Sentinel-1 Satellite Interferometry for everyone Repository Link: https://github.com/mobigroup/gmtsar Version submitted: 2022.9.14.3 (https://pypi.org/project/pygmtsar/) Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD Date accepted (month/day/year): TBD


Description

PyGMTSAR (Python GMTSAR) is an open source project and Python package that provides easy and fast Sentinel-1 Satellite Interferometry for everyone! While it's pure Python package under the hood it uses GMTSAR binary tools which should be installed.

Scope

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

As I know the project used in research community, high school teachers and students (see GitHub followers for more details). And it's really available for everyone in just one click by web link, try the Live Google Colab examples on the GitHub project page.

The goal of the project is easy and fast satellite interferometry (InSAR) interactive and batch processing in Python scripts and Jupyter Notebooks for Sentinel-1 radar scenes everywhere as on localhost as on cloud environments like to Google Cloud VM and AI Notebooks and Amazon EC2 and even on free of charge cloud environment Google Colab.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks - [x ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [x ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

lwasser commented 2 years ago

👋 hi @mobigroup thank you for this submission. I will get back to you with next steps. In the meantime can you explain what the GMTSAR binary tools are / entail? does this package wrap around them to perform tasks on sentinel data? or is it an API wrapper to download the data? Many thanks for any clarification.


EDIT: also for all package maintainers, will you please include github handles for each maintainer rather than an email? Many thanks.

AlexeyPechnikov commented 2 years ago

Hi @lwasser GMTSAR is a well-known satellite interferometry (InSAR) system based on GMT tools for geospatial data processing, see details on the project page: https://topex.ucsd.edu/gmtsar/ You’d use GMTSAR for the complete interferometry processing but it doesn’t have any user interface because it’s a set of linux console binaries and shell scripts. Every processing requires lots of manual work even when you are a linux Jedi. Also, there are some issues in the processing like to outdated interpolation methods. I use GMTSAR tools but exclude all the GMT tools for the new generation of a satellite interferometry system. I’m sorry if it sounds not so easy to understand because satellite interferometry system includes hundreds of different tools and we try to reuse some existing ones when it’s possible and add dozens of new ones (because building it from the scratch requires decades of work).

I’m a single developer of PyGMTSAR and there are no other package maintainers. PyGMTSAR GitHub repository includes some 3rd party open source software like to GMTSAR and SNAPHU (satellite interferometry phase unwrapper) and I work on GMTSAR codes too (and send the patches to it’s GitHub) to adopt the required tools to call them from PyGMTSAR. PyGMTSAR python library is located in the project subdirectory (as GMTSAR and SNAPHU codes) and it’s available as PyPi package: https://pypi.org/project/pygmtsar/ Actually, I work on the two projects at the same time: GMTSAR and PyGMTSAR where GMTSAR has lots of developers and PyGMTSAR is my own project.

lwasser commented 1 year ago

hi @mobigroup thank you for your patience with us as we stand up pyOpenSci. I think you have CI already so we can check that box. I am going to look for someone in the community to have a look at this package. We are just getting our editorial board setup and I am trying to create structure for the organization. I want to make sure that you have a good experience with us. Please know that this process may be a bit slower than usual as we get things together.

One other question - are there any other packages that perform overlapping tasks to this package?

Many thanks

AlexeyPechnikov commented 1 year ago

Hi @lwasser thanks, I understand it’s not so easy to review complex software packages! And yes, I’ve added CI tests recently.

One other question - are there any other packages that perform overlapping tasks to this package?

While there are dozens of satellite interferometry packages I don’t know any other available for everyone (you don’t need to write very complex configurations in many separate files and so on) and everywhere (as on local computer as on cloud servers and even on Google Colab) using Python scripts or Jupyter notebooks. Please follow the link to live Google Colab notebook to compute and plot the results (and edit the code and interactively view the changed results) in your browser without long hassles with software installation and other problems. At the same time, PyGMTSAR works fast using parallelization intensively and allows to process huge datasets on very common hardware like to Apple Air (using lazy computations provided by dask and xarray packages). You’d find the community on LinkedIn where I post the package updates and collect the feedback: https://www.linkedin.com/in/alexey-pechnikov/recent-activity/

lwasser commented 1 year ago

hey @mobigroup i just wanted to check in with you - we are searching for an editor so i'm hopeful that we can get things going soon. i'll stay in touch this week as I continue to search. We are just reestablishing our structures so I truly appreciate your patience here. please ping me with any questions

AlexeyPechnikov commented 1 year ago

Hi @lwasser thanks for the update. FYI, In the recent days I’ve shared PyGMTSAR Docker image which includes the ready to use environment and the live Jupyter notebooks to easy start locally without Google Colab usage. See below how to use it.

Configure your Docker runtime (Preferences -> Resources tab for Docker Desktop) to use 2 CPU cores and 8 GB RAM or 4 CPU cores and 16 GB RAM and so on. Download the Docker image (or build it yourself using the Dockerfile in the repository) and run the container forwarding port 8888 to JupyterLab using this commands inside your command line terminal window:

docker pull mobigroup/pygmtsar​
docker run -dp 8888:8888 --name pygmtsar \
    docker.io/mobigroup/pygmtsar​
docker logs pygmtsar

See the output for the JupyterLab link and copy and past it into your web browser address line. Also, the donwloaded Docker image can be started in Docker Desktop app — press “RUN” button and define the container name and the port in the opened dialog window (see “Optional settings” for the port number input field) and click on the newly created container to launch it and see the output log with the clickable link.

lwasser commented 1 year ago

oh that's cool. have you thought about making that image into a binder image? or would it not have enough resources? @mobigroup

and just an update. i think i've found an editor for this. thank you for being so patient. We are just getting the organization structure in place and it just takes time. I think this editor will be back online next monday and then we can kick things off.

In the meantime I have a few additional questions (in the future this will be in the review up front)

  1. Do you intend to continue to maintain this package or support this package for the next ~2 years? 2. Can you please fill out our survey? We will ask the reviewers for this package to also fill it out. It really helps us improve our organization.

Many thanks in advance for your time and patience here.

lwasser commented 1 year ago

One other note - if you want to work on something while we get organized, you could look into the JOSS side of things. We certaintly don't require this but

  1. you should check that you package is in scope for joss
  2. if it is, you can work on your paper.md file - instructions here

The way this works is that once our review is done, JOSS accepts our review and only looks at your paper. You are thus fast tracked through the JOSS review process via our review. More soon.

AlexeyPechnikov commented 1 year ago

Hi @lwasser thanks, I'm glad to listen about the progress!

  1. Do you intend to continue to maintain this package or support this package for the next ~2 years?

Yes, sure.

  1. Can you please fill out our survey?

Ok, I'll.

... paper.md file

Ah, got it, I completely forgot about it!

lwasser commented 1 year ago

Awesome!! Thanks @mobigroup !!! Much appreciated!!

AlexeyPechnikov commented 1 year ago

Update: NASA operator Alaska Satellite Facility has installed GMTSAR on OpenSarLab (OSL). Now you can run PyGMTSAR out of the box right after starting OSL. https://www.linkedin.com/feed/update/urn:li:activity:6987024568028708864/

NickleDave commented 1 year ago

Hi @mobigroup thank you for the update. It's clear you are doing a lot to make PyGMTSAR accessible for everyone

@lwasser asked me to check in here, in my role as editor in chief.

Just letting you know that we do have a potential editor and reviewers. We're still doing a little more homework about fit and overlap. You should hear back on this issue within a week.

In the meantime, could I just remind you to take this quick (10-15m) survey when you get a chance?

This helps us understand how well we are serving you and the rest of the Python open science community.
Thanks!

lwasser commented 1 year ago

@mobigroup so sorry i gave david the wrong survey link - this is the correct one. . ive updated David's comment above as well. this is my fault.

AlexeyPechnikov commented 1 year ago

@NickleDave @lwasser thank you for the information, I’ve filed the survey. Let me note one additional point: there is a big problem for freelancers in science community because we can’t publish a paper or obtain a grant (authors must have an affiliation to an industrial company or a research institute) even for projects completed for well-known industial companies (Google Inc, for example, see some related projects on my GitHub) or research institutes. Surprisingly, arXiv asks for registered experts review but always rejects papers from authors without affiliation too.

NickleDave commented 1 year ago

Thank you @mobigroup, we appreciate you taking the time to do that!

I hear you about how the scientific community may be pushing out freelancers (perhaps unintentionally) because of the requirement for affiliations. I guess this was prompted by a question in the survey? Should we add fields for "freelancer" or "unaffiliated" somewhere?

AlexeyPechnikov commented 1 year ago

@NickleDave It’d be valuable I think. Maybe my case is unique or maybe not: we can’t be sure without the targeted questionnaire. How about some more options for a respondent occupation? Maybe “freelancer” and “contractor” are not rare cases.

NickleDave commented 1 year ago

That makes sense, I have passed that on to @lwasser.

@mobigroup I just want to update you on where we're at with this.

We really appreciate your enthusiasm and we think there are good reasons to review the package.
But it's still not quite clear to us how pygmtsar relates to gmtsar and what specific functionality pygmtsar provides. This makes it hard for us to know which reviewers to recruit.

Before we start a review, then, we'd ask you to make changes that achieve the following:
update the readme and docs as necessary to make it clear to a totally naive reader what this package does, how it's structured, and in what way it depends on GMTSAR.

It is of course up to you to decide how best to achieve that given the goals and audience of your package.
But here is a little more detail on how I would do it if I were you, though, if it helps: Of course it's not your job to teach everyone what fast satellite interferometry is in detail. But there should be clear concrete language that, for example, defines the subject area in ~1-2 sentences, and then proceeds to explain precisely what functionality your code provides that makes it easier to work in this area, e.g. "pygmtsar provides two modules for working with the SBAS and PRM file formats" (as someone who is not a domain expert, I have no idea what these file formats are and they are not defined anywhere in the docstrings or documentation -- to me this indicates that the package can do a lot more to make the science open). Similarly, the README/docs should state up front exactly how pygmtsar and gmtsar are related in terms of code: is pygmtsar a wrapper with actual c bindings to all functions in GMTSAR? or does it only assume that GMTSAR is installed and call just one or two functions under the hood? Either would be fine, or somewhere inbetween, but how the two codebases relate needs to be very clear. I know that you have a lot of this throughout the docs in one form or another, but it should immediately hit me in the face when I'm reading, I shouldn't have to search for it and piece things together by staring at the source code.

Please let us know if that makes sense. Happy to continue the conversation here to clarify

AlexeyPechnikov commented 1 year ago

@NickleDave Ah, maybe would you explain on examples? The project documentation below cited from https://mobigroup.github.io/gmtsar/

  1. The first lines of the docs explain the relationship between PyGMTSAR and GMTSAR. Please let me know why do you think that’s not enough?

PyGMTSAR (Python GMTSAR) is an open source project and Python package that provides Sentinel-1 Satellite Interferometry for everyone! While it’s pure Python package under the hood it uses GMTSAR binary tools which should be installed.

Initially, PyGMTSAR is forked from GMTSAR GitHub repository and lot’s of changes are made to seamlessly call all the binary tools from Python API. For now, all the changes developed for PyGMTSAR project merged into GMTSAR and the both projects use the same binary tools although PyGMTSAR maintains rich Python API for interactive and batch computations and GMTSAR provides a set of shell scripts for the batch processing only. To prevent confusing, below PyGMTSAR means the Python package only and GMTSAR means the binary core tools (while these are available in PyGMTSAR GitHub repository too).

  1. The subject area is explained and I actually confused why you’ve found it not informative.

Why PyGMTSAR?

PyGMTSAR itself combines powerful Python instrumentary for sophisticated multidementional processing (xarray library) and lazy calculations (dask library) plus parallel computing (dask and joblib libraries) to perform fast and interactive processing on huge datasets. And the best algorithms and numerical computation approaches applied for all the processing steps. There are progressbars and preview plots for the every step and that’s easy to save intermediate results and continue work later on the same or other host. And (thanks to joblib library) that’s safe to interrupt the execution at any time without memory leaks (common for dask-based solutions).

Thanks to all the powerful Python libraries and the best used algorithms PyGMTSAR is fast and its possible to complete SBAS analysis for 5 years on 800 interferograms in just one day on Apple Air or Apple iMac (8 cores and 16 GB RAM) using 2 TB Sentinel-1 SLC scenes. And PyGMTSAR is user-friendly providing functions to download the required satellite orbit files and DEM and so on. This combination of the human-readable and short code and powerful computing is the key to use PyGMTSAR everywhere from education and to research and more.

  1. The functions reference lists the top-level functions with notes is that PyGMTSAR pure Python or GMTSAR binary wrapper. To be honest I don’t know to make it more clear.

List scenes and orbits (PyGMTSAR original) … Framing (combined GMTSAR wrapper and PyGMTSAR original) … Build Interferogram (GMTSAR wrapper) …

Please let me know some exact recommendations how to make the documentation more clear!

NickleDave commented 1 year ago

Hi again @mobigroup, thank you for your detailed reply.
You are correct that you have written something about all of the things I listed, as the text you quoted shows. As I said above

I know that you have a lot of this throughout the docs in one form or another

Let me explain what I'm looking for in terms of our process and our criteria.
Note that the criteria are here: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/intro.html# And the process is here: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/aims-and-scope.html

These are the relevant criteria/goals:

  1. Limit overlapping functionality in packages
  2. Improve usability through documentation
  3. Design and plan for long term maintenance

limit overlapping functionality

This is where we have a strong case to review the package. It's obviously in scope for pyOpenSci. Since you are providing to some extent a wrapper around GMTSAR, and there's no other wrappers, it's likely that there's little overlap in functionality with other packages.

But we still need to evaluate this. I can't currently do that because I am not able to find a list of features, unless I take the time to read the GMTSAR docs and learn it myself, then take the time to work with pyGMTSAR to see how it provides that functionality.

There's a couple of ways to address this:

have a clear list of features in the README.

Notice that I said a list, not paragraphs of text. These features should be bullet points with succinct, concrete verbs and nouns so that it's easy for a user to find and understand them.
As shown here, for example: https://www.writethedocs.org/guide/writing/beginners-guide-to-docs/#id1 (I know you are by no means a beginner, even though that link is to a beginners guide. I am trying to provide you with examples from others so you do not feel like I am just arbitrarily making up criteria to be a jerk) Here's an example of an actual README that does this: https://github.com/amitmerchant1990/electron-markdownify#key-features from https://github.com/matiassingers/awesome-readme

Just adding this list alone would go a long way towards helping me and potential reviewers understand the specific functionality provided by pyGMTSAR

have API documentation

The second way to make the functionality of pyGMTSAR plain as day would be to add API docs, for example like pyGMT provides: https://www.pygmt.org/dev/api/index.html

Design and plan for long term maintenance

Let me explain this before I say anything about the criterion of documentation.
We consider reviewed packages part of our ecosystem.
https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/intro.html#how-is-pyopensci-different-from-joss-and-other-review-processes This means that in some sense I am committing that I would be a maintainer of the library if needed. Of course we want you to continue develop the package on your own, but we need to review with the idea in mind that we might be maintaining it or seeking new maintainers.

Currently the way the package is presented gives me the impression that it would require a very significant investment of time for anyone else to even begin to understand how to take over development and maintenance. This makes me very hesitant to review.

There's a couple of things that concern me in this respect:

See also this section in our guidebook about maintenance:
https://www.pyopensci.org/contributing-guide/open-source-software-submissions/author-guide.html#does-your-package-have-a-2-year-maintenance-plan

I am not asking you to commit to two years of maintenance now, but I am saying that I am having a hard time imagining who we could even ask to take over maintenance should you need to step away in the future.

improve package usability through documentation

Lastly let me talk about the documentation. I hope it will make more sense here in light of the review criteria I have just listed and discussed. Here I am replying directly to the examples you provided.

Please know as I said before that your enthusiasm for the project is quite evident as well as the significant amount of work you have already put in. And there are a lot of great things about the documentation! For example the Google CoLabs notebooks are very close to the kind of vignettes we would like projects to have.

With what I say, I am trying to help you. The main thing I would want to see before we begin a review is a revision of the documentation with an emphasis on concise, clear writing, without run-on sentences and without paragraphs of text except where absolutely needed, e.g. a tutorial. Again, I am not saying this because I enjoy critiquing people's writing; I'm asking for this because it would help me know that you are focused on making the project approachable by as wide an audience as possible, as fits with our criteria, and that we would be able to find someone who could if needed take over maintenance.

The first lines of the docs explain the relationship between PyGMTSAR and GMTSAR. Please let me know why do you think that’s not enough?

Now that you point me at it again, I do see you state a description of the project first thing in the README. But it's only after I've already read through half a paragraph of text, and it's a very long sentence.

A relevant example of a project description from the pyGMT docs: https://www.pygmt.org/dev/index.html#about

PyGMT is a library for processing geospatial and geophysical data and making publication quality maps and figures. It provides a Pythonic interface for the Generic Mapping Tools (GMT), a command-line program widely used in the Earth Sciences.

Notice that this is only two sentences and it tells me exactly what PyGMT does at the highest level. I do not need any knowledge of GMT to understand.

If I were to rewrite your description of pyGMTSAR, I would boil the second paragraph of your README down to something like the following: "pyGMTSAR provides easy and fast satellite interferometry (InSAR) processing for Sentinel-1 radar scenes everywhere. Under the hood, pyGMTSAR uses GMTSAR binary command line tools, but all GMTSAR scripts and GMT commands are replaced by Python code using modern and robust algorithms."

The subject area is explained and I actually confused why you’ve found it not informative.

I am not a domain expert, so what you have written does not tell me a lot. If you really want to reach a wider audience of potential users, contributors, and future maintainers, that do not already have deep knowledge of GMT and GMTSAR, you could additionally provide one more paragraph of just one or two sentences defining what satellite interferometry is, e.g. with a wikipedia link. Here the GMTSAR docs do a good job of this, which of course you are familiar with but I'm just trying to illustrate my point: https://topex.ucsd.edu/gmtsar/ For me the description there is much clearer. It uses the most general terms possible ("InSAR processing system designed for users familiar with Generic Mapping Tools (GMT)") -- this gives me a general feel for what GMTSAR does -- instead of specific lingo that only experts will be familiar with ("SBAS analysis for 5 years on 800 interferograms ... using 2 TB Sentinel-1 SLC scenes", I have no idea what that means). In addition, the GMTSAR landing page provides multiple citations that tell someone interested where they could go to read more for background information.

The functions reference lists the top-level functions with notes is that PyGMTSAR pure Python or GMTSAR binary wrapper.

I am not able to find the text you quoted in either the README or the docs -- can you link me to them?

Are these in the modules themselves? I did see in one or two places you wrote something like "this wraps function gmtscript.sh".

What I want to know is: is there just two classes in pyGMTSAR? I'm not saying it's bad if there are, but I'm saying I can't tell from the docs or code, I would have to install it myself and literally dir the imported package. If there's just two classes, how many of the methods directly wrap GMTSAR functions? All of them? Just one or two? If it's just one or two, what do the other methods do, which would be considered extra functionality provided by pyGMTSAR? If pyGMTSAR does provide this extra functionality, then it should be made as explicit as possible: e.g., your list of features and your example code snippets. A made-up example to illustrate my point: "pyGMTSAR offers the ability to serialize SBAS files to .json with the SBAR.jsonify method". This is exactly where we would want to have clear information in the docs, like a list of features and API documentation, so we can compare with any other SAR packages that might provide similar functionality.

Other things that would help make the package usage and goals obvious in a clear concise way include:

Again, I can see that you have put a ton of work into pyGMTSAR. I know I wrote a lot here. All of this feedback is meant to help you reach a wider audience, and to make sure we are set up for a good review experience. Please feel free to follow up with questions and more discussion.

AlexeyPechnikov commented 1 year ago

Hi @NickleDave your explanation is detailed, thanks! I see I need to reread it some times point-by-point. But I'm afraid you still didn't check the project documentation, right? Please open the documentation to find the answers for some of the questions.

PyGMTSAR Documentation and the Functions Reference

  1. PyGMTSAR documentation is linked on the GitHub https://github.com/mobigroup/gmtsar with GitHub Pages icon:

Screenshot_2022-10-23_at_11_20_36

  1. PyGMTSAR documentation is on the PyPi package page https://pypi.org/project/pygmtsar/

Screenshot_2022-10-23_at_11_24_44

  1. PyGMTSAR documentation is linked on the DockerHub page https://hub.docker.com/r/mobigroup/pygmtsar

Screenshot_2022-10-23_at_11_26_50

And the both PyPI and DockerHub pages available directly from the top of README GitHub page:

Screenshot_2022-10-23_at_11_28_01

PyGMTSAR original and wrapped GMTSAR functions

The functions reference includes the exact note for the original and wrapped PyGMTSR functionality:

Screenshot_2022-10-23_at_11_32_21

Screenshot_2022-10-23_at_11_36_19

Package Installation

There are so many ways to install the package:

  1. Google Colab Notebooks install the package and all the dependencies automatically and all the installation instructions available in the notebooks:
Screenshot 2022-10-23 at 11 39 49
  1. The package is pre-installed in the Docker image on DockerHub https://hub.docker.com/r/mobigroup/pygmtsar and all the installation instructions available in the Docker file https://github.com/mobigroup/gmtsar/tree/master/docker
Screenshot 2022-10-23 at 11 43 04

The Docker image usage instructions listed on the DockerHub page:

Screenshot 2022-10-23 at 11 47 13

Also, the Docker image build instructions included too.

  1. The Python PyGMTSAR library installation for the case when you already have GMTSAR binaries installed is described on the PyPI page https://pypi.org/project/pygmtsar/

Screenshot_2022-10-23_at_11_44_31

  1. Continuous integration scripts (CI on GitHub Actions) provide the installation instructions too, see https://github.com/mobigroup/gmtsar/tree/master/.github/workflows
Screenshot 2022-10-23 at 11 53 07
  1. At the end of the day, read the installation instructions in the documentation:
Screenshot 2022-10-23 at 11 53 58

Brief demonstration usage

The Google Colab Live examples provides the use cases and compare the results to other InSAR systems (that's important for science research purposes) and list the keywords to find the notebook resolving the exact tasks:

Screenshot 2022-10-23 at 12 20 50

Do you need to use landmark and compute a vertical displacement? See the notebook "with landmask applied to interferogram, unwapped phase, and LOS, east-west, vertical displacement results." Do you need to crop your scenes? Look for "cropped subswath processing" notebook. How about atmospheric noise correction? It's easy to find too: "detrending approach to remove atmospheric noise". These are not just code snippets but the ready to use interactive processing pipelines.

The project goal

That's easy and explained in the project title:

PyGMTSAR (Python GMTSAR) - Sentinel-1 Satellite Interferometry For Everyone

Screenshot 2022-10-23 at 12 00 17

Even GMTSAR developers were sure that's impossible! You'd find our discussions in GMTSAR GitHub project tracker. InSAR users have 256 - 512 GB RAM and make very complex run scripts for the processing. PyGMTSAR performs the processing even on 16 GB RAM Apple Air. PyGMTSAR provides more functionality and faster processing on Apple Air using interactive Jupyter Notebook than you'd obtain on 512 GB RAM host using custom and complex GMTSAR console scripts. As the result, PyGMTSAR is available even in Google Colab Notebooks but we can't do the same for any other satellite interferometry system as I know. This is the huge difference between any existing InSAR system and widely available InSAR system PyGMTSAR. And all the technical solutions are targeted to achieve the goal!

NickleDave commented 1 year ago

Hi @mobigroup, yes let's go point-by-point. I'm sorry for the lengthy response that was probably overwhelming.

Thanks for the screenshots that show the API documentation.

Can you please share a link to that? I did look at your docs, many times, but obviously I was not able to find it.

AlexeyPechnikov commented 1 year ago

@NickleDave The documentation https://mobigroup.github.io/gmtsar/ includes "Reference" chapter right after "Tutorials: Live Examples on Google Colab" chapter:

image image
NickleDave commented 1 year ago

Thank you @mobigroup -- I do see the reference documentation now if I scroll past the tutorials.

This helps me see that many (most?) of the methods on the SBAS class are denoted as unique to pygmtsar.

NickleDave commented 1 year ago

I'm sorry for my slow replies @mobigroup.

We've run into a couple things here that we haven't handled before so we wanted to get feedback. I will update you in the next couple of days -- didn't want you to think we had forgotten about you.

NickleDave commented 1 year ago

Hi again @mobigroup. Thank you for your continued patience.

As I said above, it seems pretty clear that the functionality of pygmtsar is within scope for pyOpenSci. And with your help, I was able to understand that pygmtsar provides many features in addition the functionality of GMTSAR that it wraps. It's also clear you are putting a lot of work into developing and promoting the package, from the work you are putting in to the README now, and the questions you are handling from users in the issues.

But I don't think we can review at this time, unless you would be able to:

As I stated in the comment above, this is because by reviewing and accepting a package into the pyOpenSci ecosystem, we are committing to support maintenance of the package. That includes finding another maintainer should you need to step away. It is unclear to us whether we would be able to guarantee that for pygmtsar if we went ahead with the review.

I am sorry these criteria were not more clearly documented in our guidebook. We have not run into this exact situation before so we are figuring out how to handle it on the fly. We will learn from this review process and provide clearer standards, in particular for packages that are wrappers around lower-level languages.

If you feel like you cannot make these changes to the development workflow right now, please know I very much understand as a developer myself. We would still be open to reviewing in the future.

You might consider submitting for review with JOSS in the meantime, which provides a similar review process. That way you could still get credit and get feedback from reviewers without needing to adapt to our specific standards.

Please let me know if the reasoning I am giving for not going forward with the review is clear.

AlexeyPechnikov commented 1 year ago

@NickleDave I'm sorry but it looks inappropriate. I spent a lot of time answering on your questions (and attaching screenshots because you didn't read the project documentation and wrote again and again about missed documentation) and today you rejecting all the work referencing to the 3 dummy points:

  1. "so that pygmtsar is clearly its own package, not a fork of GMTSAR" - Anyone can check the GitHub repository for all the changes in PyGMTSAR files to be sure in the authorship.

  2. "and so that you are not relying on a branch that is simultaneously behind and ahead the main GMTSAR branch". - why do you ask me about the different project development? While I work on GMTSAR software too but that's obviously is not related to your review. Even more, the repository is synced to upstream GMTSAR because I validate and add the upstream changes: https://github.com/mobigroup/gmtsar/commit/09f3723e6f9f58d06e8cc3c913a4d2a647dd0939

  3. "For example like the dev docs provided by [numpy], [pandas], [scikit-learn], and [pygmt]."

You are referencing to adopted GitHub documentation for developers available on https://docs.github.com/en/get-started/quickstart/contributing-to-projects How does it related to your review?

@lwasser Please let me know is it your official position to insult authors ("unless you would be able to: so that pygmtsar is clearly its own package, not a fork of GMTSAR") to reject an application? Obviously, I did'n stole the ownership and it's very easy to prove using GitHub commits record and @NickleDave have no any right to blame me or anyone else!

NickleDave commented 1 year ago

Hi @mobigroup

I spent a lot of time answering on your questions

I know this and I completely understand your frustration.
edit: to give you context, we also spent a lot of time discussing how to handle this review internally because we have not dealt with this kind of package before. I know that it is not great for you that we are figuring this out now, and I appreciate your patience.

attaching screenshots

You are right that I could have scrolled through the entire single page of documentation the first few times I read it, so that I could find the API reference at the bottom.
I am happy to provide more feedback about ways you could structure your documentation so that it is immediately obvious to a new user how to find what they are looking for. I have omitted that feedback from my last comment, because I want to focus on the main reason we have chosen not to proceed with a review at this time.

"unless you would be able to: so that pygmtsar is clearly its own package, not a fork of GMTSAR"

I think my meaning was not clear here; by no means do I mean to say that you are stealing ownership.
It's just not obvious to a new user who visits the repo that pygmtsar is a separate package, because it's a fork, and still has the name GMTSAR. My goal is to help you make it obvious to a new user that they are looking at a library called pygmtsar, not a fork of GMTSAR. That is a minor point that is also not related to the main reason we decided not to review. Again I apologize if my language sounded accusatory, that was not my goal.

edit: I know you have very good reasons for adopting your development workflow and that you have put in a ton of effort to provide easy access to GMTSAR. It's great that you found something that works for you and it's clearly valuable to your users. But for a package to be part of pyOpenSci we need to know that there's more than one person that could, if necessary, contribute to and maintain the library, without needing to reverse-engineer everything you have done.

You are referencing to adopted GitHub documentation

I am sorry if my reasons for sharing these links was not clear. I am not asking you to provide an entire section on how to use GitHub.

I am asking you to provide a single page with a verbal description of how you develop and build pygmtsar. This would explain the logic of the build scripts, and any other information unique to your development workflow, such as how you validate and add upstream changes.

This would show how another person could contribute and if necessary maintain the package, should you need to step away.

If you can provide that single page documenting how you develop, then it will help us understand what would be required to maintain the project.

Does that make sense?

AlexeyPechnikov commented 1 year ago

@NickleDave Thanks, that sounds better. At the same time:

  1. The Python package for review is https://pypi.org/project/pygmtsar/ and it includes PyGMTSAR Python codes only. The Python sources placed in the sonamed GitHub project directory of my fork of GMTSAR repository because these sources are required for the development to provide the projects compatibility (PyGMTSAR can reproduce GMTSAR results for research purposes). GMTSAR package itself includes 3rd party SNAPHU software and it doesn’t prevent GMTSAR to be a well known science package. And only you see some issue when 3rd party open source software with the license and copyright included into GitHub repository.

  2. The Python package includes common setup.py which can be used by any Python package manager to install it. That’s unified and not unique. The Python package has common GitHub continuous integration (CI) scripts named GitHub Actions which running to test every commit or pull request. And it’s unified and not unique too. The package provides Dockerfile to build the Docker images by common way. And again that’s common not unique tool. At the end of the day PyGMTSAR has the unique functions in standard Python package in standard GitHub repository with standard continuous integration for standard git commits and pull requests for all developers. Actually, PyGMTSAR follows the best practices for GitHub repositories.

  3. What is the real reason to reject the review process? When you suddenly send weird new requirement to have a unique development process instead of unique functions I suppose the reason is not technical and not related to the project. Please explain why PyGMTSAR can’t be reviewed when it’s satisfying for all the public requirements? Why only for me you invented the new requirement to have a unique development process (you didn’t ask me if the development process has any unique points which need to be documented but you strongly require to have the unique development process)? Maybe just to find any reason for the rejection?

NickleDave commented 1 year ago

@mobigroup

We are not rejecting a review. I should have phrased that differently in my last reply.

As I said in the reply before that, if you can provide documentation of the development process, that will help us understand what is required. If we feel that we would be able to find new maintainers given the requirement, then we could proceed with a review.

You have just helped me better understand development. Thank you.

Why only for me you invented the new requirement

Like I said, we have not dealt with this situation before and I apologize that we are working out how to handle it now. I get why you feel like I am making up some new criterion. I acknowledge that we do not currently have very specific language in our guidebook about what kind of documentation a project needs to have about its development. I have raised this issue to add it to avoid this problem in the future: https://github.com/pyOpenSci/python-package-guide/issues/8

Please give me and the rest of the editors until Monday at the latest to carefully read your reply and understand what you have explained. I want to make sure I do not make this process any more frustrating for you because I have not thoroughly understood what you are telling me here. I will reply in more detail as soon as I have a chance.

NickleDave commented 1 year ago

Hi @mobigroup thank you for your reply, and for your patience with this review. I think I see now your development workflow.

To update you: we are continuing to talk with maintainers that develop similar packages so we can better understand what would be required for maintenance.

We should have concrete feedback for you by the end of the week. Again I am sorry it is not sooner -- @lwasser is dealing with a family situation right now. We appreciate your continued patience.

AlexeyPechnikov commented 1 year ago

Hi @NickleDave thank you for the update!

lwasser commented 1 year ago

Hi, Alexey.

First and foremost I wanted to say thank you for all of the energy that you’ve put into this review and as a maintainer for pyGMTstar. We know that working with the GMT tools is not easy. And we appreciate your expertise as a maintainer and your efforts to make accessing sentinel data easier for scientists.

I want to start by saying we are not rejecting your package. However, in order to review your package, we require some significant changes to its current structure.

A foundational goal for pyOpenSci is continued maintenance of our packages. We are not a journal per se, rather we are an ecosystem that curates lists of vetted, peer reviewed tools. Our partnership with JOSS supports the Journal / publication component if publishing is all that you desire. As such, for every package that comes to pyOpenSci, we need to carefully consider whether we can support future maintenance of the tool in case a maintainer steps down. Mainainters with the best intentions burn out, get new jobs, etc. The issue of long-term maintenance and dependability of packages is a problem that pyOpenSci seeks to address.

Your package, as designed, would be challenging for us to support. Its core structure depends upon two repositories being updated and staying in sync. Syncing files across two repos would be extremely difficult for a new maintainer to take on.

Another foundational goal for pyOpenSci is to facilitate contributions from the community to our packages. Contributions from others who are not working on pyGMTstar would also be incredibly challenging given the current structure of pyGMTStar package. Here is what we’d need to review pyGMTStar While we don’t have this specifically documented yet, we’d prefer that the code base from gmtstar be in some way wrapped as a dependency or vendored in some way. Some have suggested a git submodule might work. In this scenario you would push all updates to the gmtstar repo directly that related to the C codebase. pyGMTstar would NOT be a fork. Related to the above, the README file in the case of pyGMTstar should contain a few sentences at the top information about how your package relates to GMTstar (gmtstar is effectively vendored via your fork). And how the code base between the two repos is synced (or not synced). In the case of this review, had we understood how gmtstar was vendored and synced earlier on, we would have been able to have this discussion with you sooner. Development instructions: We require that every package contains instructions for contributing. A CONTRIBUTING.md file that contains this information about how you expect contributions to work, and in this case with some indication of dependencies that might need to be installed, should be provided in the README. This link onGitHub is one we plan to use in future issue templates as an initial check that you have some of the basic files in your repo that helps a user navigate. Some packages have more robust contributing documentation with issue templates and broader guides on how the issue/pr/ workflow operates in the project; this level of documentation is not expected by us but we see it and appreciate it when it’s there. These are not new standards. These standards are ones that even GitHub embraces. I am working hard to better document things on our end for future submissions. Repository name: For user clarity we prefer that the repository name is the name of the package. If that is not possible, at the minimum it’s confusing for users to have two repositories named gmtstar - one of which is actually pygmtstar. Please know that this is NOT A NEW rule. However it’s taking me some time to edit our packaging guide. README: that same PR that I linked to above has more specifics about what we want to see in your readme. I do understand that David missed some elements that existed in your documentation / repository. However, one core element of a good README file is that it is organized in a way that it directs users to all of those elements. This too has always existed in our docs. I am working on a rewrite now to make that more clear.

Please know that right now I am the sole full time employee of pyOpenSci. And I am lucky to be working with volunteers like David and others every day to build this project. This is only my second month here full time. Unfortunately I can only do so much each day which is why some of the things I wrote above aren’t fully in writing - yet. This is the first package we have considered reviewing that has a C library as its base and that is also vendored via a fork.

Again, we think you’ve done incredible work here. We are not rejecting you. Rather we are saying that right now, the way the package is structured conflicts with our goals as an organization. And thus we are unable to review it. However, in the future, if you are able to work on the items above, we could consider it for review.

I understand and respect if you disagree with anything that i’ve written above. But I need to adhere to the goals and mission of the organization as I make decisions. If you have any followup questions, I'm happy to answer them here. If there are no questions, we will close this issue (for now). You are welcome to re-open it in the future or submit a new issue.

You might also consider looking at JOSS in case they do have the technical expertise to review your package as it is now. For now i've marked the issue out-of-scope for us as it's out of our technical scope to review.

Thank you for understanding what we are trying to achieve at pyOpenSci, Alexey. We appreciate your hard work on this package. Respectfully, Leah

lwasser commented 1 year ago

closing this given there has been no engagement in a few weeks. thank you everyone for your patience.