schism-dev / schism

Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM)
http://ccrm.vims.edu/schismweb/
Apache License 2.0
78 stars 84 forks source link

Python package support for SCHISM #107

Open saeed-moghimi-noaa opened 12 months ago

saeed-moghimi-noaa commented 12 months ago

@josephzhang8 @cuill @wzhengui @brey @SorooshMani-NOAA @BahramKhazaei-NOAA @gseroka

All,

Here at NOAA, we are investing a large amount of resources on using Pyschims as a part of our operational frameworks (STOFS, Psurge, UFS and ...). It seems like there are other PY packages also being developed (by SCHISM team and EU, ...) however none of them are addressing the need of the users.

I think we reach to the point that we need a concrete plan and have every body united around one accepted solution.

We need to consider:

Thanks, @saeed-moghimi-noaa

josephzhang8 commented 12 months ago

Thx Saeed. I've asked Linlin to push out and work on a branch 'clean' to start trimming down the package and also come up with tests. Given her workload, this may take a while but rest assure we are on it..

SorooshMani-NOAA commented 11 months ago

There are multiple fronts to this from my perspective.

Most importantly, we need to make sure that the package we use is maintained, and then know what the intended use of that package is. For example if we use pyschism for setting up SCHISM input files, but the package is only maintained for its download capabilities, then we're in trouble.

The second front is using one package for each task. For example let's say, pyschism, metpy, etc for downloading data and pylibs for processing all the downloaded data and generating SCHISM input, in this case model setup is one (big) task and downloading each type of data is another. I'd like to note that if a single package e.g. pyschism does both downloading and setup, it's OK, as long as it is maintained to do both (though it just makes the package complicated over time as @brey can attest to). What we'd like to avoid at all cost is a case where we need to do some types of model setup with one package and some other types of setup with another package!

The third front is release and testing. We ideally would like to have a package that is installable (from conda or pip or a combination) without needing to download the source code. And that there are some automated tests that ensure all the advertised features work (related to https://github.com/schism-dev/pyschism/issues/84).

Last but not least, it is important that we have a standard style or convention for coding such that the code is maintainable by the community rather than a single person or group. This doesn't mean that we need to rewrite all the existing code over night, but rather to move toward migration to the agreed upon style and develop new code with the convention; the convention should be enforceable by automated tools rather than "I feel like it's right".

feiye-vims commented 11 months ago

There are multiple fronts to this from my perspective.

Most importantly, we need to make sure that the package we use is maintained, and then know what the intended use of that package is. For example if we use pyschism for setting up SCHISM input files, but the package is only maintained for its download capabilities, then we're in trouble.

The second front is using one package for each task. For example let's say, pyschism, metpy, etc for downloading data and pylibs for processing all the downloaded data and generating SCHISM input, in this case model setup is one (big) task and downloading each type of data is another. I'd like to note that if a single package e.g. pyschism does both downloading and setup, it's OK, as long as it is maintained to do both (though it just makes the package complicated over time as @brey can attest to). What we'd like to avoid at all cost is a case where we need to do some types of model setup with one package and some other types of setup with another package!

The third front is release and testing. We ideally would like to have a package that is installable (from conda or pip or a combination) without needing to download the source code. And that there are some automated tests that ensure all the advertised features work (related to schism-dev/pyschism#84).

Last but not least, it is important that we have a standard style or convention for coding such that the code is maintainable by the community rather than a single person or group. This doesn't mean that we need to rewrite all the existing code over night, but rather to move toward migration to the agreed upon style and develop new code with the convention; the convention should be enforceable by automated tools rather than "I feel like it's right".

@SorooshMani-NOAA I agree with your comments.

As Joseph said, we are trying to make some changes to the pyschism package.

Concerning your Comment 1, we would like to maintain one package, and pyschism is the one. But we want to introduce some of the pylib functions as a dependent of pyschism and I'll explain more of this below. We will keep in mind the disadvantage of having a package in control of both data preparation and model setup and assess if the package is too complicated during future development.

(Per your Comment 2) we are in the process of importing some core functions of pylib (especially on hgrid manipulation) into pyschism. These functions can make the downstream script simpler (thus more readable and maintainable, which is related to your Comment 4), e.g., setting up a *.gr3 file with values based on multiple criteria (region based, bathy slope-based, nudging zone-based, etc.). In addition, the functions from pylib are generally more efficient.

However, pylib has issues with namespace contamination due to the extensive use of "import ". For example, you reported a numpy issue with pyDEM (which imports pylib) recently. It turns out it is related to some function name collision due to the numpy functions imported by "from numpy import " in pylib. As a temporary measure, we extract the hgrid functions from pylib and put it into an experimental branch (which is pip installable, and can serve as a dependent when pip installing pyschism). This is an attempt to address your Comment 3.

Let me know if you have any additional suggestions.

saeed-moghimi-noaa commented 11 months ago

Hi Fei

These are music to my ear! We share the same concerns.

Thanks a lot to all of you at VIMS, -Saeed

wzhengui commented 11 months ago

@feiye-vims

I don't have any comments on pyschism development. Glad to know pylib can help in your work.

Regarding the namespace contamination issue in pylib. Using explicitly import style (import pylib as pl), this issue can be avoided. That was the price I chose to pay when I first started designing pylib in order to maintain a simple/efficient coding style. The problem will only arise when one decides to do implicit import of pylib, but reuse pylib/numpy/pyplot function names as new variable names.

brey commented 11 months ago

Dear All,

Sorry for the late reply. As I have said in many occasions, pyposeidon is for us the do-all package that can get all necessary input and do it all (mesh generation, dem, meteo, prepare schism, post-process, visualisation). pyposeidon grew organically, I suspect the same way as pylibs & pyschism still do.

In my view, this is problematic, especially for attracting new devs. The success of searvey (a pyposeidon spinoff) suggests that smaller packages with specific task can be more modular and can serve multiple uses. I am not sure what our team will decide, but in my view we should decompose pyposeidon to smaller dedicated packages that can be used in a pipeline.

This includes the schism class that prepares a schism run based on specific input (in my view what pyschism should only do).

It could be, in the future, that any of them could become obsolete due to upstream updates. The stack can adapt easily by removing that package. In addition smaller packages can have more efficient CI/CD.

my 2 cents.

josephzhang8 commented 11 months ago

Thank you, George. I think pyschism is already modular, and consists of many reusable components. I'd imagine it's not too hard to break it down further into smaller packages.

Regards,


Joseph Zhang

(804)684 7595 (office)

SCHISM web: http://ccrm.vims.edu/schism/


From: George Breyiannis @.> Sent: Sunday, July 30, 2023 12:56 PM To: schism-dev/schism @.> Cc: Y. Joseph Zhang @.>; Mention @.> Subject: Re: [schism-dev/schism] Python package support for SCHISM (Issue #107)

[EXTERNAL to VIMS received message]

Dear All,

Sorry for the late reply. As I have said in many occasions, pyposeidon is for us the do-all package that can get all necessary input and do it all (mesh generation, dem, meteo, prepare schism, post-process, visualisation). pyposeidon grew organically, I suspect the same way as pylibs & pyschism still do.

In my view, this is problematic, especially for attracting new devs. The success of searvey (a pyposeidon spinoff) suggests that smaller packages with specific task can be more modular and can serve multiple uses. I am not sure what our team will decide, but in my view we should decompose pyposeidon to smaller dedicated packages that can be used in a pipeline.

This includes the schism class that prepares a schism run based on specific input (in my view what pyschism should only do).

It could be, in the future, that any of them could become obsolete due to upstream updates. The stack can adapt easily by removing that package. In addition smaller packages can have more efficient CI/CD.

my 2 cents.

— Reply to this email directly, view it on GitHubhttps://github.com/schism-dev/schism/issues/107#issuecomment-1657219311, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AFBKNZY6AJ3KA74ZPGYYXHDXS2G2JANCNFSM6AAAAAA2A5X3KI. You are receiving this because you were mentioned.Message ID: @.***>