respec / HSPsquared

Hydrologic Simulation Program Python (HSPsquared)
GNU Affero General Public License v3.0
43 stars 17 forks source link

Merge changes from develop-specact into develop #155

Closed PaulDudaRESPEC closed 1 month ago

austinorr commented 2 months ago

@PaulDudaRESPEC this PR has huge diffs, and changes whitespace and line-endings in many files. Is there any way to break this up into smaller changes? I suspect that there's an errant commit in here that pulls in code that was re-saved on a machine without a correctly configured .gitconfig, which changed all the line endings in a few files. This can be tricky to fix, maybe @timcera knows some helpful git wizardry that can help isolate the intended changes in here from all the whitespace and line-ending chatter.

PaulDudaRESPEC commented 2 months ago

@austinorr , yeah, I see what you mean. These changes are almost entirely authored by @rburghol ... I wonder if we could revert to earlier versions of some of these files.

austinorr commented 2 months ago

@rburghol, there's important new features and capabilities in here, but some of them appear to be for your local testing/verification rather than belonging in the library e.g., the new contents of the /tests you've added are labeled as scripts that must be run from the home directory of the repo, and load h5 files that don't exist until after other things are run to create them.

Also, throughout this PR there are imports that use the "" like `from HSP2.SPECL import ` but we can/should be more careful with our imports so we can catch erroneous dead code easier -- like mistakenly unused imports and variables -- and not risk clobbering an important function after the import * by accidentally renaming it in the current file. Name spacing our imports is more verbose, but much safer in a complicated function-heavy repo like this.

If you'd like, I could help support you in making a few of these changes. Some of this refactoring effort seems hard, like changing file line endings but doesn't have to be with certain tools (vscode has a button along the bottom to toggle these, other editors might be similarly helpful).

To move this forward safely, we should probably rebase this on whatever the current 'develop' branch state is so we can avoid those merge conflicts that appear to come from this originally being a branch off of main. Let me know if you'd like support with that process too!

timcera commented 2 months ago

@PaulDudaRESPEC this PR has huge diffs, and changes whitespace and line-endings in many files. Is there any way to break this up into smaller changes? I suspect that there's an errant commit in here that pulls in code that was re-saved on a machine without a correctly configured .gitconfig, which changed all the line endings in a few files. This can be tricky to fix, maybe @timcera knows some helpful git wizardry that can help isolate the intended changes in here from all the whitespace and line-ending chatter.

Maybe, however does anyone remember a meme before there were memes, where someone dressed as a doctor says, "I'm not a doctor, but I play one on TV." That is me with git. "I don't know anything about git, but I pretend like I do on Github."

I'll take a look and maybe something will occur to me that could help, but I am not promising "wizardry".

rburghol commented 1 month ago

Hey @austinorr, thanks for the offers to help and yes please!

rburghol commented 1 month ago

If we have proper test coverage, we should have no fears of large changes :) -- and once whitespace is no longer showing, the changes can be seen to relatively small in terms of affecting prior function.

But if we did want to split them up, file changes can be grouped as follows:

austinorr commented 1 month ago

@rburghol whitespace fixes with vs code can be really simple, there's a little symbol at the bottom of the window that says either 'LF' or 'CRLF'. Clicking it will actually toggle it for the current file.

As for the "" imports, we really should avoid this practice. As you say, it's vague and it clobbers the namespace and risks bugs. A good linter will call this out every time as something that needs to be fixed, including flake8. From PEP8, "Wildcard imports (from import ) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API."

To move this forward, let's open a new PR with just your changes to files within the HSP2 directory with a clean diff on the current 'develop' branch of the project. The workflow could be as follows assuming you have this git repo set as your upstream remote. To get the name of your upstream remote, do 'git remote -v' :

  1. stash or commit all modified or untracked files so you have a clean working directory and all your work in progress on your current branch is saved.
  2. git checkout -b upstream-develop # make a new branch that tracks the exact current state of the projects 'develop' branch
  3. git fetch upstream # fetch current versions of all upstream branches. assumes that this remote repo is named 'upstream' on your machine. If not, use the name you configured in this command and in the later ones as needed.
  4. git reset --hard upstream/develop # hard reset this new local branch to be identical to the project's develop branch
  5. git checkout -b hsp2-specact # this will be a new branch from which you can open a new PR. since we branched from the upstream-develop one, we won't have any merge conflicts with that future PR.
  6. git checkout develop-specact -- HSP2 # this will checkout all the files from that branch that are in the HSP2 folder. This will reveal a new diff you you that you can use vscode to edit, like fixing the line endings, and removing any '*' imports.
  7. commit your changes, push your branch, and open a new PR.
  8. I'll meet you over there to help get your tests up and running with the new testing apparatus. It looks like you've got good test files already prepared, but let's work together on how to get them integrated so they automatically run.

I could do the above steps for you, but then you'd lose credit for the changes since I'd become the author. I like to avoid workflows that hide the true contributors, but it means a bit more work for you to resolve the merge conflicts.

rburghol commented 1 month ago

Awesome @austinorr thanks! I have no problems separating files out with branches, do that all the time. Quick question, since I live in linux world half the time, which should our line endings be?

rburghol commented 1 month ago

RFC @PaulDudaRESPEC @austinorr @timcera I believe that we can enforce universal translation to CRLF using a .gitattributes file in the root of the hsp2 repo, with the following line:

* text eol=crlf

Or we could be a but less ham-fisted and do:

# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

# Declare files that will always have CRLF line endings on checkout.
*.py text eol=crlf

This would make my life easier since I am doing a bunch of work on linux here. Does anyone see a drawback to git managing our line endings in this way?

austinorr commented 1 month ago

do that all the time

I worried that the recipe might be tmi -- didn't mean to imply otherwise, just wanted to unblock the path forward.

which should our line endings be?

Which ever gives the least diff after the hard reset from the recipe above. If you have a linux-side local clone, it'll probably pull things in with the 'LF' flavor, that's standard for linux. But as you've seen, it's not possible to actually run hspf.exe on a uci file that has 'LF' line endings, so we gotta use CRLF for that to generate the .hbn etc.

I'm not actually sure what I recommend, I don't think i want my local checkout to be CRLF since that'll cause lots of problems. I think it's better to leave it as is and just be careful with checking for big unintended diffs before merging PRs.

austinorr commented 1 month ago
# Declare files that will always have CRLF line endings on checkout.
*.py text eol=crlf

^ Aah, not this!

Here's what big projects do -- they leave it up to the developer because the project doesn't know if you're contributing from linux of from windows.

e.g., here's the extent that numpy goes to enforce line endings:

# Scripts
*.sh    text eol=lf
*.sed   text
# These are explicitly windows files and should use crlf
*.bat   text eol=crlf
*.cmd   text eol=crlf
rburghol commented 1 month ago

.. the recipe might be tmi All good!

it's not possible to actually run hspf.exe on a uci file that has 'LF' line endings Ha! Makes sense, tho in all my years I have never run into this, though I have been using a Linux fortran compiled hspf, or the UCI generators that our team has used are all taking care of this behind the scenes.

*.py text eol=crlf ^ Aah, not this!

Totally cool, though it seems like for this project the .py files really are gonna cause diff problems if they are not in CRLF -- at least until we get more linux peeps in the house. Looks like I will need to add something to my workflow to make these conversions as I go.

austinorr commented 1 month ago

Let’s keep an eye on this to see if any other contributors run into this problem with their py files. I’m developing on Linux too have not yet had a problem (all my line endings are correctly set to LF for py files). I think @timcera is also on Linux

austinorr commented 1 month ago

@PaulDudaRESPEC and @rburghol we can probably close this PR and work from #161 instead.