open-sdg / sdg-build

Python package to convert SDG-related data and metadata between formats
MIT License
5 stars 23 forks source link

Better error message when CSV whitespace issues #330

Closed brockfanning closed 1 year ago

brockfanning commented 2 years ago

Testing instructions: This PR is supposed to improve the error message in the build logs whenever there is "trailing whitespace" or "leading whitespace" in CSV data. For example, imagine the following CSV file:

Year,Units,Value
2023,Percent,100

That is valid for Open SDG. But what if it had trailing whitespace:

Year,Units,Value
2023,Percent ,100

That would cause the build checks to fail. This PR is meant to improve the failure message to actually print the Percent in the log, instead of just telling the user that there is trailing whitespace somewhere and expecting them to find it.

So for testing this, I would recommend first adding trailing whitespace (similar to above) somewhere in one CSV, confirm that the build fails, and note the error mesage.

Then point to this branch in requirements.txt and try again. Note the error message again and see if there is an improvement.

Also the same could be done with "leading whitespace" (eg, 100)

codecov[bot] commented 2 years ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.38 :tada:

Comparison is base (4a13457) 74.20% compared to head (072451e) 74.58%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## 2.2.0-dev #330 +/- ## ============================================= + Coverage 74.20% 74.58% +0.38% ============================================= Files 74 74 Lines 4590 4592 +2 ============================================= + Hits 3406 3425 +19 + Misses 1184 1167 -17 ``` | [Impacted Files](https://codecov.io/gh/open-sdg/sdg-build/pull/330?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-sdg) | Coverage Δ | | |---|---|---| | [sdg/check\_csv.py](https://codecov.io/gh/open-sdg/sdg-build/pull/330?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-sdg#diff-c2RnL2NoZWNrX2Nzdi5weQ==) | `82.50% <100.00%> (+22.24%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-sdg). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=open-sdg)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

sarah-beasley commented 1 year ago

Hi @brockfanning I've been trying to test these but have been coming across some problems. The first being that when adding a trailing white space before pointing to your feature branch, there was no error that seemed to come up. But now pointing to your feature branch in the requirements.txt file, this error comes up, and this is without there being any trailing white spaces added either.

brockfanning commented 1 year ago

@sarah-beasley This PR is different from most because I accidentally pushed it to the origin instead of my fork. So instead of the usual "brockfanng/sdg-build-1" it will actually be "open-sdg/sdg-build". Eg: git+https://github.com/open-sdg/sdg-build@brockfanning-patch-2

About the other issue though - I think the problem might be that your deploy-feature-branch workflow doesn't run python scripts/check_data.py. (It only runs python scripts/build_data.py.) There are a couple of ways to work around this issue:

  1. We could update your deploy-feature-branch workflow so that it does run python scripts/check_data.py. The only downside there is that it means that deploying feature branches will take a few minutes longer, because I believe the check_data.py script takes a few minutes to run.
  2. You could create a pull-request using your feature branch, which would trigger the test-pull-requests workflow, which does run check_data.py.
sarah-beasley commented 1 year ago

Thanks @brockfanning that was really helpful. I raised a pull request and the error message now shows as this which is great and much more informative for people to fix: image

Just one other thing, I tested the FB created before raising the pull request to see if there were any changes and where I added a space after percentage for the first two fields of 10-4-1, it has made them into another series. I guess you wouldn't be able to merge without fixing due to the tests failing but would you expect this to happen? I guess because it thinks there are two types of unit now - one with a space and one without? Note it did this regardless of which branch i was pointing at: image

brockfanning commented 1 year ago

@sarah-beasley I guess that behavior that you're seeing (the double unit) is one of the reasons that the test is important. That does seem expected though, because -- to a computer -- the added space does indeed make it a different unit. There is code we could write to automatically remove extra spaces, but I would argue that the test we already have is a better way to catch it, because it alerts the user that they did something wrong, and encourages attention to detail.