swcarpentry / python-novice-inflammation

Programming with Python
http://swcarpentry.github.io/python-novice-inflammation/
Other
303 stars 781 forks source link

Using Assert as input validation is wrong and needs to be changed #1084

Open dhvalden opened 4 months ago

dhvalden commented 4 months ago

What is the problem?

The problem has been reported several times already, so I'm late to the party (here fore example https://github.com/swcarpentry/python-novice-inflammation/issues/979).

To recap, assert keyword is not to be used as proposed in the example of the episode of Defensive Programming. The reason has been pointed out several times with various degrees of detail, but it boils down to that is basically a debug and testing phase tool, that can be bypassed at runtime by the interpreter. Python documentation states that basically evaluates to:

if __debug__:
    if not expression: raise AssertionError

The episode of Defensive Programming focuses mainly on input validation. Input validation needs to stop the program every time an incorrect input is passed. if raise blocks are the standard and canonical way of doing this. For an example see this function in the scipy repository:

https://github.com/scipy/scipy/blob/v1.14.0/scipy/optimize/_minimize.py#L53-L775

I do not propose to also add the idea of exceptions neither try except blocks, since a novice user will not need this, I simple propose to replace assert blocks with if raise blocks. I will provide a pull request of this in the coming days.

Location of problem (optional)

https://github.com/swcarpentry/python-novice-inflammation/blob/main/episodes/10-defensive.md

AlbertoImg commented 4 months ago

Hi @dhvalden, Thanks for your suggestions. It is a valid point, but since it is involving a big part of the episode I will call the curriculum-advisors for reviewing the suggestions as well. @swcarpentry/curriculum-advisors, @swcarpentry/governance-committee do you have an option on that?

dhvalden commented 4 months ago

Dear @AlbertoImg,

Of course! No worries. But just to reiterate. The use of assert to sanitise inputs is a bad practice. That is the job of if... raise Error blocks, and we should not be promoting bad practices in the courses.

dhvalden commented 3 months ago

Dear @AlbertoImg, I just created the promised pull request.

AlbertoImg commented 3 months ago

Hi @dhvalden, thanks for the pull request! we will look into it with the curriculum advisors.

martinosorb commented 3 months ago

Answering as a member of the CAC, but not representing the rest of the CAC.

I can agree that one should not use assert as input validation (although not all examples of assertions in the code of this episode are input validation). However these changes are much, much wider than that, and I think they would all need to be discussed. Just a few examples:

In summary: one thing is removing the use of assert as input validation, one thing is completely rewriting the episode.

dhvalden commented 3 months ago

Hey! @martinosorb

To comment on your points:

  1. Why would we want to teach tests if we haven't learned how to raise errors correctly first? It is a nice concept to mention, yes, after people know how to raise errors and learn about exceptions. AssertionError is a special class of exception. It follows naturally after general errors.

  2. I don't know how I can address this comment if you have a personal beef with try but this is the pythonic way of handling errors. It is particularly common and useful in API calls, where the payload can change unexpectedly, APIs are from where many researchers get their data. It's also common when handling user inputs, if for example you want to try to transform an invalid user inputs (e.g. coerce a string to numbers, a list to array, etc). Finally, if beginners will abuse something, better abuse the pythonic way of handling errors, than the incorrect way of input validation, right?

  3. It's alright, we/you can teach assertions later and change the episode name. But how would you teach assertions, that use heavily the exception AssertionError, without teaching what an exception is and how to raise errors first?

Finally, correct input validation and correct error handling is a much more "beginner need" than test driven development. The former will be useful all the time in simple scripts, whereas the latter becomes important in a larger interconnected project, like a library or app. Even the Official Python Tutorial thinks this way, with Errors and Exceptions handling being the 8th lesson https://docs.python.org/3/tutorial/errors.html whereas assert is not mentioned and unit test is only briefly mentioned as part of the standard library in chapter 10.

chillenzer commented 4 weeks ago

I can see both side's points here but I'm also leaning more towards @dhvalden and want to add a few aspects to the discussion.

  1. I personally like test-driven development but I think that concept is much richer than "test first". When teaching that section for the first time, I was really disappointed that most of the things that are important to me in TDD are not mentioned at all. So, while I prefer it to get mentioned at all, I would prefer the current version of the lecture to steer away from the term "TDD" and use something more humble like "test first". I think it might even be dangerous because if new learners afterwards attempt to use TDD, their knowledge about it might make them ineffective and lead to them abandoning that term without ever fully understanding (let alone using) it. On the other hand, if input validation should drop out due to this discussion, it would be a good opportunity to actually elaborate on TDD and provide some actual guidance how to effectively use it.

  2. I strongly agree that try... except... is a very pythonic construct. As opposed to other languages, it is not associated with a design flaw or a performance penalty but simply a "Ask for forgiveness not permission" attitude. There are a few best practices around that but it is a much shallower concept than good testing, let alone TDD.

  3. Personally, I never use assert outside of tests, so I was quite surprised to see it in the lectures at all. For any form of errors, including input validation, there exist more tailored exceptions and it is much more expressive if you simply derive your own exceptions for more specialised cases. Also, while try... except... in general is very pythonic, I'd personally frown if I'd see a except AssertionError for precisely the reason that it might not get thrown at all.

To summarise, that section always felt a bit off in my teaching, too. Both concepts are important but the current lecture doesn't do either of them a favour with using bad practice for input validation and leaving a lot of room for misinterpretation concerning TDD. I think focusing on one would benefit the learner. Which one, is for a bigger audience to decide.