pyOpenSci / lessons

A repo containing lessons used in pyOpenSci training.
https://www.pyopensci.org/lessons
BSD 3-Clause "New" or "Revised" License
2 stars 3 forks source link

feat(act): add activity 3 + test and check lessons that support it #18

Closed lwasser closed 1 week ago

lwasser commented 1 week ago

@ucodery i knew you'd have a lot to provide here!!

I have a few questions - specifically related to raising errors.

encourage attendees to "fail fast" only when it is unrecoverable this makes the failure happen earlier, wasting less time waiting this can make the stack trace more useful, as it points the the code experiencing the error

  1. Can you help me understand what failing fast would look like? If we catch an exception where it occurs, isn't that what you'd ideally want? Or perhaps there is a deeper concept here that I just need to better understand!

clearly describe the error. rather than just printing out "exiting now", try to tell the caller why it was a mistake and what they could maybe do to fix it. maybe also explain that exceptions can (most of the time) take a message string, and it doesn't have to be printed also.

this is a really good point. i'd love some guidance. What i've done in the past in packages is done

except ValueError as ve:
    # Handle the exception and provide a custom message
    print(f"Error: {ve}. Better message here.")

but I'm still using a print statement.

the alternative would be this

if value < 0:
        raise ValueError("The value cannot be negative.")

But that is a bit more look before you leap style right?

Any input that you have is super welcome. this is totally new content AND tricky content. I think your input here is really valuable!

lwasser commented 1 week ago

Because this is a new file, it's hard to track changes in a review. I am going to merge this PR. Then, i'll integrate the changes into a new pr. That way the changes that I made are super clear. I still have the question about failing fast. but let me merge, and i'll bring your comments over into the new pr and highlight what's changed to avoid confusion!.

lwasser commented 1 week ago

Ok i am going to merge this. and pull the comments into a new pr so the changes are super clear. I have thought more about the suggestions around failing fast, and I think i understand - we probably want more language in the checks and tests lesson that supports this and then we can mention it here and link over to more detail!

ucodery commented 1 week ago
  1. Can you help me understand what failing fast would look like? If we catch an exception where it occurs, isn't that what you'd ideally want? Or perhaps there is a deeper concept here that I just need to better understand!

I think that "fail fast" is an industry term in software engineering. Rather that come up with my own explanation, there are some great ones already online. Like https://stackoverflow.com/questions/2807241/what-does-the-expression-fail-early-mean-and-when-would-you-want-to-do-so which also links to more explanations.

but here is a short contrived example of how to not fail fast

...

def do_work(num, user_choice):
    # this can fail!
    # but we've been holding on to this value for the entire execution, doing a lot of work that was meaningless because the task couldn't be completed
    # If it fails now, the stack trace doesn't point to where the value came from, or indicate how the user can correct it
    user_num = int(user_choice)

    # this can also fail! As a different, unrelated, exception
    return num / user_num

def gather_input():
    # this will fail fast, but IndexError doesn't describe the real issue to the user, that there is a required arg
    return sys.argv[1]

def main():
    user_choice = gather_input()
    data = I_load_a_large_datafile()
    number = find_magic_entry(data)
    print(do_work(number, user_choice))

...

this is a really good point. i'd love some guidance.

Building off of my earlier example, I might add this edge case logic to the program. Don't forget new excetption types are easy and cheap, and can be very descriptive to the user even without a message.

class UsageError(RuntimeError):
    pass

def gather_input():
    try:
        user_input = sys.argv[1]
    except IndexError:
        raise UsageError("You must supply a value to the program")
    try:
        user_input = int(user_input)
    except ValueError:
        raise UsageError("The value supplied to the program must be an integer")

This is still very much EAFP as the function tries grabs a good int in the happy path without conditionals. And all the messages appear only when the exception "breaks out" of the program - when it causes the program to exit, but not if any caller catches these exceptions.

Slightly off-topic, but print statements during exception handling (and most other places) should probably be replaces with logging. But that is its own potentially multi-lesson topic.

lwasser commented 1 week ago
  1. Can you help me understand what failing fast would look like? If we catch an exception where it occurs, isn't that what you'd ideally want? Or perhaps there is a deeper concept here that I just need to better understand!

I think that "fail fast" is an industry term in software engineering. Rather that come up with my own explanation, there are some great ones already online. Like https://stackoverflow.com/questions/2807241/what-does-the-expression-fail-early-mean-and-when-would-you-want-to-do-so which also links to more explanations.

but here is a short contrived example of how to not fail fast

...

def do_work(num, user_choice):
    # this can fail!
    # but we've been holding on to this value for the entire execution, doing a lot of work that was meaningless because the task couldn't be completed
    # If it fails now, the stack trace doesn't point to where the value came from, or indicate how the user can correct it
    user_num = int(user_choice)

    # this can also fail! As a different, unrelated, exception
    return num / user_num

def gather_input():
    # this will fail fast, but IndexError doesn't describe the real issue to the user, that there is a required arg
    return sys.argv[1]

def main():
    user_choice = gather_input()
    data = I_load_a_large_datafile()
    number = find_magic_entry(data)
    print(do_work(number, user_choice))

...

this is a really good point. i'd love some guidance.

Building off of my earlier example, I might add this edge case logic to the program. Don't forget new excetption types are easy and cheap, and can be very descriptive to the user even without a message.

class UsageError(RuntimeError):
    pass

def gather_input():
    try:
        user_input = sys.argv[1]
    except IndexError:
        raise UsageError("You must supply a value to the program")
    try:
        user_input = int(user_input)
    except ValueError:
        raise UsageError("The value supplied to the program must be an integer")

This is still very much EAFP as the function tries grabs a good int in the happy path without conditionals. And all the messages appear only when the exception "breaks out" of the program - when it causes the program to exit, but not if any caller catches these exceptions.

Slightly off-topic, but print statements during exception handling (and most other places) should probably be replaces with logging. But that is its own potentially multi-lesson topic.

This is great, thank you! I’ll work this into #21, which should be easier to review now that the file is merged. Having your input is helpful because many scientists, like me, don’t come from a software engineering background. I’ve learned over time through reviews (like with stravalib) and reading/learning in my free time. At CU, where I taught, it was difficult /impossible for scientists outside of computer science to take CS courses, so most learn programming as a way to process data, not the other way around.