yjmantilla / sovabids

A python package for the automatic conversion of EEG datasets to the BIDS standard, with a focus on making the most out of metadata.
https://sovabids.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
9 stars 3 forks source link

Implement more validation in the code and error handling #18

Open yjmantilla opened 3 years ago

yjmantilla commented 3 years ago

As of now a lot of the code is in a "hacky" state, for example Im working with a lot of dictionaries but there is a lot to validate (if keys exist, if it doesnt exist what to do , what error to raise, etc).

civier commented 3 years ago

I'm not an expert at all in error handling in python, but I highly recommend that each error message will have a unique ID. It is true that Python prints the file where error occurred and the line number, but these change between versions, and can cause confusion when users look up the error online (this is the first thing I recommend users to do). Having unique ID to each error is not so trivial when multiple people are working on the code, so maybe each developer should keep a separate list of unique IDs, and to prevent overlap, each developer will also have his/her error ID prefix? Have you ever implemented something like that?

yjmantilla commented 3 years ago

Having unique ID to each error is not so trivial when multiple people are working on the code, so maybe each developer should keep a separate list of unique IDs, and to prevent overlap, each developer will also have his/her error ID prefix? Have you ever implemented something like that?

I think this could be hard to keep track of. What about having a file that has the "error database", that is, has the unique id, what it means, and possible solutions. Maybe something like:

- errorid : 001
  name : NoSubject
  description : There wasn't a way to infer the subject from the given rules file.
  solutions : 
    - use a counter as a label 
- errorid : 002
  name : NoTask
  description : There wasn't a way to infer the task from the given rules file.
  solutions : 
    - set the entities.task field with the task

EDIT: Maybe making the name a unique string would serve as the identifier instead of the counter.

Have you ever implemented something like that?

Beyond raising exceptions with a message, not really.

Do @stebo85 , @aswinnarayanan , @DavidjWhite33 have some opinions regarding error handling?

stebo85 commented 3 years ago

I don't think it really needs a unique error id. A unique error message text should be enough to find and describe the error?

aswinnarayanan commented 3 years ago

@yjmantilla @civier I don't think error code id is a good idea for this project. The error message and description itself needs to be sufficiently explanatory for them to understand and progress without needing to check the site. So, the error code just adds an additional layer of work for developing. Also, bidscoin doesn't use error codes, so it might create some clashes in the code base down the line.

Since some users may be using CLI only, descriptive error messages with quick instructions on how to resolve would be better.

I would suggest we use the python logging module as itโ€™s robust and more flexible than print. The log handler will let you more centrally customise the log format (timestamp, message formatting etc), has multiple log levels (debug, warning, error, exception etc) and provides option to output to log file(s).

bidscoin already does exactly this ๐Ÿ˜Š so thatโ€™s probably the best place to reference. Logging handler setup. Examples of different error handling here, here, here

You could centralise the error messages into a Troubleshooting section on a page or the README. In some cases the error or fix could be more advanced (e.g. needs screenshots or webpages to describe the solution), so the user may need to be referred to this section from the error message. Using the logger module actually makes it easier to find the errors, as you can find the lines more easily without needing comments.

civier commented 3 years ago

Thanks @aswinnarayanan for the valuable input.

I agree that there is no need to complicate things with an error database, but I still thinks that error IDs are very helpful. Error messages can change with time (we might make an error message more elaborate/clear as we go along with the project), and then it's very difficult to locate past information about the particular error (in online forms and such; don't forget that often people look for help in NeuroStars, Mattermost or other forums).

My suggestion is to start each error message with a number that will stay constant (even if we change the text). It will just be part of the string, and all of us will agree not to touch it. To make things very simple, I suggest that the number will be NN-MMM with NN the developer code (Yorgiun - 01, Brayan - 02, Daniel - 03, Steffen - 04, Aswin - 05 and so on), and MMM is a number that is kept unique by each developer. It's really not a big deal, and can be very helpful down the road.

Any objections? If all are ok with that, @yjmantilla can assign the two digit code for each developer, and we'll go from there.

aswinnarayanan commented 3 years ago

Hi @civier.

If the suggestion is an error code without an error database, one could use something like a hash string (i.e. a random 6-8 digit numeric or hex) and print it next to the error message. This can be generated on the fly and would be a unique identifier.

If using sequential error numbers, there would need to be a central register of some kind, even if its just a list in the docs. Since exceptions can be located anywhere in the codebase, the dev(s) would need something to check to know when to increment.

I didn't quite understand the developer code part; why would the error be tied to the developer?

yjmantilla commented 3 years ago

@aswinnarayanan @civier

why would the error be tied to the developer?

I also don't understand this part quite yet

one could use something like a hash string (i.e. a random 6-8 digit numeric or hex) and print it next to the error message. This can be generated on the fly and would be a unique identifier.

How do we generate this hash string in a way that it stays constant? I mean, if the idea is to tie an error to an id, and such error can appear in multiple lines of code then we cannot use the code itself to generate the hash (and the code can actually change so we couldn't generate it from them either). It seems to me that we still need a mechanism to centralize. Lets say, I can generate a random identifier but for it to be kept we would need some entity that manages it.

So basically, even with random numbers, we would still need the central register to keep the random numbers , right?

aswinnarayanan commented 3 years ago

@yjmantilla Agreed, I think having error codes would necessitate a central database to track the ids.

If the suggestion is to use an id but not use a database, and the purpose is for the id to persist through changes in the error msg, perhaps a workaround could be to use a unique identifier (like a hash of the initial error msg, or a random alphanumeric). Otherwise you would need the error database.

yjmantilla commented 3 years ago

@stebo85 @civier @aswinnarayanan

An idea is to have an yaml file inside sovabids and for it to keep the initial error message so that the hashes stay the same through that initial string. A documentation page is made in sphinx by reading the file and calculating the hashes. Then we just modify the "current" string when we want to update the actual message shown to the user.

In the end, this is just having a database (๐Ÿ‘Ž) without setting the id ourselves (๐Ÿ‘):

- initialMessage : NoSubject
  currentMessage : There wasn't a way to infer the subject from the given rules file.
- initialMessage : NoTask
  currentMessage : There wasn't a way to infer the task from the given rules file.

This is just like @stebo85 said:

A unique error message text should be enough to find and describe the error?

With the addition that we keep the original unique text as an index. Allowing us to change it the current text for the future.

We could have an errors module that reads the yaml and allow us to import the errors in each file in sovabids.

yjmantilla commented 3 years ago

Well, after all, this is a database. One question I have is : "Is there a way to KEEP the initial error hash down the line without saving it in a database-like medium?"

For me, if we keep that initial hash inside the code (that is, no database) , then how do we build automatic documentation for it? We would need to hunt down in the code all the errors through some sort of source code analyzer/parser. Or worse, we would need to build it by hand/tracking them ourselves.

Lets say that ok, lets not make error documentation because making a database is not worth it. But, if we don't built documentation around it, then why is it useful making an error id? We would then just do as @aswinnarayanan said:

The error message and description itself needs to be sufficiently explanatory for them to understand and progress without needing to check the site. So, the error code just adds an additional layer of work for developing.