Report invalid UTF-8 sequence in input mnemonic as `EX_DATAERR = 65`

mvondracek commented 4 years ago

'utf-8' codec can't decode byte 0xca in position 151: invalid continuation byte
'utf-8' codec can't decode byte 0xa5 in position 85: invalid start byte
'utf-8' codec can't decode byte 0x96 in position 89: invalid start byte
'utf-8' codec can't decode byte 0xc2 in position 39: invalid continuation byte
'utf-8' codec can't decode byte 0xdb in position 111: invalid continuation byte
'utf-8' codec can't decode byte 0xc6 in position 352: invalid continuation byte
'utf-8' codec can't decode byte 0xf3 in position 56: invalid continuation byte
'utf-8' codec can't decode byte 0xc0 in position 58: invalid start byte
'utf-8' codec can't decode byte 0xd0 in position 56: invalid continuation byte
'utf-8' codec can't decode byte 0x84 in position 123: invalid start byte
'utf-8' codec can't decode byte 0xb5 in position 140: invalid start byte
'utf-8' codec can't decode byte 0xfe in position 138: invalid start byte
'utf-8' codec can't decode byte 0x88 in position 237: invalid start byte
'utf-8' codec can't decode byte 0xc0 in position 175: invalid start byte
'utf-8' codec can't decode byte 0xfb in position 132: invalid start byte
'utf-8' codec can't decode byte 0xed in position 71: invalid continuation byte
'utf-8' codec can't decode byte 0xf7 in position 53: invalid start byte
'utf-8' codec can't decode byte 0xfe in position 197: invalid start byte
'utf-8' codec can't decode bytes in position 2-4: invalid continuation byte
'utf-8' codec can't decode byte 0xbe in position 66: invalid start byte
'utf-8' codec can't decode byte 0xe4 in position 7: invalid continuation byte

These errors are currently reported as `UNKNOWN_FAILURE = 1`. It would be better to report them as `EX_DATAERR = 65`, i.e. `mnemoniccli.ExitCode.EX_DATAERR`.
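
A minimal sketch of the requested behaviour, not the project's actual code: the `ExitCode` enum below only mirrors the two values quoted in this issue, and `decode_mnemonic` and `run` are hypothetical names used for illustration. The idea is to catch `UnicodeDecodeError` before the generic handler so that it maps to 65 instead of 1.

```python
import enum
import sys


class ExitCode(enum.IntEnum):
    # Only the two values quoted in this issue; the real
    # mnemoniccli.ExitCode presumably defines more members.
    UNKNOWN_FAILURE = 1
    EX_DATAERR = 65


def decode_mnemonic(raw: bytes) -> str:
    """Hypothetical helper: decode the input strictly as UTF-8."""
    return raw.decode('utf-8')  # errors='strict' is the default


def run(raw: bytes) -> int:
    """Hypothetical entry point returning a process exit code."""
    try:
        decode_mnemonic(raw)
    except UnicodeDecodeError as e:
        # Invalid input data, so report it as a data error.
        print(e, file=sys.stderr)
        return ExitCode.EX_DATAERR
    except Exception as e:
        # Anything unexpected still falls back to the generic code.
        print(e, file=sys.stderr)
        return ExitCode.UNKNOWN_FAILURE
    return 0
```

The specific handler has to come before the generic `except Exception`, otherwise the decode error would still be reported as `UNKNOWN_FAILURE`.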

Already mentioned in https://github.com/mvondracek/PA193_mnemonic_Slytherin/issues/12#issuecomment-543391802.

UTF-8 errors apply to the dictionary, too. (https://github.com/mvondracek/PA193_mnemonic_Slytherin/pull/21#discussion_r336257567)

mvondracek commented 4 years ago

A UTF-8 error could also apply to the `--password` argument, which is then used in PBKDF2.

mvondracek commented 4 years ago

Should wait until #41 gets merged.

mvondracek commented 4 years ago

@lsolodkova please write unit tests and a fix for invalid UTF-8 sequences (+ `def get_invalid_passwords():`).

lsolodkova commented 4 years ago

Regarding a non-UTF-8 dictionary: for `open()` in text mode, the encoding used is platform dependent if `encoding` is not specified. May I explicitly pass `encoding='utf-8'` to all `open()` calls and then catch `UnicodeDecodeError`?
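
To illustrate the proposal, here is a small sketch under assumptions: `read_text_utf8` is a hypothetical helper, and the temporary file only stands in for a real input file. With `encoding='utf-8'` and the default strict error handling, an invalid byte raises `UnicodeDecodeError` regardless of the platform's default encoding.

```python
import os
import tempfile


def read_text_utf8(path: str) -> str:
    """Hypothetical helper: read a text file strictly as UTF-8."""
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


# Write a file containing a byte that cannot start a UTF-8 sequence.
with tempfile.TemporaryDirectory() as tmp:
    bad_path = os.path.join(tmp, 'mnemonic.txt')
    with open(bad_path, 'wb') as f:
        f.write(b'abandon \xa5 ability')
    try:
        read_text_utf8(bad_path)
        detected = False
    except UnicodeDecodeError:
        # This is the place where the CLI could map the error to EX_DATAERR.
        detected = True
```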

mvondracek commented 4 years ago

Dictionary will not use open at all, see 4878a42442574c206dec6dfd40c107b84df27012.

As for invalid passwords, see https://github.com/mvondracek/PA193_mnemonic_Slytherin/pull/57#discussion_r339494438.

mvondracek commented 4 years ago

Concerning `open` for reading the mnemonic phrase from input, you are right: it should specify UTF-8 so we can detect invalid sequences.

lsolodkova commented 4 years ago

For passwords we currently have the `valid_password` function. Should I modify it, or is it better to catch `UnicodeError` somewhere else?

mvondracek commented 4 years ago

valid_password is a good place for validation. 126d09f43cf3446ce5afbe03497f8213e8290209
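
For reference, one way such a check could look. This is only a sketch and not the code from the commit above: `is_utf8_encodable` is a hypothetical name. It relies on the fact that, on POSIX, Python decodes command-line arguments with the `surrogateescape` error handler, so invalid bytes end up as lone surrogates in the `str`, and encoding that string strictly as UTF-8 raises `UnicodeEncodeError` (a subclass of `UnicodeError`).

```python
def is_utf8_encodable(password: str) -> bool:
    """Hypothetical check: can the password be encoded as strict UTF-8?"""
    try:
        password.encode('utf-8')
    except UnicodeError:
        # Lone surrogates (e.g. from invalid bytes in argv) end up here.
        return False
    return True


# Simulate how an invalid byte in an argument would look inside Python.
bad_password = b'pass\xc0word'.decode('utf-8', errors='surrogateescape')
good_password = 'correct horse'
```

Here `is_utf8_encodable(good_password)` is true and `is_utf8_encodable(bad_password)` is false, so the check can run before the password is handed to PBKDF2.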

mvondracek / PA193_mnemonic_Slytherin

Report invalid UTF-8 sequence in input mnemonic as `EX_DATAERR = 65` #51