Closed bgillesp closed 3 years ago
I removed the BOM from the French wordlist in ae726b39ad74323d7128f8991feb7e36f5b8a16c
I think that's a much better fix, because it also prevents issues with alternative implementations using the same wordlist file.
Thanks for the report!
Great, that will do it -- I was hesitant to mess around with the wordlist file itself since it's so standardized, but your fix actually brings the file in line with the standard wordlist at the BIP-0039 repository. Thanks for the quick turnaround!
The UTF-8 encoding allows an optional character at the beginning of a file called the "byte order mark", or BOM. In UTF-16 or UTF-32, this character indicates whether the byte order of the file is big-endian or little-endian. The Unicode standard neither requires nor recommends a BOM for UTF-8; however, a BOM may still appear in UTF-8 files, for example because some text editors add one when saving.
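For reference, a minimal sketch of what the BOM looks like at the byte level. It uses only the standard library; the variable names are illustrative and not taken from this repository.

```python
import codecs

# The BOM is the single code point U+FEFF; its byte form depends on the encoding.
BOM = "\ufeff"

utf8_bom = BOM.encode("utf-8")
utf16_be_bom = BOM.encode("utf-16-be")
utf16_le_bom = BOM.encode("utf-16-le")

# In UTF-16 the two byte orders give different sequences, which is how a
# reader detects endianness. In UTF-8 there is only one possible sequence,
# so the BOM carries no byte-order information there.
assert utf8_bom == codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert utf16_be_bom == codecs.BOM_UTF16_BE == b"\xfe\xff"
assert utf16_le_bom == codecs.BOM_UTF16_LE == b"\xff\xfe"
```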
The BIP-39 wordlist file french.txt currently includes the byte order mark U+FEFF at the beginning of the file. The encoding used in Mnemonic.__init__ to read this file is 'utf-8', which does not strip a BOM at the beginning of a file, and thus produces a Python list with first entry '\ufeffabaisser' instead of the correct string 'abaisser'. In particular, this causes valid French mnemonic seed phrases starting with 'abaisser' to be incorrectly rejected by the Mnemonic.check validation function.
The commit in this pull request changes the encoding used to read the wordlist from 'utf-8' to 'utf-8-sig', which causes Python to skip a leading BOM in UTF-8 files, and fixes the incorrect first entry in the French language Mnemonic object's wordlist.
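The difference between the two codecs can be reproduced without the library. The sketch below assumes a two-word stand-in for french.txt held in memory; it does not call Mnemonic itself, and the names raw, with_utf8 and with_utf8_sig are hypothetical.

```python
# Stand-in for the start of french.txt: UTF-8 BOM followed by two words.
raw = b"\xef\xbb\xbfabaisser\nabandon\n"

# Plain 'utf-8' keeps the BOM as an ordinary character in the first entry.
with_utf8 = raw.decode("utf-8").splitlines()

# 'utf-8-sig' drops a leading BOM if one is present.
with_utf8_sig = raw.decode("utf-8-sig").splitlines()

# 'utf-8-sig' also accepts input that has no BOM at all.
without_bom = b"abaisser\nabandon\n".decode("utf-8-sig").splitlines()

assert with_utf8[0] == "\ufeffabaisser"
assert with_utf8_sig[0] == "abaisser"
assert "abaisser" not in with_utf8  # a lookup of the first word would fail
assert without_bom == with_utf8_sig
```

Because 'utf-8-sig' decodes BOM-less input the same way, reading with it should be safe for the other wordlists as well.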