oncokb / oncokb-annotator

Annotates variants in MAF with OncoKB annotation.
GNU Affero General Public License v3.0

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 2270: invalid continuation byte #219

Open valleyUp opened 3 months ago

valleyUp commented 3 months ago

Hi, when reading MAF files, the default encoding on my laptop appears to be utf-8, and reading fails with the error in the title as soon as the file contains a byte that is not valid utf-8. The following change makes the reader use latin-1 encoding instead. Hopefully this will help others who run into the same issue!

diff --git a/AnnotatorCore.py b/AnnotatorCore.py
index a448a7d..6e34f82 100644
--- a/AnnotatorCore.py
+++ b/AnnotatorCore.py
@@ -506,7 +506,7 @@ def processalterationevents(eventfile, outfile, previousoutfile, defaultCancerTy
     if os.path.isfile(previousoutfile):
         cacheannotated(previousoutfile, defaultCancerType, cancerTypeMap)
     outf = open(outfile, 'w+', 1000)
-    with open(eventfile, DEFAULT_READ_FILE_MODE) as infile:
+    with open(eventfile, DEFAULT_READ_FILE_MODE, encoding='latin-1') as infile:
         reader = csv.reader(infile, delimiter='\t')

         headers = readheaders(reader)
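
For anyone who wants to see the failure in isolation, here is a minimal sketch that does not use the annotator at all. The sample row is made up (it is not from a real MAF file) and the variable names are only illustrative. It shows that a 0xd4 byte followed by an ASCII character is rejected by the utf-8 codec with the same "invalid continuation byte" message, while latin-1 decodes it because latin-1 maps every byte to a character.

```python
# Hypothetical two-line, tab-separated sample; 0xd4 followed by an ASCII
# byte is not a valid UTF-8 sequence.
raw = b"Hugo_Symbol\tNote\nTP53\t\xd4x\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
    utf8_error = ""
except UnicodeDecodeError as err:
    # Same class of error as in the issue title.
    utf8_ok = False
    utf8_error = str(err)

# latin-1 assigns a character to each of the 256 byte values,
# so decoding with it cannot raise UnicodeDecodeError.
latin1_text = raw.decode("latin-1")

print(utf8_ok)
print(utf8_error)
print(repr(latin1_text))
```

Note that the `encoding` keyword of the built-in `open()` is a Python 3 feature, so the patch above assumes the script is run with Python 3.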

zhx828 commented 3 weeks ago

@valleyUp thanks! Do you think this is a general solution that can be used by anyone? If so, do you mind sending a pull request?
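
On whether this is general: latin-1 never raises, but a file that really is utf-8 and contains multi-byte characters would be silently mis-decoded if latin-1 were forced for everyone. One possible compromise, sketched here under assumptions, is to try utf-8 first and fall back to latin-1 only when decoding fails. `pick_encoding` and the `example.maf` file are hypothetical names for illustration and are not part of AnnotatorCore.py. The sketch is Python 3 only and reads the file twice.

```python
import csv
import os
import tempfile


def pick_encoding(path, candidates=("utf-8", "latin-1")):
    """Hypothetical helper: return the first candidate encoding that
    decodes the whole file without a UnicodeDecodeError."""
    for encoding in candidates:
        try:
            with open(path, "r", encoding=encoding, newline="") as handle:
                for _ in handle:  # force the entire file to be decoded
                    pass
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode " + path)


with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample file containing a byte that is invalid in utf-8.
    path = os.path.join(tmp, "example.maf")
    with open(path, "wb") as out:
        out.write(b"Hugo_Symbol\tNote\nTP53\t\xd4x\n")

    chosen = pick_encoding(path)
    with open(path, "r", encoding=chosen, newline="") as infile:
        rows = list(csv.reader(infile, delimiter="\t"))

print(chosen)
print(rows)
```

Because latin-1 accepts any byte sequence, it works as a last-resort candidate, and files that are valid utf-8 keep being read as utf-8.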