Open songyuc opened 4 years ago
Hmm, I think using Python 2.7 will solve this, or try with io.open(file_path_dest,"r",encoding='ascii')?
@da03 , oh, it worked! Thanks a lot!
Hi, @da03, I want to confirm whether the processing in this repo is the same as the process in the paper, Image-to-Markup Generation with Coarse-to-Fine Attention?
Yes, it's the same. You can also find the processed data at http://lstm.seas.harvard.edu/latex/data/
Wow, it is great. I hope to follow your work to do some research.
And I guess these two .gz files are the same, am I right?
with io.open(file_path_dest,"r",encoding='ascii') still does not work in a Python 3.7 environment.
Before the change:
with open(temp_file, 'w') as fout:
    prepre = open(output_file, 'r').read().replace('\r', ' ')  # delete \r
    # replace split, align with aligned
    prepre = re.sub(r'\\begin{(split|align|alignedat|alignat|eqnarray)\*?}(.+?)\\end{\1\*?}',
                    r'\\begin{aligned}\2\\end{aligned}', prepre, flags=re.S)
    prepre = re.sub(r'\\begin{(smallmatrix)\*?}(.+?)\\end{\1\*?}',
                    r'\\begin{matrix}\2\\end{matrix}', prepre, flags=re.S)
    fout.write(prepre)
After the change:
with open(temp_file, 'w') as fout:
    # prepre = open(output_file, 'r').read().replace('\r', ' ')  # delete \r
    prepre = io.open(output_file, 'r', encoding='ascii').read().replace(
        '\r', ' ')  # delete \r
    # replace split, align with aligned
    prepre = re.sub(r'\\begin{(split|align|alignedat|alignat|eqnarray)\*?}(.+?)\\end{\1\*?}',
                    r'\\begin{aligned}\2\\end{aligned}', prepre, flags=re.S)
    prepre = re.sub(r'\\begin{(smallmatrix)\*?}(.+?)\\end{\1\*?}',
                    r'\\begin{matrix}\2\\end{matrix}', prepre, flags=re.S)
    fout.write(prepre)
The error shown:
2022-04-23 16:52:56,976 root INFO Script being executed: preprocess_formulas.py
2022-04-23 16:52:56,976 root INFO Script being executed: preprocess_formulas.py
Traceback (most recent call last):
  File "preprocess_formulas.py", line 103, in <module>
    main(sys.argv[1:])
  File "preprocess_formulas.py", line 66, in main
    prepre = io.open(output_file, 'r', encoding='ascii').read().replace(
  File "/home/yhtao/anaconda3/envs/latex_ocr/lib/python3.7/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 854136: ordinal not in range(128)
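The traceback says the file contains a byte (0xe7) outside the ASCII range, so encoding='ascii' cannot decode it. Here is a rough sketch for finding where such bytes sit, assuming the data fits in memory. The helper name find_non_ascii and the sample bytes are made up for illustration and are not part of the repo:

```python
def find_non_ascii(data, limit=10):
    """Return up to `limit` (offset, byte value) pairs for bytes >= 0x80."""
    hits = []
    for offset, value in enumerate(data):  # iterating bytes yields ints in Python 3
        if value >= 0x80:
            hits.append((offset, value))
            if len(hits) >= limit:
                break
    return hits


# Made-up sample standing in for the formulas file.
sample = b"\\alpha + \xe7 = \\beta"
for offset, value in find_non_ascii(sample):
    print("offset %d: 0x%02x" % (offset, value))
```

For a real file, the bytes could be read with open(path, 'rb').read() and passed to the same helper.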
@TITC, this works for me: io.open(output_file, 'r', encoding='latin-1')
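A likely reason latin-1 helps: it maps every byte value from 0 to 255 to a character, so decoding never raises, while ASCII only covers 0 to 127. Here is a small sketch of the difference, using a made-up byte string that contains the same 0xe7 byte as the traceback:

```python
# Made-up sample; 0xe7 is the byte reported in the traceback.
raw = b"x = \xe7"

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # ASCII rejects any byte >= 0x80

text = raw.decode("latin-1")        # never raises: each byte maps to one character
roundtrip = text.encode("latin-1")  # encoding back recovers the original bytes
```

Note that this thread does not establish the file's true encoding. If the data were actually UTF-8, latin-1 would still decode without error but would turn multi-byte characters into the wrong symbols.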
Hi, guys, I am trying to use the scripts in this repo to preprocess the im2latex dataset, but I ran into an error.
So, how can I solve this? Any answer or idea will be appreciated!