Closed (hicotton02 closed this issue 11 months ago)
Hi @hicotton02!
Some number of UnicodeDecodeError exceptions is expected: they are caught whenever a .tex file contains bytes that cannot be decoded as UTF-8. In any case, let me know if the majority of documents can't be processed because of this error.
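To illustrate the behavior described above, here is a minimal sketch of how a cleaning script might catch these decode failures and how you could check what share of files is affected. This is not the actual run_clean.py code; the names `read_tex` and `failure_rate` are hypothetical and only show the general pattern.

```python
import logging
from pathlib import Path
from typing import Iterable, Optional

logger = logging.getLogger(__name__)


def read_tex(path) -> Optional[str]:
    """Return the file's text, or None if its bytes are not valid UTF-8.

    Hypothetical helper: the real script may handle this differently.
    """
    try:
        return Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError as exc:
        # Log and skip the file instead of aborting the whole run.
        logger.warning("skipping %s: %s", path, exc)
        return None


def failure_rate(paths: Iterable) -> float:
    """Fraction of the given files that could not be decoded as UTF-8."""
    paths = list(paths)
    if not paths:
        return 0.0
    failed = sum(1 for p in paths if read_tex(p) is None)
    return failed / len(paths)
```

Running something like `failure_rate` over a sample of the downloaded .tex files would give a rough idea of whether the errors affect a small minority of documents or most of them.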
Regarding the stacktrace, can you show me the command and arguments you're using to run the script?
I was able to download the content from S3 (it shows 181GB) and attempted to run the ./run_clean.py script. I get thousands of errors like this one:
and then the stack trace:
I have attempted to re-download it once, but due to the cost, I don't want to try again without reaching out first.