wzh9969 / contrastive-htc

This repository implements a contrastive learning model for hierarchical text classification. This work has been accepted as the long paper "Incorporating Hierarchy into Text Encoder: a Contrastive Learning Approach for Hierarchical Text Classification" in ACL 2022.
MIT License

WOS - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 15: invalid start byte #16

Closed by ayushxx7 1 year ago

ayushxx7 commented 1 year ago

Steps

  1. Downloaded data from https://data.mendeley.com/datasets/9rw3vkcfy4/6 as mentioned in https://github.com/kk7nc/HDLTex#datasets-for-hdltex.
  2. Copied 'Data.xlsx' to 'Meta-Data/Data.txt' (just renamed the extension).
  3. Ran py preprocess_wos.py after installing the necessary libraries, which produced the following traceback:
Traceback (most recent call last):
  File "/Users/ayush/workdir/personal/masters-research/contrastive-htc/data/WebOfScience/preprocess_wos.py", line 174, in <module>
    get_data_from_meta()
  File "/Users/ayush/workdir/personal/masters-research/contrastive-htc/data/WebOfScience/preprocess_wos.py", line 69, in get_data_from_meta
    origin_txt = f.readlines()
  File "/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 15: invalid start byte

wzh9969 commented 1 year ago

You cannot convert an .xlsx file to text by renaming its extension. Open the file in Excel and use Save As to save it as Data.txt.
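
For anyone who hits the same traceback, here is a minimal sketch of a sanity check to run before preprocess_wos.py. An .xlsx workbook is a ZIP container, so a file that was only renamed still looks like a ZIP archive to Python's standard zipfile module. The function name looks_like_renamed_xlsx is hypothetical and not part of this repository; the path in the usage comment is the one from the steps above.

```python
import zipfile


def looks_like_renamed_xlsx(path):
    """Return True if the file at `path` is a ZIP archive.

    Hypothetical helper, not part of contrastive-htc. Every .xlsx workbook
    is a ZIP container, so a workbook that was merely renamed to .txt
    still passes this check, while a real text export does not.
    """
    return zipfile.is_zipfile(path)


# Usage (path taken from the issue; adjust to your checkout):
# if looks_like_renamed_xlsx("Meta-Data/Data.txt"):
#     print("Data.txt is still an Excel workbook; export it as text from Excel.")
```

This only detects the renamed-workbook case. It does not verify that the exported text is in the encoding or column layout that preprocess_wos.py expects.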