Pure RIS-to-CSV converter. This Python script uses regular expressions to convert Research Information Systems (RIS) files to Comma-Separated Values (CSV) files.
MIT License
Fixes for UnicodeDecodeError and KeyError in RIS to CSV Conversion Script #5
Dear Taylor,
First and foremost, I want to extend my sincere appreciation for creating this wonderful Python script for converting RIS files to CSV format. It's a fantastic tool that is very useful for handling academic references, and I can see the amount of effort that went into developing it. Your work has been incredibly helpful to me.
While using the script, I encountered a couple of issues that I wanted to bring to your attention and offer solutions for:
1. UnicodeDecodeError
When reading an RIS file, the script raised a UnicodeDecodeError from the cp949 codec, which is the default on my system (I'm Korean) and could not decode certain bytes (specifically, byte 0xe2).
I resolved this by specifying the encoding (utf-8) explicitly when opening the RIS file. This change should help prevent the same error on computers whose default encoding is not UTF-8, since UTF-8 is a universal encoding standard that supports characters from a wide range of languages.
2. KeyError for Unrecognized RIS Tags
During the processing of the RIS file, a KeyError occurred due to an unrecognized tag ('NS') that was not present in the RIS_stds.csv mapping file. This caused the script to fail when it couldn't find a corresponding column number for this tag.
I added a simple check to ensure that only recognized tags are processed. Alternatively, you might consider adding the 'NS' tag and its corresponding column mapping directly into the RIS_stds.csv file to fully integrate this tag into the conversion process.
try:
    row[column_num[ris_id]] = ris_data
except KeyError:
    print(f"Warning: Unrecognized RIS tag '{ris_id}' found and skipped.")
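One possible variation, sketched here under assumptions, looks the tag up with `dict.get` and collects unrecognized tags so that each one is reported once instead of on every occurrence. The names `row`, `column_num`, `ris_id`, and `ris_data` come from the snippet above. The helper `assign_field`, the `skipped` set, and the two-column mapping are hypothetical and do not reflect the real contents of RIS_stds.csv.

```python
def assign_field(row, column_num, ris_id, ris_data, skipped):
    """Hypothetical helper: store ris_data if the tag has a known column.

    Returns True when the value was stored, False when the tag was
    unrecognized and recorded in `skipped`.
    """
    column = column_num.get(ris_id)
    if column is None:
        skipped.add(ris_id)
        return False
    row[column] = ris_data
    return True


# Illustrative mapping only; the real one is loaded from RIS_stds.csv.
column_num = {"TY": 0, "TI": 1}
row = [""] * len(column_num)
skipped = set()

records = [("TY", "JOUR"), ("NS", "12"), ("TI", "A title"), ("NS", "13")]
for ris_id, ris_data in records:
    assign_field(row, column_num, ris_id, ris_data, skipped)

# Report every unrecognized tag once, after processing.
if skipped:
    print(f"Warning: skipped unrecognized RIS tags: {sorted(skipped)}")
```

Compared with catching KeyError, this keeps the warning output short for files that repeat an unknown tag many times, at the cost of threading one extra set through the conversion loop.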
Thank you again for this excellent tool. I hope these changes help enhance the robustness of the script and make it even more useful for others. Please feel free to review the changes and let me know if you have any questions or further suggestions.
Looking forward to your feedback.
Best regards, Dongwoo