jerbarnes / semeval22_structured_sentiment

SemEval-2022 Shared Task 10: Structured Sentiment Analysis

Processing Darmstadt on OSX #2

Closed amith-ananthram closed 3 years ago

amith-ananthram commented 3 years ago

Hey there,

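Thanks for the great repo! I just wanted to point out a small issue with processing the Darmstadt files on OSX. On OSX, sed handles the in-place flag `-i` differently (BSD sed expects a backup-suffix argument right after it, and an empty suffix `''` means no backup), so line 20 of process_darmstadt.sh should be: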

```shell
grep -rl "&" universities/basedata | xargs sed -i '' -e 's/&/and/g'
```

Here's an explanation on StackOverflow: https://stackoverflow.com/questions/19456518/error-when-using-sed-with-find-command-on-os-x-invalid-command-code

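Otherwise, BSD sed consumes the next argument as the backup suffix and ends up parsing a file name as a sed command (hence `invalid command code u`). The rogue ampersands therefore stay in the XML files, and the script fails with the following error: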

```
...
  inflating: universities/customization/SentenceOpinionAnalysisResult_customization.xml
sed: 1: "universities/basedata/U ...": invalid command code u
Traceback (most recent call last):
  File "/Users/amith/Documents/columbia/phd/sourceid/corpora/semeval22_structured_sentiment/data/darmstadt_unis/process_darmstadt.py", line 475, in <module>
    o = get_opinions(bfile, mfile)
  File "/Users/amith/Documents/columbia/phd/sourceid/corpora/semeval22_structured_sentiment/data/darmstadt_unis/process_darmstadt.py", line 113, in get_opinions
    text += token + " "
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
```

Happy to open a PR with the change (I tried pushing a branch, but I think the repo is restricted).
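A possibly more portable variant, sketched here under the assumption that only the `&` to `and` substitution matters: attaching the backup suffix directly to `-i` (as in `-i.bak`) is parsed the same way by GNU sed and BSD/macOS sed, and the backup files can be deleted afterwards. The temporary directory and sample files below are hypothetical stand-ins for universities/basedata, not the real Darmstadt data.

```shell
#!/usr/bin/env bash
# Sketch: portable in-place replacement of "&" with "and".
# Assumption: "-i.bak" (suffix attached to -i) is accepted by both GNU sed
# and BSD/macOS sed, unlike a bare "-i" (GNU only) or "-i ''" (BSD only).
set -euo pipefail

# Hypothetical stand-in for universities/basedata, built in a temp directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/basedata"
printf '<token>Q&A</token>\n' > "$demo_dir/basedata/sample1.xml"
printf '<token>no ampersand</token>\n' > "$demo_dir/basedata/sample2.xml"

# Replace every "&" with "and" in the files that contain one, keeping .bak backups.
grep -rl "&" "$demo_dir/basedata" | xargs sed -i.bak -e 's/&/and/g'

# Remove the backup files sed created.
find "$demo_dir/basedata" -name '*.bak' -delete

cat "$demo_dir/basedata/sample1.xml"
```

Like the original line, this relies on plain xargs, so file names containing spaces would need extra handling.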

jerbarnes commented 3 years ago

Hi,

Thanks for pointing out that the code doesn't work on OSX! Feel free to open a pull request.

jerbarnes commented 3 years ago

Ok, in the end I just added the change myself, so I'll close this issue now.

jerbarnes commented 3 years ago

In the end, the OSX fix didn't work on other systems, so I created a separate script for OSX (process_darmstadt_OSX.sh).
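As an alternative to maintaining a separate process_darmstadt_OSX.sh, a hypothetical single script could pick the in-place flag at runtime. This sketch assumes that GNU sed accepts `--version` while BSD/macOS sed rejects it; the `sed_inplace` array and the demo files are illustrative names, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: choose the sed in-place flag at runtime instead of keeping two scripts.
# Assumption: GNU sed accepts --version, while BSD/macOS sed rejects it.
set -euo pipefail

if sed --version >/dev/null 2>&1; then
  sed_inplace=(-i)      # GNU sed: the backup suffix is optional and must be attached
else
  sed_inplace=(-i '')   # BSD/macOS sed: -i takes a (here empty) suffix argument
fi

# Hypothetical demo data standing in for universities/basedata.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/basedata"
printf 'A & B\n' > "$demo_dir/basedata/sample.xml"

# Same substitution as in the issue, with the flavor-specific in-place flag.
grep -rl "&" "$demo_dir/basedata" | xargs sed "${sed_inplace[@]}" -e 's/&/and/g'

cat "$demo_dir/basedata/sample.xml"
```

Keeping the flag in a bash array preserves the empty `''` argument on OSX when it is expanded with `"${sed_inplace[@]}"`.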

amith-ananthram commented 3 years ago

Thanks Jeremy (sorry, missed your comment about opening a PR)!