semiodesk / limes

Project repository for the Expert Finder System of the Life and Medical Sciences (LIMES) Institute, Bonn University.

AttributeError: 'CsvReader' object has no attribute 'file' #33

Closed asuccurro closed 2 years ago

asuccurro commented 2 years ago

Error when running the cleaning scripts on the input csv files:

(base) succurro@antigone:~/repositories/publicgithub/limes/src/Utils$ ./processdata.sh 
 [!] koehrer@hhu.de: Phone prefix '221' does not match known prefixes for zip code '40225': '211'
 [!] schwender@math.uni-duesseldorf.de: Phone prefix '221' does not match known prefixes for zip code '40225': '211'
 [!] argyris.papantonis@uni-koeln.de: City name 'other' dos not match known name for zip code '37075': 'Göttingen'
Writing tmp/wggc2019_round1.clean.csv: 33 rows
Writing tmp/wggc2019_round2.clean.csv: 9 rows
Writing tmp/wggc2019.merge.csv: 42 rows
 [-] holger.schwender@hhu.de
 [-] wolfram.kunz@ukbonn.de
Writing tmp/wggc2019.filter.csv: 40 rows
Exception ignored in: <function CsvReader.__del__ at 0x7f9b75442510>
Traceback (most recent call last):
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 49, in __del__
    self.file.close()
AttributeError: 'CsvReader' object has no attribute 'file'
Exception ignored in: <function CsvReader.__del__ at 0x7f37a6e07510>
Traceback (most recent call last):
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 49, in __del__
    self.file.close()
AttributeError: 'CsvReader' object has no attribute 'file'
Traceback (most recent call last):
  File "csv-clean.py", line 16, in <module>
    reader = CsvReader(args.input, args.key)
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 41, in __init__
    self.file = open(filepath, encoding=encoding, newline=newline)
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/ngscn2021.merge.1.csv'
Exception ignored in: <function CsvReader.__del__ at 0x7fced88ec730>
Traceback (most recent call last):
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 49, in __del__
    self.file.close()
AttributeError: 'CsvReader' object has no attribute 'file'
Traceback (most recent call last):
  File "csv-filter.py", line 55, in <module>
    reader = CsvReader(args.input, args.key)
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 41, in __init__
    self.file = open(filepath, encoding=encoding, newline=newline)
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/ngscn2021.clean.1.csv'
Exception ignored in: <function CsvReader.__del__ at 0x7f5cc7127510>
Traceback (most recent call last):
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 49, in __del__
    self.file.close()
AttributeError: 'CsvReader' object has no attribute 'file'
Traceback (most recent call last):
  File "csv-convert.py", line 24, in <module>
    reader = CsvReader(args.input)
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 41, in __init__
    self.file = open(filepath, encoding=encoding, newline=newline)
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/ngscn2021.csv'
Exception ignored in: <function CsvReader.__del__ at 0x7fd4cbc98510>
Traceback (most recent call last):
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 49, in __del__
    self.file.close()
AttributeError: 'CsvReader' object has no attribute 'file'
Writing output/Person.csv: 0 rows
Writing output/PersonAffiliation.csv: 0 rows
Writing output/PersonFilterFlag.csv: 0 rows
Traceback (most recent call last):
  File "csv-keyphrases.py", line 13, in <module>
    reader = CsvReader(args.input)
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 41, in __init__
    self.file = open(filepath, encoding=encoding, newline=newline)
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/ngscn2021.csv'
Exception ignored in: <function CsvReader.__del__ at 0x7f7bcbe5a510>
Traceback (most recent call last):
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 49, in __del__
    self.file.close()
AttributeError: 'CsvReader' object has no attribute 'file'

Conda base environment (Python 3.7.3 (default, Mar 27 2019, 22:11:17) )

input files:

-rw-r--r-- 1 succurro succurro 14161 Nov 17 07:51 ngscn2021_full_form_20211117.csv
-rw-r--r-- 1 succurro succurro   928 Dez 13 10:43 ngscn2021_full_form_20211213.csv
-rw-r--r-- 1 succurro succurro  1924 Nov 17 07:33 ngscn2021_reduced_form_20211117.csv
-rw-r--r-- 1 succurro succurro   304 Dez 13 10:46 ngscn2021_reduced_form_20211213.csv
-rw-r--r-- 1 succurro succurro  1284 Nov 17 07:52 readme_inputfiles.md
-rw-r--r-- 1 succurro succurro   111 Nov 17 07:54 remove_from_webportal.txt
-rw-r--r-- 1 succurro succurro  1087 Nov 17 07:40 wggc2019_curated_profiles_emails.txt
-rw-r--r-- 1 succurro succurro 12168 Nov 17 07:41 wggc2019_round1.csv
-rw-r--r-- 1 succurro succurro  3183 Nov 17 07:41 wggc2019_round2.csv
-rw-r--r-- 1 succurro succurro   930 Nov 24 13:32 wggc2019_round3.csv
faubulous commented 2 years ago

Hi Antonella, can you add the contents of processdata.sh? This error is thrown when an input file cannot be opened.
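
For context on why the `AttributeError` shows up next to every failed open: Python still calls `__del__` on an object whose `__init__` raised, and if `open()` fails, `self.file` was never assigned. A minimal sketch of a guarded reader follows. It is a simplified, hypothetical stand-in for the class in `lib/csv_core.py`, not its actual code; the `close()` method, the default arguments, and the use of `csv.DictReader` are assumptions.

```python
import csv


class CsvReader:
    """Simplified, hypothetical stand-in for the reader in lib/csv_core.py."""

    def __init__(self, filepath, keyfield=None, encoding="utf-8", newline=""):
        # Assign the attribute before anything can raise, so that __del__
        # always finds it, even when open() fails.
        self.file = None
        self.file = open(filepath, encoding=encoding, newline=newline)
        self.reader = csv.DictReader(self.file)
        if keyfield is not None and keyfield not in (self.reader.fieldnames or []):
            self.close()
            raise Exception(f"Key column {keyfield} not found in file {filepath}")

    def close(self):
        """Close the underlying file if it is open."""
        if self.file is not None:
            self.file.close()
            self.file = None

    def __del__(self):
        # getattr also covers objects that never reached the first assignment.
        if getattr(self, "file", None) is not None:
            self.file.close()
```

With a guard like this, a missing input file would surface only as the `FileNotFoundError`, without the extra "Exception ignored in" block.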

asuccurro commented 2 years ago

It's the file from GitHub (unmodified):

#! /bin/bash

# Cleanup the working directory and ensure that it exists..
rm -Rf tmp
mkdir tmp

# Clean the input data before merging..
python csv-clean.py \
    -i input/wggc2019_round1.csv \
    -o tmp/wggc2019_round1.clean.csv \
    -k confirm-or-change-email-address \
    -c profiles-rns \
    2> tmp/wggc2019_round1.dups.csv

python csv-clean.py \
    -i input/wggc2019_round2.csv \
    -o tmp/wggc2019_round2.clean.csv \
    -k confirm-or-change-email-address \
    -c profiles-rns \
    2> tmp/wggc2019_round2.dups.csv

python csv-clean.py \
    -i input/ngscn2021_full_form_20211117.csv \
    -o tmp/ngscn2021_full_form_20211117.clean.csv \
    -k confirm-or-change-email-address \
    -c profiles-rns \
    2> tmp/ngscn2021_full_form_20211117.dups.csv

python csv-clean.py \
    -i input/ngscn2021_reduced_form_20211117.csv \
    -o tmp/ngscn2021_reduced_form_20211117.clean.csv \
    -k confirm-or-change-email-address \
    2> tmp/ngscn2021_reduced_form_20211117.dups.csv

# Merge the input data files..
python csv-merge.py \
    -ia tmp/wggc2019_round1.clean.csv \
    -ib tmp/wggc2019_round2.clean.csv \
    -o tmp/wggc2019.merge.csv \
    -k confirm-or-change-email-address

python csv-filter.py \
    -i tmp/wggc2019.merge.csv \
    -o tmp/wggc2019.filter.csv \
    -f input/wggc2019_curated_profiles_emails.txt \
    -k confirm-or-change-email-address

python csv-merge.py \
    -ia tmp/ngscn2021_full_form_20211117.clean.csv \
    -ib tmp/ngscn2021_reduced_form_20211117.clean.csv \
    -o tmp/ngscn2021.merge.csv \
    -k confirm-or-change-email-address \
    -d ngs-cc-affiliation:'WGGC' \
    -d hide-contact-information:False \
    -m buidling:building \
    -m primary-affiliation:affiliation \
    -m specify-affiliation-1:specify-affiliation \
    -m your-role-s-at-the-wggc:your-role-s-at-the-ngs-cc \
    -m specify-your-role-s-at-the-wggc:specify-your-role-s \
    -m specify-city:city \
    -m you-agree-to-publish-your-data-on-the-ngs-cn-profiles-web-portal-if-affiliated-with-wggc-also-on-the-wggc-profiles-web-portal:you-agree-to-our-terms-and-policy

python csv-merge.py \
    -ia tmp/ngscn2021.merge.csv \
    -ib tmp/wggc2019.filter.csv \
    -o tmp/ngscn2021.merge.1.csv \
    -k confirm-or-change-email-address \
    -d ngs-cc-affiliation:'WGGC' \
    -d hide-contact-information:False \
    -m buidling:building \
    -m primary-affiliation:affiliation \
    -m specify-affiliation-1:specify-affiliation \
    -m your-role-s-at-the-wggc:your-role-s-at-the-ngs-cc \
    -m specify-your-role-s-at-the-wggc:specify-your-role-s \
    -m specify-city:city \
    -m you-agree-to-publish-your-data-on-the-ngs-cn-profiles-web-portal-if-affiliated-with-wggc-also-on-the-wggc-profiles-web-portal:you-agree-to-our-terms-and-policy

python csv-clean.py \
    -i tmp/ngscn2021.merge.1.csv \
    -o tmp/ngscn2021.clean.1.csv \
    -k confirm-or-change-email-address \
    -c profiles-rns

# Apply filters..
python csv-filter.py \
    -i tmp/ngscn2021.clean.1.csv \
    -o tmp/ngscn2021.csv \
    -k confirm-or-change-email-address \
    -f input/remove_from_webportal.txt \
    -e true \
    -v first-name: \
    -v last-name:

# Convert to ProfilesRNS data..
python csv-convert.py \
    -i tmp/ngscn2021.csv \
    -o output

python csv-keyphrases.py \
    -i tmp/ngscn2021.csv \
    -o output/keyphrases.txt \
    -k affiliation-strings-for-searches-in-pubmed
asuccurro commented 2 years ago

it breaks here:

(base) succurro@antigone:~/repositories/publicgithub/limes/src/Utils$ python csv-merge.py \
    -ia tmp/ngscn2021_full_form_20211117.clean.csv \
    -ib tmp/ngscn2021_reduced_form_20211117.clean.csv \
    -o tmp/ngscn2021.merge.csv \
    -k confirm-or-change-email-address \
    -d ngs-cc-affiliation:'WGGC' \
    -d hide-contact-information:False \
    -m buidling:building \
    -m primary-affiliation:affiliation \
    -m specify-affiliation-1:specify-affiliation \
    -m your-role-s-at-the-wggc:your-role-s-at-the-ngs-cc \
    -m specify-your-role-s-at-the-wggc:specify-your-role-s \
    -m specify-city:city \
    -m you-agree-to-publish-your-data-on-the-ngs-cn-profiles-web-portal-if-affiliated-with-wggc-also-on-the-wggc-profiles-web-portal:you-agree-to-our-terms-and-policy
Exception ignored in: <function CsvReader.__del__ at 0x7fdb3544e510>
Traceback (most recent call last):
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 49, in __del__
    self.file.close()
AttributeError: 'CsvReader' object has no attribute 'file'
(base) succurro@antigone:~/repositories/publicgithub/limes/src/Utils$ ll tmp/ngscn2021_full_form_20211117.clean.csv
ls: cannot access 'tmp/ngscn2021_full_form_20211117.clean.csv': No such file or directory
(base) succurro@antigone:~/repositories/publicgithub/limes/src/Utils$ ll tmp/ngscn2021_reduced_form_20211117.clean.csv
ls: cannot access 'tmp/ngscn2021_reduced_form_20211117.clean.csv': No such file or directory

Going back to the log when those files should have been produced:

(base) succurro@antigone:~/repositories/publicgithub/limes/src/Utils$ more tmp/ngscn2021_full_form_20211117.dups.csv
Traceback (most recent call last):
  File "csv-clean.py", line 16, in <module>
    reader = CsvReader(args.input, args.key)
  File "/home/succurro/repositories/publicgithub/limes/src/Utils/lib/csv_core.py", line 46, in __init__
    raise Exception(f'Key column {keyfield} not found in file {filepath}')
Exception: Key column confirm-or-change-email-address not found in file input/ngscn2021_full_form_20211117.csv
faubulous commented 2 years ago

Hi Antonella,

which Python version are you using?

asuccurro commented 2 years ago

I copied the wrong output; see the edited error above. In the base env I run Python 3.7.3:

(base) succurro@antigone:~/repositories/publicgithub/limes/src/Utils$ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
faubulous commented 2 years ago

Would you be available for a quick call?

https://meet.google.com/opj-uyry-jen

asuccurro commented 2 years ago

Found it: some files use the column name "confirm-or-change-email-address" and others "confirm-or-change-your-email-address". Fixed with:

(base) succurro@antigone:~/repositories/publicgithub/limes/src/Utils/input$ sed -i 's/confirm-or-change-your-email-address/confirm-or-change-email-address/g' *.csv
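
A mismatch like this could also be caught before the pipeline runs. The following is a rough sketch, assuming comma-separated files with the column names in the first row; `find_missing_key` is a hypothetical helper, not part of the repository.

```python
import csv


def find_missing_key(paths, keyfield, encoding="utf-8"):
    """Return {path: header} for every CSV file whose header lacks keyfield."""
    missing = {}
    for path in paths:
        with open(path, encoding=encoding, newline="") as f:
            # The first row is taken as the header; an empty file yields [].
            header = next(csv.reader(f), [])
        if keyfield not in header:
            missing[str(path)] = header
    return missing
```

Calling it as `find_missing_key(sorted(pathlib.Path("input").glob("*.csv")), "confirm-or-change-email-address")` would list each file that lacks the key column together with the header it actually has, which makes a near-miss name easy to spot.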

Closing this issue

faubulous commented 2 years ago

Yes. Just noticed that the error must have something to do with the column name. Excellent. Good job! :)

faubulous commented 2 years ago

However, error handling could be improved. We should add a try/except block around the main loop and pretty-print the error so that the actual message does not get lost in the bulk of stack traces.
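
One possible shape for that wrapper is sketched below. It assumes each script exposes its logic as a `main(argv)` function, which the current scripts may not do; `run` and the ` [x]` prefix are made up for illustration.

```python
import sys


def run(main, argv=None):
    """Call main(argv); on failure print one concise line and return 1."""
    try:
        main(argv)
    except Exception as error:
        # One line instead of a full traceback, e.g.
        #  [x] FileNotFoundError: [Errno 2] No such file or directory: '...'
        print(f" [x] {type(error).__name__}: {error}", file=sys.stderr)
        return 1
    return 0
```

A script would then end with `sys.exit(run(main))`. Since processdata.sh redirects the stderr of the first csv-clean.py calls into the `.dups.csv` files, the message would still land there; the nonzero exit code is what a caller could act on.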

Actually, it would be quite nice to refactor the bash script into a Python script that can be step-debugged.
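
A rough sketch of what such a driver might look like, using only the standard library. The step arguments are copied from processdata.sh; `run_step`, `run_pipeline`, and `STEPS` are hypothetical names, and the sketch assumes each script exits with a nonzero code when it fails.

```python
import subprocess
import sys


def run_step(args, stderr_path=None):
    """Run one step with the current interpreter; raise if it exits nonzero."""
    command = [sys.executable, *args]
    print("$", " ".join(command))
    if stderr_path is None:
        subprocess.run(command, check=True)
    else:
        # Mirrors the `2> file` redirection used in processdata.sh.
        with open(stderr_path, "w") as err:
            subprocess.run(command, check=True, stderr=err)


def run_pipeline(steps):
    """Run steps in order and stop at the first failure.

    Returns the index of the failing step, or None if all steps succeeded.
    """
    for index, (args, stderr_path) in enumerate(steps):
        try:
            run_step(args, stderr_path)
        except subprocess.CalledProcessError as error:
            print(f" [x] step {index} failed with exit code {error.returncode}")
            if stderr_path is not None:
                print(f"     see {stderr_path} for the error output")
            return index
    return None


KEY = "confirm-or-change-email-address"

# Two of the steps from processdata.sh, as (arguments, stderr file) pairs.
STEPS = [
    (["csv-clean.py", "-i", "input/wggc2019_round1.csv",
      "-o", "tmp/wggc2019_round1.clean.csv", "-k", KEY, "-c", "profiles-rns"],
     "tmp/wggc2019_round1.dups.csv"),
    (["csv-merge.py", "-ia", "tmp/wggc2019_round1.clean.csv",
      "-ib", "tmp/wggc2019_round2.clean.csv",
      "-o", "tmp/wggc2019.merge.csv", "-k", KEY],
     None),
]
```

Calling `run_pipeline(STEPS)` from `src/Utils` would stop at the first failing step instead of continuing with missing inputs. Because each step still runs in a child process, stepping into the scripts themselves would additionally need their logic exposed as importable functions.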