semiodesk / limes

Project repository for the Expert Finder System of the Life and Medical Sciences (LIMES) Institute, Bonn University.

Unique key error when uploading new profiles #38

Closed asuccurro closed 1 year ago

asuccurro commented 1 year ago

See attached screenshot of the error from Profiles RNS manager.

The reported number of profiles (77) is what we expect after the update, but the website still shows 66, so the update is probably "stuck" somewhere. We might have to revert something, or overwrite all profiles and start from scratch.

[Screenshot: Selection_042]

asuccurro commented 1 year ago

Suggested solution: I will try to merge the CSV files and redo the upload to avoid duplicate entries.

asuccurro commented 1 year ago

@Sebastian please check: I have merged the CSV files, but the first row to appear is always kept, whereas in principle the rows added later are the up-to-date ones and should be kept. I am not sure where in the Python code I can correct this.

Example: see the new input file ngscn2022_full_forms.csv. After processing, the entry kept for Marc Beyer is the one with the typo in the name:

(base) succurro@prometheus:~/profiles/limes/src/Utils$ grep -i beyer output/*
output/Person.csv:-531241440;MArc;;Beyer;MArc Beyer;;

faubulous commented 1 year ago

I see. Let me have a look.

faubulous commented 1 year ago

Do you want the data to be merged and the later values overwrite the previous ones? Or do you just want to select all the data from the last duplicate row in the input file?
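To make the two options concrete, here is a minimal sketch of both strategies. It is not taken from csv-clean.py: the function names, the `email` key column, and the sample rows are all invented for illustration.

```python
def last_row_wins(rows, key):
    """Keep only the final row seen for each key value."""
    latest = {}
    for row in rows:
        # A later row replaces the earlier one wholesale, empty fields included.
        latest[row[key]] = row
    return list(latest.values())


def merge_rows(rows, key):
    """Combine duplicates field by field; later non-empty values overwrite."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {})
        for field, value in row.items():
            # An empty later value does not erase an earlier non-empty one.
            if value != "" or field not in target:
                target[field] = value
    return list(merged.values())


# Hypothetical duplicate rows: the second one fixes a typo but omits the title.
rows = [
    {"email": "ada@example.org", "first": "ADa", "title": "Prof."},
    {"email": "ada@example.org", "first": "Ada", "title": ""},
]

print(last_row_wins(rows, "email"))
print(merge_rows(rows, "email"))
```

With these sample rows, the last-row strategy returns the corrected name with an empty title, while the merge strategy returns the corrected name and keeps the title from the earlier row.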

faubulous commented 1 year ago

@asuccurro I changed csv-clean.py so that it always outputs the last occurrence of a row instead of the first. Is that OK?

asuccurro commented 1 year ago

Data processing behavior is now as expected, thanks!

Still to check - upload on the server

asuccurro commented 1 year ago

I get the same error message, now with "duplicate key value is (14)"

faubulous commented 1 year ago

Found the issue: the Profiles RNS import routine does not clear the [Person].[Data].[FacultyRank] table. As a consequence, rows in the loaded dataset collide with existing ones. In this case, number 14 'Facility Leader'.

We need to discuss whether changing this value in the import tool upon import by our Profiles RNS Manager tool would be an option. Also, when the 'Merge with existing data' box is unchecked, the import routine should clear this table to prevent such issues in the future.
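Until the import routine clears the table itself, a pre-import check could flag colliding keys before they reach the database. The following is only a sketch of that idea, not part of the Profiles RNS Manager tool: the function name, the `rank_id` field, and the sample values are assumptions made for illustration.

```python
def find_key_collisions(existing_keys, incoming_rows, key_field):
    """Return incoming key values that would violate a unique-key constraint.

    A key collides if it is already present in the target table or if it
    appears more than once in the incoming data.
    """
    existing = set(existing_keys)
    seen = set()
    collisions = []
    for row in incoming_rows:
        value = row[key_field]
        if value in existing or value in seen:
            collisions.append(value)
        seen.add(value)
    return collisions


# Hypothetical state: key 14 is already in the table and is loaded again.
existing_rank_ids = [1, 14]
incoming = [
    {"rank_id": 14, "label": "Facility Leader"},
    {"rank_id": 15, "label": "Group Leader"},
]

print(find_key_collisions(existing_rank_ids, incoming, "rank_id"))
```

For the sample data above the check reports key 14, which mirrors the "duplicate key value is (14)" error described in this thread.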

faubulous commented 1 year ago

Fixed in 9eefc21