Closed asuccurro closed 1 year ago
Suggested solution: I will try to merge the CSV files and redo the upload to avoid duplicate entries.
@Sebastian please check: I have merged the CSVs, but the first occurrence of a duplicated row is always kept, whereas in principle the rows added later are the up-to-date ones and should be kept. I am not sure where in the Python code I can correct this.
Example: see the new input file ngscn2022_full_forms.csv. After processing, the entry kept for Marc Beyer is the one with the typo in the name:
(base) succurro@prometheus:~/profiles/limes/src/Utils$ grep -i beyer output/*
output/Person.csv:-531241440;MArc;;Beyer;MArc Beyer;;
I see. Let me have a look.
Do you want the data to be merged and the later values overwrite the previous ones? Or do you just want to select all the data from the last duplicate row in the input file?
@asuccurro I changed csv-clean.py so that it always outputs the last occurrence of a row instead of the first. Is that OK?
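For reference, the keep-last behaviour described above can be sketched as follows. This is not the actual code in csv-clean.py: the column names (`FirstName`, `LastName`, `Source`), the helper names `keep_last` and `person_key`, the choice of a case-insensitive name as the duplicate key, and the second sample person are all assumptions made for illustration.

```python
import csv
import io


def keep_last(rows, key):
    """Remove duplicates, keeping the last occurrence of each key."""
    latest = {}
    for row in rows:
        k = key(row)
        # Drop any earlier entry first so the later row also takes the later position.
        latest.pop(k, None)
        latest[k] = row
    return list(latest.values())


def person_key(row):
    # Hypothetical duplicate key: case-insensitive first and last name.
    return (row["FirstName"].strip().lower(), row["LastName"].strip().lower())


# Made-up sample input; the real file layout may differ.
sample = io.StringIO(
    "FirstName,LastName,Source\n"
    "MArc,Beyer,first upload\n"
    "Anna,Schmidt,first upload\n"
    "Marc,Beyer,second upload\n"
)
rows = list(csv.DictReader(sample))
deduped = keep_last(rows, person_key)
for row in deduped:
    print(row["FirstName"], row["LastName"], row["Source"])
```

With this approach the row from the later upload replaces the earlier one with the typo. If pandas is already in use, `DataFrame.drop_duplicates(keep="last")` gives similar behaviour for exact matches on the chosen columns.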
Data processing behavior is now as expected, thanks!
Still to check - upload on the server
I get the same error message, now with "duplicate key value is (14)"
Found the issue: the Profiles RNS import routine does not clear the [Person].[Data].[FacultyRank] table. As a consequence, rows in the loaded dataset collide with existing ones. In this case, number 14 'Facility Leader'.
We need to discuss whether changing this value in the import tool, when importing via our Profiles RNS Manager tool, would be an option. Also, when the 'Merge with existing data' box is unchecked, the import routine should clear this table to prevent such issues in the future.
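To illustrate the collision and the proposed clearing step, here is a minimal sketch using an in-memory SQLite database as a stand-in. It is not the Profiles RNS import routine: the table layout, the column names, the `load_ranks` helper, and the second rank (15, 'Group Leader') are assumptions; only the value 14 'Facility Leader' comes from the error above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the FacultyRank table; the real schema may differ.
conn.execute(
    "CREATE TABLE FacultyRank (FacultyRankID INTEGER PRIMARY KEY, FacultyRank TEXT)"
)
conn.execute("INSERT INTO FacultyRank VALUES (14, 'Facility Leader')")
conn.commit()

# Rows from the new dataset; 14 already exists in the table.
incoming = [(14, "Facility Leader"), (15, "Group Leader")]


def load_ranks(conn, rows, merge_with_existing):
    """Hypothetical loader: clears the table first unless merging is requested."""
    with conn:  # commits on success, rolls back on error
        if not merge_with_existing:
            conn.execute("DELETE FROM FacultyRank")
        conn.executemany("INSERT INTO FacultyRank VALUES (?, ?)", rows)


# Without clearing, the existing key 14 collides with the incoming row.
try:
    load_ranks(conn, incoming, merge_with_existing=True)
    collided = False
except sqlite3.IntegrityError:
    collided = True

# With clearing, the same load succeeds.
load_ranks(conn, incoming, merge_with_existing=False)
count = conn.execute("SELECT COUNT(*) FROM FacultyRank").fetchone()[0]
print(collided, count)
```

The design point is only that the clear and the insert happen in one transaction, so a failed load does not leave the table empty.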
Fixed in 9eefc21
See attached screenshot of the error from Profiles RNS manager.
The new number of profiles reported (77) is the number expected after the update, but the website still shows 66. The update is probably "stuck" somewhere, so we might have to revert something, or overwrite all profiles and start from scratch.