inveniosoftware / react-invenio-deposit

React application for Invenio deposit forms.
https://react-invenio-deposit.readthedocs.io
MIT License
3 stars, 45 forks

Affiliations: Not all entries are shown in the affiliations drop down #638

Open max-moser opened 1 year ago

max-moser commented 1 year ago

Describe the bug

Whenever two affiliations have the same name, only one of the two (or more) entries is shown.

Steps to Reproduce

  1. Create an InvenioRDM instance with all affiliations from the ROR dump
  2. Go to the upload form
  3. Add a new creator
  4. Enter "Water Research Institute" into the affiliations input
  5. Only "Water Research Institute (IRSA)" is displayed

Expected behavior

The drop-down list should suggest both "Water Research Institute (IRSA)" and "Water Research Institute (WRI)".

Screenshots (if applicable)

Here, you can see that the backend does give back the WRI entry, but it's ignored in the web UI: image

Actually, we can see that both "Water Research Institute" entries are returned: image

After renaming the IRSA entry to "Water Research Institute ITALIA", both entries are in the drop-down: image

Additional context

Once again, @ppanero was invaluable in helping me check out this bug and pointed out that the key is set to affiliation.name rather than the id: https://github.com/inveniosoftware/react-invenio-deposit/blob/master/src/lib/components/AffiliationsField.js#L23
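
To illustrate why keying on the name can hide entries, here is a minimal Python sketch. It is not the component's actual JavaScript: it only assumes that options end up in a mapping keyed by whichever field is chosen as the key. The `index_by` helper and the ids are hypothetical placeholders, not real ROR ids.

```python
# Hypothetical affiliation records; the ids are placeholders, not real ROR ids.
affiliations = [
    {"id": "id-irsa", "name": "Water Research Institute"},
    {"id": "id-wri", "name": "Water Research Institute"},
]


def index_by(records, field):
    """Build a mapping keyed by `field`.

    Records sharing the same key value overwrite each other,
    so only one of them survives in the result.
    """
    return {record[field]: record for record in records}


by_name = index_by(affiliations, "name")
by_id = index_by(affiliations, "id")

print(len(by_name))  # 1: both entries collide on the shared name
print(len(by_id))    # 2: unique ids keep both entries
```

Under that assumption, switching the key from the name to the id is enough to keep both entries distinct.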

max-moser commented 1 year ago

For extra context, here's a count of how many of the ROR entries from the InvenioRDM cookiecutter have duplicates:

In [10]: len({k: v for k, v in counter_names.items() if v > 1})
Out[10]: 717

In [11]: len({k: v for k, v in counter_names.items() if v > 2})
Out[11]: 189

In [12]: len({k: v for k, v in counter_names.items() if v > 3})
Out[12]: 85

In [13]: len({k: v for k, v in counter_names.items() if v > 4})
Out[13]: 53

In [14]: len({k: v for k, v in counter_names.items() if v > 5})
Out[14]: 31

In [15]: len({k: v for k, v in counter_names.items() if v > 6})
Out[15]: 24

In [16]: len({k: v for k, v in counter_names.items() if v > 7})
Out[16]: 17

In [17]: len({k: v for k, v in counter_names.items() if v > 8})
Out[17]: 14

In [18]: len({k: v for k, v in counter_names.items() if v > 9})
Out[18]: 12

In [19]: {k: v for k, v in counter_names.items() if v > 10}
Out[19]: 
{'Ministry of Health': 52,
 'Government Medical College': 13,
 "St. Luke's Hospital": 12,
 'Institute of Physics': 11,
 'Ministry of Justice': 17,
 'Ministry of Education': 20,
 'Ministry of Culture': 14,
 'Ministry of Agriculture': 11,
 'Ministry of Finance': 11,
 'Ministry of Foreign Affairs': 16}
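
The session above does not show how `counter_names` was built. A minimal sketch, assuming it is a `collections.Counter` over the affiliation names, could look like this. The `names` list is a made-up sample standing in for the names from the ROR dump, and `names_with_more_than` is a hypothetical helper.

```python
from collections import Counter

# Made-up sample; the real input would be the affiliation names from the ROR dump.
names = [
    "Ministry of Health",
    "Ministry of Health",
    "Ministry of Health",
    "Institute of Physics",
    "Institute of Physics",
    "Water Research Institute",
]

# Count how often each name occurs.
counter_names = Counter(names)


def names_with_more_than(counter, threshold):
    """Return the names that occur more than `threshold` times, with their counts."""
    return {name: count for name, count in counter.items() if count > threshold}


print(len(names_with_more_than(counter_names, 1)))  # 2
print(names_with_more_than(counter_names, 2))       # {'Ministry of Health': 3}
```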

chriz-uniba commented 1 year ago

Maybe it makes sense to also keep this issue in mind when thinking about a solution: https://github.com/inveniosoftware/invenio-app-rdm/issues/1868

Could it be that a unique id would help solve the problem in both cases?