huridocs / uwazi

Uwazi is a web-based, open-source solution for building and sharing document collections
http://www.uwazi.io
MIT License

Allow same items to nest under different groups in a list during thesauri CSV import #5407

Open pddocs opened 1 year ago

pddocs commented 1 year ago

Is your feature request related to a problem? Please describe. At the moment, CSV import fails when a list contains the same item in more than one group. However, such repeated terms are expected: although the term itself is the same, placing it under different groups captures the necessary difference in nuance when documenting. For example, 'Extortion' under the group 'Land' has a completely different nuance than 'Extortion' under another group called 'Kidnapping' in the same list.

Describe the solution you'd like Allow CSV import of lists whose groups contain repeated items.

Have you considered an alternative? Upload the CSV file without the repeated terms, then manually add them once the list is inside Uwazi as a thesaurus. Or append extra characters to each repeated term so that the terms are no longer identical across groups during import, then delete those extra characters once the list is inside Uwazi as a thesaurus.

Additional context This limitation on repeated items in the same thesaurus applies only during import. During manual entry inside Uwazi, one can add the same item across groups without any issue.

@hyebin-bina

fnocetti commented 9 months ago

Hi @pddocs @hyebin-bina, can you please clarify whether this happens during entity CSV import or thesauri CSV import? Thanks!

hyebin-bina commented 9 months ago

Hi @fnocetti, thank you for looking into this.

This happened during the entity CSV import. We can also confirm that this error does not happen during the thesauri CSV import.

I ran the test again and here is what I observed:

  1. You want to import an entity that has a Select property.

  2. The Select property has two thesaurus groups that contain identical thesaurus items, as below.

    [Screenshot: Screen Shot 2023-12-06 at 1 13 47 PM]
  3. You prepare your CSV file following the usual protocol, putting only the thesaurus item values, as below. (This is the protocol we know; is there another one that we are not aware of?)

    [Screenshot: Screen Shot 2023-12-06 at 1 17 49 PM]
  4. The import works, but Uwazi arbitrarily assigns the value to the first thesaurus, as below.

    [Screenshot: Screen Shot 2023-12-06 at 1 22 10 PM]
  5. This is an improvement: when we first reported this, the import stopped at step 4 and we could not import any entity.
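
The ambiguity in steps 2 to 4 can be illustrated with a small sketch. This is not Uwazi's import code: the data layout, the function names `resolve_first_match` and `matching_groups`, and the items 'Eviction' and 'Ransom' are assumptions made purely for illustration. Only the group names 'Land' and 'Kidnapping' and the item 'Extortion' come from this issue. The sketch shows why a bare label in the CSV cannot identify the intended group when the lookup stops at the first match.

```python
# Hypothetical grouped thesaurus: group name -> item labels.
# 'Eviction' and 'Ransom' are made-up filler items.
THESAURUS = {
    "Land": ["Extortion", "Eviction"],
    "Kidnapping": ["Extortion", "Ransom"],
}


def resolve_first_match(thesaurus, label):
    """Return (group, label) for the first group that contains the label."""
    for group, items in thesaurus.items():
        if label in items:
            return (group, label)
    return None


def matching_groups(thesaurus, label):
    """Return every group that contains the label, in thesaurus order."""
    return [group for group, items in thesaurus.items() if label in items]


# A CSV cell that holds only "Extortion" matches two groups,
# but a first-match lookup silently picks the first one.
print(matching_groups(THESAURUS, "Extortion"))
print(resolve_first_match(THESAURUS, "Extortion"))
```

With a lookup like this, a row meant for 'Kidnapping' would still end up under 'Land', which matches the behavior described in step 4.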

It would be nice to have a syntax that users can use in their CSV files to indicate which thesaurus a value belongs to, something like below.

[Screenshot: Screen Shot 2023-12-06 at 1 28 47 PM]
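
The syntax suggested in the screenshot is not reproduced in this thread, so the following is only one possible shape for such a feature, not something Uwazi supports as far as this issue states. It assumes a hypothetical `Group::Item` cell format; the `::` separator, the function names, and the items 'Eviction' and 'Ransom' are all invented for illustration. A bare value is still accepted when it is unambiguous and rejected when it matches more than one group.

```python
SEPARATOR = "::"  # hypothetical separator, not an Uwazi feature

# Hypothetical grouped thesaurus: group name -> item labels.
THESAURUS = {
    "Land": ["Extortion", "Eviction"],
    "Kidnapping": ["Extortion", "Ransom"],
}


def parse_cell(cell):
    """Split 'Group::Item' into (group, item); a bare value gives (None, item)."""
    group, sep, item = cell.partition(SEPARATOR)
    if not sep:
        return (None, cell.strip())
    return (group.strip(), item.strip())


def resolve(thesaurus, cell):
    """Resolve a CSV cell to (group, item), raising ValueError when unclear."""
    group, item = parse_cell(cell)
    if group is not None:
        if item in thesaurus.get(group, []):
            return (group, item)
        raise ValueError(f"'{item}' not found in group '{group}'")
    matches = [g for g, items in thesaurus.items() if item in items]
    if len(matches) == 1:
        return (matches[0], item)
    if not matches:
        raise ValueError(f"'{item}' not found in any group")
    raise ValueError(f"'{item}' is ambiguous; it appears in groups {matches}")


print(resolve(THESAURUS, "Kidnapping::Extortion"))
print(resolve(THESAURUS, "Ransom"))
```

Raising an error for an ambiguous bare value, instead of picking the first group, is a design choice of this sketch: it makes the problem visible at import time rather than after the data has been assigned to the wrong group.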

cc: @pddocs

pddocs commented 9 months ago

Confirming that the thesauri import issue has been resolved since this request was created. Thank you!

The issue that @hyebin-bina has now pointed out with the entity CSV import is a clear follow-up to this. Thank you for taking note of her detailed comment.