in db, documents are duplicated

This doesn't happen in Ontohub. Let me explain the current behaviour:

Without Ontohub (no `--database-fileversion-id` parameter provided)

Each time you call Hets to analyse a file, it creates a new row in the `file_versions` table. The `file_versions.id` of that new row is used to associate a Document with a FileVersion (that's a database constraint of Ontohub). Since every one of your consecutive calls to Hets gets its own FileVersion, you end up with multiple Documents that share the same location/LocId but belong to different FileVersions. See this SQL output:

 ❯ psql -U postgres -d hets_development -c 'SELECT * FROM documents AS sub INNER JOIN loc_id_bases ON sub.id = loc_id_bases.id INNER JOIN file_versions ON loc_id_bases.file_version_id = file_versions.id;'
 id |     display_name     |          name          |       location       | version | id | file_version_id |  kind   |  loc_id  | id | action_id | repository_id |         path         |     commit_sha
----+----------------------+------------------------+----------------------+---------+----+-----------------+---------+----------+----+-----------+---------------+----------------------+---------------------
  1 | doc1                 | doc1                   | file:///tmp/doc1.dol |         |  1 |               1 | Library | doc1.dol |  1 |         1 |             1 | file:///tmp/doc1.dol | non-git FileVersion
  4 | file:///tmp/doc1.dol | <file:///tmp/doc1.dol> | file:///tmp/doc1.dol |         |  4 |               2 | Library | doc1.dol |  2 |         3 |             1 | file:///tmp/doc2.dol | non-git FileVersion
  7 | doc2                 | doc2                   | file:///tmp/doc2.dol |         |  7 |               2 | Library | doc2.dol |  2 |         3 |             1 | file:///tmp/doc2.dol | non-git FileVersion

The Document with `documents.id = 1` belongs to the FileVersion with `file_versions.id = 1`, while the other two Documents belong to `file_versions.id = 2`. Please note that the `--database-reanalyze` option has no effect if `--database-fileversion-id` is not set: Hets always creates a new FileVersion, so there is no existing data that could be overwritten for it.
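If you want to spot such duplicates yourself, something like the following should work. It is only a sketch: it rebuilds a stripped-down copy of the three tables in an in-memory SQLite database (the column subset and types are my simplification, not the real Hets schema), fills in the rows from the output above, and lists every location that is stored under more than one FileVersion.

```python
import sqlite3

# Simplified stand-in for the Hets database: only the columns needed here,
# with values copied from the psql output above. Not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file_versions (id INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE loc_id_bases (
    id INTEGER PRIMARY KEY,
    file_version_id INTEGER REFERENCES file_versions(id),
    kind TEXT,
    loc_id TEXT
);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY REFERENCES loc_id_bases(id),
    name TEXT,
    location TEXT
);
INSERT INTO file_versions VALUES
    (1, 'file:///tmp/doc1.dol'),
    (2, 'file:///tmp/doc2.dol');
INSERT INTO loc_id_bases VALUES
    (1, 1, 'Library', 'doc1.dol'),
    (4, 2, 'Library', 'doc1.dol'),
    (7, 2, 'Library', 'doc2.dol');
INSERT INTO documents VALUES
    (1, 'doc1', 'file:///tmp/doc1.dol'),
    (4, '<file:///tmp/doc1.dol>', 'file:///tmp/doc1.dol'),
    (7, 'doc2', 'file:///tmp/doc2.dol');
""")

# Locations that appear under more than one FileVersion.
duplicates = conn.execute("""
SELECT d.location,
       COUNT(*) AS copies,
       COUNT(DISTINCT l.file_version_id) AS file_versions
FROM documents AS d
INNER JOIN loc_id_bases AS l ON d.id = l.id
GROUP BY d.location
HAVING COUNT(DISTINCT l.file_version_id) > 1
ORDER BY d.location
""").fetchall()

print(duplicates)
```

The `SELECT` itself uses only standard SQL, so it should also run via `psql` against `hets_development`, but I have only checked it against the simplified tables above.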

With Ontohub (`--database-fileversion-id=123456789` parameter provided)

Ontohub always tells Hets which FileVersion to use for the association with Documents and sub-Document models. If the same FileVersion is used for consecutive calls on the same Document, Hets fails unless the `--database-reanalyze` option is set. With this option, Hets first deletes all data that is (recursively) associated with the FileVersion and then saves the new analysis result for the given FileVersion.
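To make the rule concrete, here is a toy model of it in Python. This is not Hets code: `FileVersionStore`, `save_analysis` and the error message are names I made up for illustration, and the "database" is just a dictionary. It only mirrors the behaviour described above: a second save for the same FileVersion fails, unless reanalysis is requested, in which case the old data is dropped first.

```python
class FileVersionStore:
    """Hypothetical in-memory stand-in for the database (not Hets code)."""

    def __init__(self):
        # Maps a FileVersion id to the document locations stored for it.
        self.documents = {}

    def save_analysis(self, file_version_id, locations, reanalyze=False):
        if file_version_id in self.documents:
            if not reanalyze:
                # Mirrors "Hets fails unless --database-reanalyze is set".
                raise RuntimeError(
                    f"FileVersion {file_version_id} already has analysis data"
                )
            # Mirrors the deletion of everything tied to the FileVersion.
            del self.documents[file_version_id]
        self.documents[file_version_id] = list(locations)


store = FileVersionStore()
store.save_analysis(123456789, ["file:///tmp/doc1.dol"])

# Same FileVersion again, without reanalyze: this is the failing case.
try:
    store.save_analysis(123456789, ["file:///tmp/doc1.dol"])
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# Same FileVersion with reanalyze: old data is replaced, nothing is duplicated.
store.save_analysis(123456789, ["file:///tmp/doc1.dol"], reanalyze=True)
```

In this model a FileVersion never holds more than one copy of a Document, which is why the duplication does not show up when Ontohub supplies the FileVersion id.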
