Open tillmo opened 6 years ago
This doesn't happen in Ontohub. Let me explain the current behaviour:
Without Ontohub (no --database-fileversion-id parameter provided)
When you call Hets to analyse a file, it creates a new row in the file_versions table. The file_versions.id field of the new row is used to associate a Document with a FileVersion (that's a database constraint of Ontohub). Since the FileVersions of your consecutive calls to Hets differ, there are multiple documents with the same location/LocId. See this SQL output:
❯ psql -U postgres -d hets_development -c 'SELECT * FROM documents AS sub INNER JOIN loc_id_bases ON sub.id = loc_id_bases.id INNER JOIN file_versions ON loc_id_bases.file_version_id = file_versions.id;'
 id | display_name         | name                   | location             | version | id | file_version_id | kind    | loc_id   | id | action_id | repository_id | path                 | commit_sha
----+----------------------+------------------------+----------------------+---------+----+-----------------+---------+----------+----+-----------+---------------+----------------------+---------------------
  1 | doc1                 | doc1                   | file:///tmp/doc1.dol |         |  1 |               1 | Library | doc1.dol |  1 |         1 |             1 | file:///tmp/doc1.dol | non-git FileVersion
  4 | file:///tmp/doc1.dol | <file:///tmp/doc1.dol> | file:///tmp/doc1.dol |         |  4 |               2 | Library | doc1.dol |  2 |         3 |             1 | file:///tmp/doc2.dol | non-git FileVersion
  7 | doc2                 | doc2                   | file:///tmp/doc2.dol |         |  7 |               2 | Library | doc2.dol |  2 |         3 |             1 | file:///tmp/doc2.dol | non-git FileVersion
The Document with documents.id = 1 has file_versions.id = 1, while the other two documents have file_versions.id = 2. Please note that the --database-reanalyze option has no effect if --database-fileversion-id is not set, because Hets always creates a new FileVersion and there is no data that can be overwritten for that new FileVersion.
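The behaviour described above can be illustrated with a small toy model. This is only a sketch in Python with an in-memory SQLite database, not Hets's actual code or schema: the analyse function and the reduced two-table layout are hypothetical, and it assumes (as the SQL output suggests) that analysing doc2.dol also stores doc1.dol again under the new FileVersion.

```python
import sqlite3

def analyse(conn, path, locations):
    """Hypothetical stand-in for one Hets call without --database-fileversion-id."""
    # A fresh file_versions row is created on every call.
    cur = conn.execute("INSERT INTO file_versions (path) VALUES (?)", (path,))
    file_version_id = cur.lastrowid
    # Every document touched by the analysis is stored under that new row.
    conn.executemany(
        "INSERT INTO documents (location, file_version_id) VALUES (?, ?)",
        [(location, file_version_id) for location in locations],
    )
    return file_version_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file_versions (id INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    location TEXT,
    file_version_id INTEGER REFERENCES file_versions(id)
);
""")

# First call: doc1.dol on its own.
analyse(conn, "file:///tmp/doc1.dol", ["file:///tmp/doc1.dol"])
# Second call: doc2.dol, assumed to pull in doc1.dol as well.
analyse(conn, "file:///tmp/doc2.dol",
        ["file:///tmp/doc1.dol", "file:///tmp/doc2.dol"])

# Locations that are stored more than once across FileVersions.
duplicates = conn.execute(
    "SELECT location, COUNT(*) FROM documents "
    "GROUP BY location HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)
```

In this model the duplicate arises purely because the two calls use different FileVersions, which matches the explanation above.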
With Ontohub (--database-fileversion-id=123456789 parameter provided)
Ontohub always tells Hets which FileVersion to use for the association with Documents and sub-Document models. If the same FileVersion is used for consecutive calls on the same Document, Hets fails unless the --database-reanalyze option is set. With this option, Hets deletes all data that is (recursively) associated with the FileVersion and saves the new analysis result for the given FileVersion.
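The reanalysis rule can be sketched in the same toy style. Again, this is a simplified model under stated assumptions and not Hets's implementation: the analyse function, the AlreadyAnalysed exception, and the single documents table are hypothetical, and the real deletion is recursive across associated tables, which this sketch does not model.

```python
import sqlite3

class AlreadyAnalysed(Exception):
    """Hypothetical error for a FileVersion that already has data."""

def analyse(conn, file_version_id, locations, reanalyze=False):
    """Hypothetical stand-in for a Hets call with --database-fileversion-id set."""
    existing = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE file_version_id = ?",
        (file_version_id,),
    ).fetchone()[0]
    if existing and not reanalyze:
        # Without --database-reanalyze the call fails.
        raise AlreadyAnalysed(f"FileVersion {file_version_id} already has data")
    if existing:
        # With --database-reanalyze the old data is removed first.
        conn.execute(
            "DELETE FROM documents WHERE file_version_id = ?",
            (file_version_id,),
        )
    conn.executemany(
        "INSERT INTO documents (location, file_version_id) VALUES (?, ?)",
        [(location, file_version_id) for location in locations],
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents "
    "(id INTEGER PRIMARY KEY, location TEXT, file_version_id INTEGER)"
)

analyse(conn, 123456789, ["file:///tmp/doc1.dol"])
try:
    analyse(conn, 123456789, ["file:///tmp/doc1.dol"])
    failed = False
except AlreadyAnalysed:
    failed = True

# The same call succeeds once reanalysis is requested.
analyse(conn, 123456789, ["file:///tmp/doc1.dol"], reanalyze=True)
count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
print(failed, count)
```

Because the FileVersion is fixed by the caller, the model never ends up with two stored copies of the same location for it.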
If file:///home/till/temp/doc1.dol contains
and file:///home/till/temp/doc2.dol contains
and I analyse the two files with Hets in that order, then I get
Could this duplication be avoided by using location to identify documents, even if name and display_name differ?