teamdigitale / dati-semantic-backend

Backend for the NDC semantic repository
GNU Affero General Public License v3.0

[BE] Special use cases - Semantic Asset Harvesting #28

Open. Opened by SrinivasanTarget 2 years ago.

SrinivasanTarget commented 2 years ago

Acceptance criteria: this story aims to better handle the special cases listed below, which arise during harvesting.

Case 1: Handling multiple CSV files, ttl files, or both in the same semantic asset folder.

E.g., CV: a leaf-node CV folder with many CSV and ttl files at the same folder level: https://github.com/italia/daf-ontologie-vocabolari-controllati/tree/master/Vocabol[…]ollati/classifications-for-transparency/transparency-obligation. When a folder holds multiple ttl and CSV files, it is hard to pair them by file name, since a single ttl file can have multiple abstractions (CSVs).

Ontology: a `latest` folder containing multiple ttl files at the same level. For example, https://github.com/italia/daf-ontologie-vocabolari-controllati/tree/master/Ontologie/CLV/latest has main, align, and DBGT ttl files.

Case 2: How do we handle deprecated folders inside a repository? E.g., https://github.com/italia/daf-ontologie-vocabolari-controllati/tree/master/VocabolariControllati/vocs-deprecated contains several deprecated CVs.

ioggstream commented 2 years ago

Related issue https://github.com/teamdigitale/dati-semantic-backend/issues/41

ioggstream commented 6 months ago

This is currently addressed using skipwords (e.g., https://github.com/teamdigitale/dati-semantic-backend/blob/a8f666f60604b968bd94c946a43f3b21d6764557/docker-compose.yaml#L59C1-L60C1)

Regarding Case 1, CV (a leaf-node CV folder with many CSV and ttl files at the same folder level):

In this case, relying on skipwords is probably a kludge, and a better solution should be found.

Regarding Case 1, Ontology (a `latest` folder containing multiple ttl files at the same level, e.g., https://github.com/italia/daf-ontologie-vocabolari-controllati/tree/master/Ontologie/CLV/latest with its main, align, and DBGT ttl files):

This is addressed via skipwords. If we want to apply this pattern to all semantic repositories (not only to Ontopia), we need either to standardize the skipwords or to establish stricter requirements for repositories in the guidelines; these requirements could be relaxed later on.

Regarding Case 2 (how to handle deprecated folders inside a repository, e.g., https://github.com/italia/daf-ontologie-vocabolari-controllati/tree/master/VocabolariControllati/vocs-deprecated, which contains several deprecated CVs):

Deprecated assets can be moved into specific folders; the guidelines already address this via the assets/ folder. Moreover, skipwords (e.g., deprecated-) can be used to skip such folders.
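A minimal sketch of how skipword filtering could exclude deprecated folders during harvesting. It assumes, hypothetically, that a path is skipped when any of its folder or file names contains a configured skipword as a substring; the backend's actual matching rule and configuration format (see the docker-compose.yaml link above) may differ. `should_skip`, `harvestable`, and the example paths are illustrative only.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def should_skip(path: str, skip_words: Iterable[str]) -> bool:
    """Return True if any folder or file name in `path` contains a skipword.

    Hypothetical rule: case-sensitive substring match on each path segment.
    This is not necessarily how the backend matches skipwords.
    """
    words = list(skip_words)
    return any(word in part for part in PurePosixPath(path).parts for word in words)


def harvestable(paths: Iterable[str], skip_words: Iterable[str]) -> List[str]:
    """Keep only the paths that no skipword excludes, preserving their order."""
    words = list(skip_words)  # materialize once so a generator is not exhausted
    return [p for p in paths if not should_skip(p, words)]


if __name__ == "__main__":
    # Illustrative repository paths (file names are hypothetical).
    paths = [
        "VocabolariControllati/vocs-deprecated/old-vocabulary/old-vocabulary.ttl",
        "VocabolariControllati/some-vocabulary/some-vocabulary.ttl",
        "Ontologie/CLV/latest/CLV-main.ttl",
    ]
    for p in harvestable(paths, ["deprecated"]):
        print(p)
```

Substring matching is simple but coarse: a skipword such as `align` would also skip any unrelated name that happens to contain that string. This is one reason to standardize skipwords before applying the pattern across all repositories.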

I suggest closing this issue and, if needed, reopening each specific use case as a separate, detailed issue.