Add a sheet with the oldest manifestation per cluster to ease data cleaning

kbrbe / beltrans-data-integration

Creating a FAIR Linked Data corpus for the BELTRANS research project about Belgian book translations NL-FR and FR-NL between 1970 and 2020

https://www.kbr.be/en/projects/beltrans/

MIT License


Add a sheet with the oldest manifestation per cluster to ease data cleaning #242

Open SvenLieber opened 11 months ago

SvenLieber commented 11 months ago

Currently we have a sheet with translations (one row per edition) that lists, for each translation, the identifier of its cluster. For data cleaning and gap filling per cluster, it is often handier to show only the data most relevant to the project, which is the first edition.

We should add a new sheet with one row per cluster, linked to information about that cluster's first edition.

[x] initial working SPARQL query to fetch this information (with postprocessing in Python if necessary)
[x] adapt pipeline to add the new CSV as a sheet to the corpus Excel
[ ] add more columns based on internal feedback
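
As a rough illustration of the Python postprocessing step, the following sketch reduces a translations CSV to one row per cluster by keeping the edition with the earliest known publication year. The column names (`clusterID`, `manifestationID`, `title`, `yearOfPublication`), the helper `oldest_per_cluster`, and the sample rows are all hypothetical and not taken from the BELTRANS pipeline; the actual SPARQL output may be shaped differently.

```python
import csv
import io


def oldest_per_cluster(rows, cluster_key="clusterID", year_key="yearOfPublication"):
    """Return one row per cluster: the edition with the earliest known year.

    Column names are assumptions for this sketch, not the real corpus schema.
    Rows without a parsable year are kept only if the cluster has no dated row.
    """
    oldest = {}
    for row in rows:
        cluster = row[cluster_key]
        year_text = (row.get(year_key) or "").strip()
        year = int(year_text) if year_text.isdigit() else None
        if cluster not in oldest:
            oldest[cluster] = (year, row)
            continue
        current_year = oldest[cluster][0]
        # A known year beats an unknown one; otherwise the smaller year wins.
        # On ties, the row seen first is kept.
        if year is not None and (current_year is None or year < current_year):
            oldest[cluster] = (year, row)
    return [row for _, row in oldest.values()]


# Made-up sample data standing in for the translations sheet.
sample = """clusterID,manifestationID,title,yearOfPublication
c1,m1,Title A (reprint),1985
c1,m2,Title A,1982
c2,m3,Title B (undated),
c2,m4,Title B,1990
"""

translations = list(csv.DictReader(io.StringIO(sample)))
cluster_rows = oldest_per_cluster(translations)

# Write the per-cluster rows as CSV, ready to be added as a new sheet.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(translations[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(cluster_rows)
print(out.getvalue())
```

Whether an undated edition should ever count as the first edition is a judgment call for the data cleaners; this sketch simply prefers any dated edition over an undated one.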