Open benjelloun opened 1 month ago
We don't currently do it today, but this is related to a feature I'd love to see on Kaggle where datasets link to each other (or models) by showing which datasets are cleaned/etc. versions of others. Or which models were trained using which datasets. If we have a separate work stream to address this issue, please add me to any meetings/docs/etc.! I don't have the capacity to lead the effort right now, but very interested in participating.
Define a mechanism to describe lineage of data / provenance information.
This mechanism should support multiple levels of granularity:
We would ideally reuse existing vocabularies such as PROV-O.