Closed nielsklazenga closed 4 years ago
I am very much interested in an application of the standard that allows us to exchange data in CSV files, as we can with DwC. Data Packages appear to be the strongest candidate for an existing standard in that area to jump on to. But if TCS NG becomes an RDF-only standard, I will frankly be rather disappointed.
@mdoering I just gave Data Packages a look and it's really interesting. Here are a few relevant points:
Section 4 of the Standards Documentation Specification is very careful to say that the machine-readable metadata for a standard MAY be expressed as RDF, but that other methods can be used as long as the same relationships are expressed in a machine-processable way. So RDF is not required.
I believe strongly that we should try to keep our standards definitions simple enough that they can be expressed as CSV tables. At this point, all of the existing TDWG vocabulary standards ARE simple enough to be expressed as CSV tables. I don't think that people have paid much attention to the rs.tdwg.org repo, but it contains all of the information required to describe TDWG vocabularies from CSV data and to turn those CSV data into machine-readable RDF. In each of the folders, there's one core CSV file (like this one for the dwc: terms) and other files that describe how to map the table columns to well-known properties (like this one). I just took a look at the Table Schema information for Data Package, and all of the information in the "other files" I just mentioned could be expressed as a Table Schema JSON file. So the Data Package system could be used to create machine-readable CSV files that are directly translatable to RDF and that would contain equivalent information.
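To make the idea of expressing the column mapping as a Table Schema a bit more concrete, here is a minimal sketch in Python. It assumes a mapping from column names to property URIs and writes each URI into the `rdfType` property of the corresponding Table Schema field descriptor. The column names (`label`, `definition`, `comment`) and the choice of properties are made up for illustration and are not taken from the rs.tdwg.org files.

```python
import json

# Hypothetical mapping of CSV column headers to well-known property URIs.
# The column names are invented; they do not come from the rs.tdwg.org repo.
COLUMN_MAPPING = {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "definition": "http://www.w3.org/2004/02/skos/core#definition",
    "comment": "http://www.w3.org/2000/01/rdf-schema#comment",
}


def build_table_schema(mapping):
    """Return a Table Schema descriptor with one string field per column.

    Each field carries the mapped property URI in its `rdfType` property.
    """
    return {
        "fields": [
            {"name": column, "type": "string", "rdfType": uri}
            for column, uri in mapping.items()
        ]
    }


schema = build_table_schema(COLUMN_MAPPING)
print(json.dumps(schema, indent=2))
```

Whether `rdfType` is the best place to carry a predicate URI, as opposed to a custom property on the field descriptor, is a design choice that would still need to be discussed.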
Guid-O-Matic is the software that I wrote to turn CSV files into RDF serializations. I have been thinking of making a version 3.0 in Python, so maybe the Data Package specification would be the way to describe the CSV files. I see that they have a Python library, but didn't investigate what all it can do yet. One thing I don't know is how widely adopted Data Package is. Do you know?
I said earlier that all existing TDWG standards can be easily expressed in CSV tables. However, some of the models we are talking about so far in TNC are getting complicated to the point where that might be difficult. We should keep that in mind as we try to balance our desire to express complex ideas in the standard.
@mdoering, that is not at all the intention; we intend to make a specification that is broadly applicable, and not just because that's what the Vocabulary Maintenance Specification requires from TDWG standards. Your use case is definitely a very important one and is very much on our radar. If a lot of the examples were in Turtle, that is just because it is easy to read and write and useful to quickly get an idea across. Speaking for myself, when I think about data models I think of tables in a database, and if I were to produce RDF with real data, it would be something else first (database tables) and there would be something else again (JSON) between the database table and the RDF.
The models we discussed so far may look complicated, but we really have been talking about only two classes/tables, so they aren't really (and I think everything you get into a database structure you can get into CSV). I am pretty sure I could shoehorn everything we discussed so far into the Darwin Core Taxon class if I had a good crack at it. I might just do that (not right now). The use I see for Label objects is in the interface between identifications and taxonomic names.
If I recall correctly, you were the one who suggested we should look at a domain model. We are definitely going to have a much closer look at serialization. I created this issue because I thought it would be good to have on the radar for the time when we really start looking at that (and because I had a look when you first mentioned it and it looked promising), but maybe it turns out to be timely to keep our eyes on the ball. I will try to use a wider variety of ways to show examples. To be fair, @baskaufs had CSV examples in the document in which he was spruiking the use of SKOS-XL.
Sorry @baskaufs, I had a better look at your post just now and see that you had already addressed pretty much everything that I just did.
If I provide you with a set of CSV files with data from a taxonomic revision, perhaps even in the form of a Data Package, would you be interested in doing your Guid-O-Matic thing on it? I would be interested to see the result, both the CSV and the RDF.
Sure, I can give it a go. We just need to do some mapping of column headers to property URIs. The URIs can just be made-up; it doesn't matter if they are "real" or not.
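As a rough illustration of what mapping column headers to made-up property URIs could look like, here is a small Python sketch that turns CSV rows into N-Triples using only the standard library. It is not how Guid-O-Matic works; the `http://example.org/` namespace, the column names, and the sample rows are all invented for the example.

```python
import csv
import io

# Made-up namespace; as noted above, the URIs do not need to be "real".
BASE = "http://example.org/"

# Hypothetical mapping of column headers to property URIs.
COLUMN_TO_PROPERTY = {
    "scientificName": BASE + "scientificName",
    "authorship": BASE + "authorship",
}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, id_column, mapping):
    """Return one N-Triples line per non-empty mapped cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The subject URI is built from the row identifier.
        subject = f"<{BASE}{row[id_column]}>"
        for column, prop in mapping.items():
            value = row.get(column) or ""
            if value:  # skip empty cells rather than emit empty literals
                lines.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return lines


# Invented sample rows using placeholder names.
SAMPLE = "id,scientificName,authorship\nn1,Aus bus,Smith\nn2,Aus cus,\n"

triples = csv_to_ntriples(SAMPLE, "id", COLUMN_TO_PROPERTY)
print("\n".join(triples))
```

Every value is emitted as a plain string literal here; links between rows (for example, a name pointing to its parent) would need columns whose values are treated as URIs instead.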
Thanks @baskaufs. The time it took me to create my example for issue #30 made me realise it will take some time for me to get all the data together.
No problem. Just let me know...
I have added an example Data Package to the examples in this repository: /examples/datapackage.
@mdoering mentioned the use of Data Packages in issue #1, which is now closed.
Just elevating this to a separate issue, to keep it on the radar.
@mdoering, still interested in this?