usnationalarchives / digital-preservation

NARA digital preservation file format risk analysis and preservation plans

Structured Query Language (SQL) #11

Closed edsu closed 4 years ago

edsu commented 4 years ago

Structured Query Language (SQL) is listed in the matrix as a low-risk format, but it does not appear to be listed as a file format in the Preservation Action Plan for Structured Data/Database Records. Many Relational Database Management Systems (RDBMS) allow the export of data as SQL, often for backup purposes.
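As a rough illustration of what such an export looks like, here is a minimal sketch using Python's standard `sqlite3` module. SQLite stands in for whatever RDBMS is actually in use, and the `agency` table and its contents are hypothetical, invented only for this example.

```python
import sqlite3

# Hypothetical in-memory database standing in for a real RDBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agency (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO agency (id, name) VALUES (1, 'Example Agency');
""")

# Connection.iterdump() yields the SQL statements needed to rebuild
# the database: the schema first, then the rows as INSERT statements.
sql_dump = "\n".join(conn.iterdump())
print(sql_dump)
```

The resulting text is plain SQL that can be saved as a `.sql` file and read without the originating database software. Other systems ship their own export tools, and their output may include vendor-specific syntax.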

While there are vendor-specific extensions to SQL, it seems that SQL should be mentioned as a data interchange and preservation format for relational databases. It is easier to deal with than binary formats such as MySQL's .frm, .myd, .myi and .ibd files, which are not always backwards compatible (and are mentioned in the Database Action Plan). Also, I imagine that NARA must have accessioned quite a few .sql files already?

One advantage that SQL has over CSV is that key relations (primary and foreign keys) are preserved. These are very important when trying to piece together later how the various tables are connected and how they can be queried together.
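To make the point about key relations concrete, the sketch below loads a small SQL dump into SQLite and reads the foreign-key relationship back out of the schema. The dump text, the `agency` and `record` tables, and their contents are all hypothetical; a real export would come from the source RDBMS and might need adjusting for vendor-specific syntax.

```python
import sqlite3

# Hypothetical dump text, roughly as an export tool might produce it.
SQL_DUMP = """
CREATE TABLE agency (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE record (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    agency_id INTEGER NOT NULL REFERENCES agency(id)
);
INSERT INTO agency VALUES (1, 'Example Agency');
INSERT INTO record VALUES (10, 'Example Record', 1);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SQL_DUMP)

# PRAGMA foreign_key_list returns one row per foreign-key column:
# (id, seq, table, from, to, on_update, on_delete, match).
relations = [
    (row[3], row[2], row[4])  # (local column, referenced table, referenced column)
    for row in conn.execute("PRAGMA foreign_key_list(record)")
]
print(relations)

# The declared relation tells a later reader how to join the tables.
joined = conn.execute(
    "SELECT record.title, agency.name "
    "FROM record JOIN agency ON record.agency_id = agency.id"
).fetchall()
print(joined)
```

A CSV export of the same two tables would contain only the rows. The fact that `record.agency_id` refers to `agency.id` would have to be documented separately or guessed from the column names.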

lljohnston commented 4 years ago

So far we've actually had very few records transferred to us as SQL, but that could change. SQL ended up in the Software Plan, where it is identified as a very low-risk format. It should have been cross-referenced in the Databases Plan but that somehow got overlooked. I'll make sure that it is linked to both record types.

edsu commented 4 years ago

After I left my comment, I saw in the spreadsheet that there had in fact been very few SQL transfers. That is interesting in itself. I think mentioning that SQL is a viable way of transferring databases would be a good addition to the Database Plan, so thank you for that!