psu-libraries / library_data_services

MIT License
2 stars 1 forks source link

Interpretation of date formats varies in among people and applications. #67

Open olendorf opened 7 years ago

olendorf commented 7 years ago

Sherman Bernard recently found that the dates in his tabular data are have problems. When the data is initially entered, it is entered in one application the format DD/MM/YYYY (i.e. 10/04/2016). When they export the data from that application the format is retained. Problems start when it is then uploaded into Excel. By default, Excel interprets and formats the dates two ways. Sometimes its MM/dD/YY and it thinks its a date, or as DD/MM/YYYY and it is interpreting the data as a string.

Excel columns can be formatted to export the desired date format(s) but it doesn't appear to be able to upload dates correctly.

To make matters worse, some workers, while editing the data in excel have mis-entered the data in a variety of ways (YY/DD/MM, YY-mM-dD etc). Essentially the ambiguity of what is a month, day or year is causing data entry errors.

The result is when the data is then imported into other systems, such as the R statistical package or a relational database the data is both full of errors and often causes the the programs to be unable to read the data.

Other relevant personas

Almost all personas can potentially have this problem.