nickmckay / LiPD-utilities

Input/output and manipulation utilities for LiPD files in Matlab, R and Python
http://nickmckay.github.io/LiPD-utilities/
GNU General Public License v2.0

Python: MeasurementTable Number doesn't match between Excel and LiPD file #6

Closed: khider closed this issue 7 years ago

khider commented 7 years ago

Although the LiPD file returns the right number of tables, the table numbers don't match the ones in the Excel file.

Example attached

MD97-2121.Marr.2013.xlsx

LiPD file here: http://wiki.linked.earth/MD97-2121.Marr.2013 (sorry, GitHub doesn't support the .lpd format)

chrismheiser commented 7 years ago
[Screenshot: 2017-05-22, 9:34:10 pm]

One issue with this Excel file is that two columns are identically named: "G. ruber Mn/Ca" appears twice. It seems one of them was a typo and was supposed to be Mg/Ca. This makes one column overwrite the other and causes a mismatch in how many columns get written to the CSV.
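As a rough illustration of the overwrite (a minimal sketch, not the actual LiPD utilities code; the header list and values here are made up), storing columns in a dictionary keyed by column name keeps only the last column for a repeated name:

```python
# Hypothetical data: three columns in the sheet, two sharing one header.
headers = ["depth", "G. ruber Mn/Ca", "G. ruber Mn/Ca"]
values = [[1.0, 2.0], [0.1, 0.2], [3.1, 3.2]]

columns = {}
for name, data in zip(headers, values):
    # A repeated name silently replaces the earlier column.
    columns[name] = data

# The sheet has 3 columns, but only 2 survive in the dictionary.
print(len(headers), len(columns))
```

The metadata still describes three columns while only two are available to write, which is the kind of count mismatch described above.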

This doesn't solve the problem of tables being written under the wrong table name.

khider commented 7 years ago

I fixed that, but it's still giving me the wrong table number for the Excel tab.

MD97-2121.Marr.2013.xlsx

It's not the only file doing this.

chrismheiser commented 7 years ago

Right, I'm still looking into that.

khider commented 7 years ago

When two columns have the same name, neither of them makes it onto the wiki (that's what tells me something is wrong and that it's my fault, although in this case the original file had the same header twice). I'm not sure whether that's something we want to check for in the utilities and warn people about.

chrismheiser commented 7 years ago

If there are two columns with the same name, it will error and not write the CSV. I can have it print a warning to tell the user.
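A check along these lines might look like the sketch below. It is only an assumption about how such a warning could work, not the code in the utilities; `find_duplicate_headers`, `warn_on_duplicate_headers`, and the `table_name` parameter are hypothetical names.

```python
import warnings
from collections import Counter


def find_duplicate_headers(headers):
    """Return the column names that occur more than once, in first-seen order."""
    counts = Counter(h.strip() for h in headers)
    return [name for name, n in counts.items() if n > 1]


def warn_on_duplicate_headers(headers, table_name="(unknown table)"):
    """Emit one warning per duplicated column name and return the duplicates."""
    duplicates = find_duplicate_headers(headers)
    for name in duplicates:
        warnings.warn(
            f"Table {table_name}: column name {name!r} appears more than once; "
            "rename one of the columns before writing the CSV."
        )
    return duplicates
```

Calling `warn_on_duplicate_headers(["depth", "G. ruber Mn/Ca", "G. ruber Mn/Ca"], "Sheet1")` would flag the repeated header before any CSV is written.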

khider commented 7 years ago

Outputting a warning about errors is always a good idea. If I know there is a problem with my file at the LiPD stage, I save time by not trying to validate it or put it on the wiki.

khider commented 7 years ago

See this as another example: http://wiki.linked.earth/Kesang.Cheng.2012.ChronData1

chrismheiser commented 7 years ago

I believe the problem is, once again, unordered dictionaries. We (1) read lists of metadata during readLipds(), (2) switch those over to dictionaries for all other functions to use, then (3) switch back to lists during writeLipds(). Between steps 2 and 3, the ordering can change. I'm working on a fix.
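One way to keep the order stable across that round trip is to record each table's original list position during the list-to-dictionary step and sort on it when converting back. This is only a sketch of the idea, not the fix that went into the utilities; the `tableName` key, the `_position` field, and both function names are assumptions for illustration.

```python
def tables_to_dict(tables):
    """List -> dict keyed by table name, remembering each table's list position."""
    return {
        table["tableName"]: dict(table, _position=index)
        for index, table in enumerate(tables)
    }


def dict_to_tables(table_dict):
    """Dict -> list, restoring the original order regardless of dict ordering."""
    ordered = sorted(table_dict.values(), key=lambda table: table["_position"])
    # Drop the bookkeeping field so the output matches the input metadata.
    return [
        {key: value for key, value in table.items() if key != "_position"}
        for table in ordered
    ]
```

With the position stored explicitly, the write step does not depend on dictionary iteration order, which Python only guarantees to be insertion order from version 3.7 onward.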

chrismheiser commented 7 years ago

@khider Do you have the Excel file for Kesang.Cheng.2012? I'm doing some testing.

khider commented 7 years ago

Kesang.Cheng.2012.xlsx

chrismheiser commented 7 years ago

Thanks! I'm fairly sure it's fixed according to some other files, but I'll check this and push an update if ready. I think this issue may have snuck through a lot of testing because many datasets only have one of each table.

khider commented 7 years ago

Thanks! I don't think I've broken anything else so far...