scraperwiki / spreadsheet-download-tool

A ScraperWiki plugin for downloading data from a box as a CSV or Excel spreadsheet
BSD 2-Clause "Simplified" License

Columns arrive in random order #71

Closed frabcus closed 10 years ago

frabcus commented 10 years ago

The Twitter search tool uses OrderedDict so that it saves its columns in the same order every time. That order shows correctly in the "View in a table" tool.

However, when downloading as .xlsx, I think the order varies.

For example, this user's dataset: https://scraperwiki.com/dataset/c7xsv3q

drj11 commented 10 years ago

I suspect that going via JSON is destroying the order. If one is careful, it is possible for JSON to preserve the order, but not if you just naively use a dict() to read it.
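A minimal sketch of the "careful" approach, using the standard library's `json` module: passing `object_pairs_hook=OrderedDict` makes the parser hand over key/value pairs in the order they appear in the text, so column order survives the round trip. The sample payload and its field names are made up for illustration. On Python 3.7+ a plain `dict` also keeps insertion order, but on the Python 2 versions current when this issue was filed it did not.

```python
import json
from collections import OrderedDict

# Hypothetical row as it might arrive over JSON; field names are illustrative.
payload = '{"id": 1, "created_at": "2014-01-01", "text": "hello", "user": "bob"}'

# object_pairs_hook receives the (key, value) pairs in document order,
# so building an OrderedDict from them preserves the column order.
row = json.loads(payload, object_pairs_hook=OrderedDict)
columns = list(row.keys())

# json.dumps writes an OrderedDict's keys in their stored order,
# so serialising it again keeps the same column order.
round_tripped = json.dumps(row)

print(columns)
```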

scraperdragon commented 10 years ago

This is a problem with the version of dataset we're using: it generates plain dicts. However, upgrading to a more recent version causes crashes because datetimes don't round-trip.

Blocked on https://github.com/pudo/dataset/issues/97 unless we change to not use dataset. Do we know why dataset was chosen? (It looked better than dumptruck.)

pwaller commented 10 years ago

@scraperdragon where is dataset being used? grepping the spreadsheet download tool I can't find it.

scraperdragon commented 10 years ago

Sorry: I think I was getting confused about what bug I was on.

I also can't reproduce this bug: the XLSX I downloaded has the columns in the same order as View in a Table.

pwaller commented 10 years ago

@frabcus Please close this bug unless you think it is an issue.

scraperdragon commented 10 years ago

Discussion with @frabcus suggests the problem is actually caused by https://github.com/scraperwiki/twitter-search-tool/issues/22, i.e. column order is inconsistent because columns aren't added until they are needed. Either we rethink dumptruck entirely or we fix it in the Twitter search tool.
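To illustrate the mechanism described above, here is a hypothetical model (not dumptruck's actual code) of a table that creates each column the first time a row containing it is saved. Two runs that see the same fields in a different order end up with different column orders; declaring a fixed schema up front, one possible fix on the Twitter side, makes the order stable. The function names, `SCHEMA`, and the sample rows are all assumptions for illustration.

```python
# Hypothetical fixed column list the tool could declare before saving any rows.
SCHEMA = ["id", "text", "media_url", "reply_to"]


def columns_added_on_demand(rows):
    """Column order for a table that adds a column the first time it sees it."""
    columns = []
    for row in rows:
        for key in row:  # dicts iterate in insertion order on Python 3.7+
            if key not in columns:
                columns.append(key)
    return columns


def columns_declared_up_front(rows, schema):
    """Column order when the schema is created first; unexpected extras go last."""
    columns = list(schema)
    for key in columns_added_on_demand(rows):
        if key not in columns:
            columns.append(key)
    return columns


# Two hypothetical runs: optional fields first appear in a different order.
run_one = [{"id": 1, "text": "a", "media_url": "m"}, {"id": 2, "text": "b", "reply_to": "r"}]
run_two = [{"id": 3, "text": "c", "reply_to": "r"}, {"id": 4, "text": "d", "media_url": "m"}]

print(columns_added_on_demand(run_one))
print(columns_added_on_demand(run_two))
print(columns_declared_up_front(run_two, SCHEMA))
```

With columns added on demand, the two runs disagree about where `media_url` and `reply_to` go; with the schema declared first, both runs produce `SCHEMA`'s order.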

Closing since this isn't really a spreadsheet-download-tool problem.