langcog / metalab2

MetaLab -- Community-augmented meta-analysis
http://metalab.stanford.edu/
MIT License
21 stars 8 forks source link

Weird behavior when exporting data #60

Closed christinabergmann closed 6 years ago

christinabergmann commented 6 years ago

I've tried to export/import some MA data, but that doesn't work. There are a bunch of issues. (1) .csv when it's semicolon separated doesn't mesh with google drive. Not sure how to fix this.

Example case is inworddb. Original spreadsheet: https://docs.google.com/spreadsheets/d/1djn_-iE4uOs6yRmvBgzYAhB5guzfo3-3wnC7LJVLCe8/edit#gid=0 Reimported spreadsheet: https://docs.google.com/spreadsheets/d/1jYrUjYgsIHYIxnnZthwlCP-BHyddjzeNYgfQ2ar1W_g/edit#gid=1861011327 (Reimport: excel -> split column by ";", save as .xlsx, import to google drive, we need to document this pipeline if we cannot fix (1)).

(2) Numbers do not match. For example, the t-value in cell AA7 of the reimported spreadsheet is not a number (comma instead of dot) but it is correct in the source document cell BO7. Another example are the odd numbers in the derived values, like CW16, CY16, ... These numbers (over 100000000) can't be correct and are not matching the visualization on MetaLab. I don't know where this is coming from, but it looks like quite a serious issue.

christinabergmann commented 6 years ago

Short update: It looks like all effect sizes above 1/below -1 are affected. The problem persists when trying out different delimiters, it might be worthwhile moving to write.table?

Btw, writing is in line 150 of this script: https://github.com/langcog/metalab2/blob/master/scripts/cache_datasets.R write.csv(dataset_data, file.path("data", paste0(dataset_meta$filename, ".csv")), row.names = FALSE)

This issue might be an Excel problem, but then many might use Excel to inspect our data (and we recommend it for adding rows/columns). https://stackoverflow.com/questions/27539091/write-csv-changes-the-data-in-r

shotsuji commented 6 years ago

[copying in emails] Hi Christina,

I just downloaded the word segmentation DB as .csv (using the "Download data" button at the top right of visualization page), and for me opening it in excel did NOT cause the problems you describe. I use a US MS Office.

So it might be indeed a comma versus dot excel problem?

Sho

Thanks Sho, I also used the download button, then it might be a EU/US issue, but it persisted when using google sheets (and I could not open the csv in google sheets directly). I am just confused why it would affect values over 1/under -1 in the effect size columns, if it were a consistent comma/period issue I'd expect all values to be affected. (See example sheet I uploaded).

Does anyone have ideas how to fix this? Either by sharing data in a form that works with google and recommending that or by finding a "safe" export encoding?

shotsuji commented 6 years ago

Christina - just to be sure what you mean, so when you download the csv and directly import into google sheets (without doing the excel transform) the problem persists?

(Just want to try replicating this issue on my French Lenovo)

shotsuji commented 6 years ago

Oh I see you weren't able to open the csv in google sheets directly? For me that was no problem on my US computer - just to be sure we did the same thing: I did New --> File upload --> selected csv --> upload --> click on uploaded item and "open with google sheets" Upload was ok & I did not get the weird values you report; see here: https://docs.google.com/spreadsheets/d/12t2-Uo978HN80wzHnv5PvynNcL2kAvZeGdwOQCdQP2k/edit?usp=sharing

christinabergmann commented 6 years ago

Thanks, Sho, for putting it all in one place!

Indeed, my mistake, I played with separators and might have caused my earlier opening issues myself. Going directly to google sheets works, all looks good there. So we have to explicitly recommend not using non-US Excel versions until this is fixed.

On 5 April 2018 at 08:01, shotsuji notifications@github.com wrote:

Oh I see you weren't able to open the csv in google sheets directly? For me that was no problem - just to be sure we did the same thing: I did New --> File upload --> selected csv --> upload --> click on uploaded item and "open with google sheets" Upload was ok & I did not get the weird values you report; see here: https://docs.google.com/spreadsheets/d/12t2-Uo978HN80wzHnv5PvynNcL2kAvZeGd wOQCdQP2k/edit?usp=sharing

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/langcog/metalab2/issues/60#issuecomment-378828909, or mute the thread https://github.com/notifications/unsubscribe-auth/ALVmkw80vph9rWYndVErYjSzPSbAo5elks5tlbMvgaJpZM4S8jmx .

-- Researcher at the Max Planck Institute for Psycholinguistics, Language Development Department, Nijmegen, The Netherlands Website: sites.google.com/site/chbergma Blog: cogtales.wordpress.com

shotsuji commented 6 years ago

Hi, I had no problem viewing this with my French MS Office either. However, I noticed that I had already set the options in this system to use "." instead of "," as decimal separators. Better you google yourself how to do it under your specific excel version since it differs, but for me it is Fichier --> Options Excel --> Options avancees. Maybe you could check if that resolves your problem, and if yes, we should add that to the website?!

christinabergmann commented 6 years ago

Indeed, that makes the difference. We should add to the website that if you want to use excel etc you have to first make sure you set the decimal separator to "." Thanks!!!