Closed romicolman closed 3 months ago
Hello, @romicolman I can reproduce the issue more or less. This is what i get. The errors are the same in the report but I don't get those extra empty columns on the right like in your screenshot.
I know why the error is happening. Exactly this column is a decimal number separated with a comma. Since the csv by definition uses the comma to separate the different columns, Google sheets exports the comma separated values as a string (you can see in the csv that the numbers in this column are wrapped with ""
, example "3,04"
for the first column). Ideally we would have the data exported as value,value,3.04
with the decimal separated by a dot, but we can't control much how users set up formatting in google sheets so we have to find a solution to this exact issue.
Frictionless is detecting a field with this format "number,number"
as a geojson column type for some reason, when it should be detected as a string because the format in the csv is a string originally. Ideally this case should be detected and imported as a number... or even better, i think at some point the user should be asked what the schema type of the column is when importing these ambiguous types.
To make the error go away for now you should set the columns schema field value to string, (at the start it is set to geojson):
But even this is not exactly what we want because this is exported as a string by google sheets but we should have the option to read it as a number and currently there is also an error in ODE if you change the schema type to number. You found a very particular case, lol.
I don't think the report message is particularly helpful for finding out what the problem is. Probably many non tech users wouldn't be able to figure out why the column is creating an error so I think this is a good opportunity as well to suggest more detailed error messages for type errors.
To think of a solution to this issue we need to know why it was designed like this. Why is this specific format imported as geojson
? I think @roll can help us with that.
@pdelboca what are your suggestions for an issue like this?
Hello, @romicolman I can reproduce the issue more or less. This is what i get. The errors are the same in the report but I don't get those extra empty columns on the right like in your screenshot.
I know why the error is happening. Exactly this column is a decimal number separated with a comma. Since the csv by definition uses the comma to separate the different columns, Google sheets exports the comma separated values as a string (you can see in the csv that the numbers in this column are wrapped with
""
, example"3,04"
for the first column). Ideally we would have the data exported asvalue,value,3.04
with the decimal separated by a dot, but we can't control much how users set up formatting in google sheets so we have to find a solution to this exact issue.Frictionless is detecting a field with this format
"number,number"
as a geojson column type for some reason, when it should be detected as a string because the format in the csv is a string originally. Ideally this case should be detected and imported as a number... or even better, i think at some point the user should be asked what the schema type of the column is when importing these ambiguous types.To make the error go away for now you should set the columns schema field value to string, (at the start it is set to geojson):
But even this is not exactly what we want because this is exported as a string by google sheets but we should have the option to read it as a number and currently there is also an error in ODE if you change the schema type to number. You found a very particular case, lol.
I don't think the report message is particularly helpful for finding out what the problem is. Probably many non tech users wouldn't be able to figure out why the column is creating an error so I think this is a good opportunity as well to suggest more detailed error messages for type errors.
To think of a solution to this issue we need to know why it was designed like this. Why is this specific format imported as
geojson
? I think @roll can help us with that.@pdelboca what are your suggestions for an issue like this?
Hi @guergana! Let's wait for @roll 's comments, but in the meantime I think we need to I'll move this to sprint 6 so I can document it properly for the user guide if needed.
@romicolman As suspected the data package standard format for geopoint is "lat, long"
https://datapackage.org/standard/table-schema/#geopoint so this is not an issue per se with the app, we need to figure out how to make clear that there is this compatibility issue between google sheets when exporting csvs or find a way to mitigate it.
@romicolman I managed to see the problem exporting in .xls
now with the extra columns. I had not tested that part. My bad. 😬 Only the csv
part is finished.
I have discovered the issue in the .xls
file:
When you download the file, if you scroll to the right you can see there is some formatting underlined in some of the columns to the right (refer to the screenshot). It seems that to preserve that formatting, the .xls
is then exported with those fields marked as empty, so that you can see these lines if you open the document again. I don't believe this is a problem with the ODE, this is an issue with the exported file. In any case we could try to ignore all empty columns on importing to avoid cases like this, but I wouldn't see this is a priority for the release because the file already comes with this issue.
To fix the issue:
The empty columns should be gone, and also the errors when importing the file into ODE.
I see this more as something to make clear in the quality that is expected of the files provided by the users, something to be mentioned in the User Guide, than as a technical problem on our side.
Hi @guergana! I agree. This is something we would need to document properly for the user, but we won't work on this for the stable version. I'm moving this issue to the next sprint so to write about this for the user guide.
@guergana I will add a new section to the user guide called Key recommendations when uploading data to the ODE to incorporate this and other things users should keep in mind. Here is the suggested text:
_Before uploading a file to the ODE…
Make sure your file is well-organized: each column must contain a name, there should be no extra rows before the table, or additional elements (like borders or lines) next to the space where the tabular data is located, etc.
Check that the tabular data does not contain cells that are merged. Data producers from Governments, international organizations and internal reports usually add tiles, descriptions, graphs within sheets, like in this case. If there are extra elements in your file, the ODE will ingest your file and show you multiple errors.
If you export a file from Google Sheets in csv and you have columns with numbers, please, make sure to use “.” for decimals, instead of commas. Otherwise, Frictionless, the code working behind the ODE will interpret the content of your cells with numbers as text._
What do you think?
@guergana I will add a new section to the user guide called Key recommendations when uploading data to the ODE to incorporate this and other things users should keep in mind. Here is the suggested text:
_Before uploading a file to the ODE…
* Make sure your file is well-organized: each column must contain a name, there should be no extra rows before the table, or additional elements (like borders or lines) next to the space where the tabular data is located, etc. * Check that the tabular data does not contain cells that are merged. Data producers from Governments, international organizations and internal reports usually add tiles, descriptions, graphs within sheets, like in this [case](https://stat.link/vtenks). If there are extra elements in your file, the ODE will ingest your file and show you multiple errors. * If you export a file from Google Sheets in csv and you have columns with numbers, please, make sure to use “.” for decimals, instead of commas. Otherwise, Frictionless, the code working behind the ODE will interpret the content of your cells with numbers as text._
What do you think?
Hi, great work, I have some observations:
For the first point I would change a few words, though, instead of "well-organized" I would say "well-formatted", also I would remove the etc
at the end. Etc adds ambiguity and I don't think it's a good idea to have this work in guidelines or docs. This lets the user assume what the etc can be, let's instead provide a complete list of badly formatted cases.
In the second point, I don't understand what is meant by "ingest" in "the ODE will ingest your file". Do we have maybe another word for this?
Now what do you think about my observations @romicolman ? :stuck_out_tongue:
Perfect! I integrated your feedback, Guergana. Thanks a lot!
Problem description
If you import a dataset in xls in Google Sheets with any kind of problem that the ODE would detect as an error and then you download the dataset in csv to open it in the ODE, the app will detect different issues depending on the format.
Steps to reproduce it
Comments
When exporting the file in xls, the ODE will only detect blank labels for columns.
When exporting the file in csv, the ODE will inform the user about problems in the type of data in columns.
Is there any explanation for this or is it a bug?