open-numbers / ddf--worldbank--povcalnet

Poverty Mountains Calculation

Frictionless package cannot understand windows folder delimiters on MacOS #2

Open larsyencken opened 2 years ago

larsyencken commented 2 years ago

Hey @semio o/

If you iterate over the package with frictionless on MacOS, with code like this:

import frictionless

p = frictionless.Package('datapackage.json')
for resource in p.resources:
    df = resource.to_pandas()

You get a result like this:

FrictionlessException: [scheme-error] The data source could not be successfully loaded: [Errno 2] No such file or directory: 'income_mountain\\ddf--datapoints--income_mountain_50bracket_shape_for_log--by--country--year.csv'

It looks like the Windows path delimiter fails on Unix systems. I don't have a Windows machine, but if you have a few moments, could you check whether forward slashes (income_mountain/...) get translated by Frictionless on Windows?

If not, then it's probably a problem in the frictionless spec.

larsyencken commented 2 years ago

We can always work around it if need be just by replacing the delimiters as we go, so it's not a blocker on our side.
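
A minimal sketch of that workaround, assuming the descriptor has already been loaded from datapackage.json into a plain dict. The helper name `normalize_descriptor` is hypothetical and not part of frictionless.

```python
def to_posix_path(path: str) -> str:
    """Swap Windows backslashes for forward slashes in a relative path."""
    # Plain string replacement leaves URLs and already-correct paths untouched.
    return path.replace("\\", "/")


def normalize_descriptor(descriptor: dict) -> dict:
    """Return a copy of a datapackage descriptor with forward-slash resource paths.

    Hypothetical helper: only string-valued "path" entries are rewritten.
    """
    resources = []
    for resource in descriptor.get("resources", []):
        fixed = dict(resource)  # shallow copy so the input is not mutated
        if isinstance(fixed.get("path"), str):
            fixed["path"] = to_posix_path(fixed["path"])
        resources.append(fixed)
    return {**descriptor, "resources": resources}
```

The cleaned descriptor can then be written back out or handed to whatever loads the package.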

semio commented 2 years ago

Hi Lars, I was using our tools, which are supposed to work cross-platform, in a Windows environment. Apparently they are not cross-platform enough. Thanks for reporting this issue :)

I did some googling and found this article about file paths in Python: https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

Python has a hack where it will recognize either kind of slash when you call open() on Windows... And Python’s support for mixing slash types is a Windows-only hack that doesn’t work in reverse. Using backslashes in code will totally fail on a Mac

So I think it's better to produce file paths in Unix style. I will change that in our tools.

Regarding the frictionless library, I can't even run import frictionless under Windows. I am not sure whether it's a problem with my environment setup or an issue in frictionless. I will experiment some more a bit later and keep you updated.
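
On the producing side, one way to get Unix-style paths regardless of platform is pathlib's `as_posix()`. This is only a sketch: `resource_path` is a hypothetical helper, not the function our tools actually use, and the example paths are made up.

```python
from pathlib import PurePath, PureWindowsPath


def resource_path(package_root: PurePath, data_file: PurePath) -> str:
    """Path of data_file relative to package_root, always with forward slashes."""
    return data_file.relative_to(package_root).as_posix()


# PureWindowsPath simulates Windows behaviour on any OS (illustrative paths).
root = PureWindowsPath(r"C:\data\pkg")
csv_file = PureWindowsPath(r"C:\data\pkg\income_mountain\a.csv")
print(resource_path(root, csv_file))  # -> income_mountain/a.csv
```

With real files, passing `pathlib.Path` objects gives the same forward-slash result on Windows, MacOS and Linux.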

semio commented 2 years ago

Here is the error I get when I run import frictionless. I don't have any settings files at all, as this is the first time I have installed it.

~\Anaconda3\envs\gapminder\lib\site-packages\frictionless\helpers.py in <module>
     20 from urllib.parse import urlparse, parse_qs
     21 from _thread import RLock  # type: ignore
---> 22 from . import settings
     23
     24

~\Anaconda3\envs\gapminder\lib\site-packages\frictionless\settings.py in <module>
     24 REPORT_PROFILE = json.loads(read_asset("profiles", "report.json"))
     25 STATUS_PROFILE = json.loads(read_asset("profiles", "status.json"))
---> 26 SCHEMA_PROFILE = json.loads(read_asset("profiles", "schema", "general.json"))
     27 RESOURCE_PROFILE = json.loads(read_asset("profiles", "resource", "general.json"))
     28 TABULAR_RESOURCE_PROFILE = json.loads(read_asset("profiles", "resource", "tabular.json"))

~\Anaconda3\envs\gapminder\lib\site-packages\frictionless\settings.py in read_asset(*paths)
     11     dirname = os.path.dirname(__file__)
     12     with open(os.path.join(dirname, "assets", *paths)) as file:
---> 13         return file.read().strip()
     14
     15

UnicodeDecodeError: 'gbk' codec can't decode byte 0xac in position 7809: illegal multibyte sequence

larsyencken commented 2 years ago

Weird, I couldn't spot any reported bugs in their python package that match this.

If you wanted, we could add a GitHub Action that checks, for each repo, that the data package reads smoothly with frictionless.
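
As a rough idea of what such a check could run, here is a standard-library stand-in that only looks at resource paths. It is a sketch under assumptions: the function name `find_path_problems` is hypothetical, and a real job would more likely iterate the package with frictionless itself, as in the snippet at the top of this issue.

```python
import json
from pathlib import Path


def find_path_problems(package_file: str) -> list[str]:
    """List resource paths that would fail to load on a Unix-like system."""
    package_path = Path(package_file)
    descriptor = json.loads(package_path.read_text(encoding="utf-8"))
    problems = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        if not isinstance(path, str) or "://" in path:
            continue  # inline data, multi-part paths and URLs are not checked
        if "\\" in path:
            problems.append(f"backslash in path: {path}")
        elif not (package_path.parent / path).is_file():
            problems.append(f"missing file: {path}")
    return problems
```

A CI step could call this on datapackage.json and fail the build when the returned list is not empty.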

semio commented 2 years ago

@larsyencken I figured it out. The "general.json" file is bundled with the frictionless package. On my Windows system, Python tries to read this file with the wrong encoding.

According to the Python docs, the default encoding for open() is platform dependent: it comes from the locale, which is usually UTF-8 on Linux but was gbk on my Windows system. I needed to set the PYTHONUTF8=1 environment variable so that Python runs in UTF-8 mode, and then I could finally import the package.

I then tried loading this datapackage with frictionless and it worked, so frictionless does handle forward slashes on Windows.
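
For anyone hitting the same UnicodeDecodeError, a minimal sketch of how to see which encoding open() falls back to and whether UTF-8 mode is active. The function name `default_text_encoding` is made up for illustration.

```python
import locale
import sys


def default_text_encoding() -> dict:
    """Report the settings that decide how open() decodes text by default."""
    return {
        # True when Python runs in UTF-8 mode (for example with PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding used by open() when no encoding argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


print(default_text_encoding())
```

The printed values depend on the machine, so compare them before and after setting PYTHONUTF8=1.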

larsyencken commented 2 years ago

Good find! That's definitely a frictionless bug though, they should explicitly pick UTF-8 I think.
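
Explicitly picking UTF-8 could look roughly like this. It is a sketch modeled on the read_asset function visible in the traceback above, not the actual change made in frictionless, and `read_text_asset` is a hypothetical name.

```python
import os


def read_text_asset(dirname: str, *paths: str) -> str:
    """Read a bundled text file as UTF-8, independent of the platform locale."""
    # Passing encoding explicitly avoids falling back to a locale codec such as gbk.
    with open(os.path.join(dirname, *paths), encoding="utf-8") as file:
        return file.read().strip()
```

With the encoding fixed at the call site, the import no longer depends on PYTHONUTF8 or on the user's locale.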

semio commented 2 years ago

Yep, here it is: https://github.com/frictionlessdata/frictionless-py/issues/962