lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.
https://lisphilar.github.io/covid19-sir/
Apache License 2.0

DataLoader.japan() raises FileNotFoundError because blocksize information cannot be determined #264

Closed lisphilar closed 3 years ago

lisphilar commented 3 years ago

Summary

DataLoader.japan() raises FileNotFoundError because blocksize information cannot be determined.

(Optional) Related classes

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
japan_data = data_loader.japan()

This code raises `FileNotFoundError`.

_____________ ERROR at setup of TestChangeFinder.test_find[Italy] ______________

    @pytest.fixture(autouse=True)
    def japan_data():
        data_loader = DataLoader("input")
>       return data_loader.japan()

tests/conftest.py:22: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
covsirphy/cleaning/dataloader.py:327: in japan
    df = self._japan_cases_get()
covsirphy/cleaning/dataloader.py:345: in _japan_cases_get
    return self._get_raw(url)
covsirphy/cleaning/dataloader.py:132: in _get_raw
    df = dd.read_csv(url).compute()
../.local/share/virtualenvs/covid19-sir-kT3BL-HO/lib/python3.8/site-packages/dask/dataframe/io/csv.py:645: in read
    return read_pandas(
../.local/share/virtualenvs/covid19-sir-kT3BL-HO/lib/python3.8/site-packages/dask/dataframe/io/csv.py:479: in read_pandas
    b_out = read_bytes(
../.local/share/virtualenvs/covid19-sir-kT3BL-HO/lib/python3.8/site-packages/dask/bytes/core.py:125: in read_bytes
    size = fs.info(path)["size"]
../.local/share/virtualenvs/covid19-sir-kT3BL-HO/lib/python3.8/site-packages/fsspec/asyn.py:121: in wrapper
    return maybe_sync(func, self, *args, **kwargs)
../.local/share/virtualenvs/covid19-sir-kT3BL-HO/lib/python3.8/site-packages/fsspec/asyn.py:100: in maybe_sync
    return sync(loop, func, *args, **kwargs)
../.local/share/virtualenvs/covid19-sir-kT3BL-HO/lib/python3.8/site-packages/fsspec/asyn.py:71: in sync
    raise exc.with_traceback(tb)
../.local/share/virtualenvs/covid19-sir-kT3BL-HO/lib/python3.8/site-packages/fsspec/asyn.py:55: in f
    result[0] = await future
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <fsspec.implementations.http.HTTPFileSystem object at 0x7fae18fefd30>
url = 'https://raw.githubusercontent.com/lisphilar/covid19-sir/master/data/japan/covid_jpn_total.csv'
kwargs = {}, size = False, policy = 'get'

    async def _info(self, url, **kwargs):
        """Get info of URL

        Tries to access location via HEAD, and then GET methods, but does
        not fetch the data.

        It is possible that the server does not supply any size information, in
        which case size will be given as None (and certain operations on the
        corresponding file will not work).
        """
        size = False
        for policy in ["head", "get"]:
            try:
                size = await _file_size(
                    url, size_policy=policy, session=self.session, **self.kwargs
                )
                if size:
                    break
            except Exception:
                pass
        else:
            # get failed, so conclude URL does not exist
            if size is False:
>               raise FileNotFoundError(url)
E               FileNotFoundError: https://raw.githubusercontent.com/lisphilar/covid19-sir/master/data/japan/covid_jpn_total.csv

../.local/share/virtualenvs/covid19-sir-kT3BL-HO/lib/python3.8/site-packages/fsspec/implementations/http.py:262: FileNotFoundError

Environment

lisphilar commented 3 years ago

Related information: https://github.com/dask/dask/issues/5222

df = dd.read_csv(url).compute()

Should be replaced with

df = dd.read_csv(url, blocksize=None).compute()

lisphilar commented 3 years ago

When Dask fails to read the dataset, pandas will read it instead. This will be fixed in version 2.9.1.
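
For readers who hit the same error, the two ideas in this thread (passing `blocksize=None` to Dask and falling back to pandas) can be combined roughly as follows. This is only a minimal sketch: `read_csv_with_fallback` is a hypothetical helper name, not the actual CovsirPhy implementation, and the broad `except Exception` is an assumption about which failures should trigger the fallback.

```python
import pandas as pd


def read_csv_with_fallback(path_or_url):
    """Read a CSV with Dask first; fall back to pandas if that fails.

    Hypothetical helper for illustration only, not CovsirPhy's own code.
    """
    try:
        import dask.dataframe as dd

        # blocksize=None loads the file as a single partition, so Dask
        # does not need the remote file size to split it into blocks.
        return dd.read_csv(path_or_url, blocksize=None).compute()
    except Exception:
        # Assumption: any Dask-side failure (including a missing Dask
        # installation or FileNotFoundError from fsspec) falls back here.
        return pd.read_csv(path_or_url)
```

With a helper like this, the line in `_get_raw` would become something like `df = read_csv_with_fallback(url)`. If the URL really does not exist, pandas will still raise its own error, so a genuine 404 is not hidden.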