okfn / messytables

Tools for parsing messy tabular data. This is now superseded by https://github.com/frictionlessdata/tabulator-py
http://messytables.readthedocs.io/

TypeError("object of type 'float' has no len()",) when calling type_guess #163

Closed metaodi closed 7 years ago

metaodi commented 7 years ago

I could trace this back to #141 where len() is being used in the test() method of DateUtilType.

I think there should be a try/except block around that, that catches this TypeError. But I'm not too familiar with the code, so I'm basically asking if you agree, or if I'm missing something.
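To make the idea concrete, here is a minimal sketch of the try/except approach. `has_multiple_chars` is a hypothetical helper that only stands in for the length check; it is not the actual messytables code.

```python
def has_multiple_chars(value):
    """Hypothetical stand-in for the length check in DateUtilType.test()."""
    try:
        return len(value) != 1
    except TypeError:
        # Floats, ints and datetimes have no len(); assume here that they
        # should not be rejected by the single-character rule.
        return True


for sample in ("7", "2017-01-01", 3.14):
    print(repr(sample), has_multiple_chars(sample))
```

With this guard a float cell no longer raises; whether it should then count as a date is left to the rest of the type test.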

I'm happy to provide the PR.

BTW: I'm getting this error via datapusher on an Excel sheet that is being parsed with the default parameters. The sheet does indeed contain a lot of float values.

rufuspollock commented 7 years ago

I just got bitten by the same bug here. To add details:

This happens when type guessing with the DateUtilType on an xls file and is caused by this code: https://github.com/okfn/messytables/blob/master/messytables/types.py#L188

    def test(self, value):
        if len(value) == 1:
            return False
        return CellType.test(self, value)

What seems to be happening is that the value at this point is a datetime.datetime (I assume because the xls lib has already parsed it to a datetime?), which has no len().

Having looked at this I'm wondering if the better solution is to have a similar test to the DateType above it in the file:

    if isinstance(value, string_types) and not is_date(value):
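A rough sketch of how that guard could look when applied to the length check. Everything here is an assumption for illustration: `CellType` is a simplified stand-in rather than the real messytables class, and `string_types` is defined locally instead of being imported from a compatibility module.

```python
import datetime

string_types = (str,)  # stand-in for the Python 2/3 compatibility alias


class CellType(object):
    """Simplified stand-in, not the real messytables.types.CellType."""

    def cast(self, value):
        return value

    def test(self, value):
        # A value passes if it can be cast without raising.
        try:
            self.cast(value)
            return True
        except Exception:
            return False


class DateUtilType(CellType):
    """Sketch: only call len() when the value is actually a string."""

    def test(self, value):
        if isinstance(value, string_types) and len(value) == 1:
            return False
        return CellType.test(self, value)


guesser = DateUtilType()
for sample in ("7", 3.14, datetime.datetime(2017, 1, 1)):
    print(repr(sample), guesser.test(sample))
```

Checking the type first means floats and datetimes coming out of the xls reader skip the len() call entirely, so no exception handling is needed.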

metaodi commented 7 years ago

@rufuspollock looks good. Could you release this as 0.15.2?

rufuspollock commented 7 years ago

@metaodi i don't have push rights to pypi atm since i lost access to my account email (long story). Needs to be some other folks @okfn ...

metaodi commented 7 years ago

I created a new topic on Discourse, let's see: https://discuss.okfn.org/t/create-new-release-of-messytables-on-pypi/4608

StevenMaude commented 7 years ago

@rufuspollock: I don't generally jump in here too much, e.g. on deciding whether PRs should be merged, because it's not really our project to decide, but I can release a new version via scraperwiki PyPI account, if everyone's happy with that.

I've created a PR for this. If you're fine with it, merge, tag it as 0.15.2 and I'll later release a version at that commit to PyPI.

rufuspollock commented 7 years ago

@StevenMaude thanks for the offer though I am certain other okfn folks have ability to publish here (I've almost never been the one who published this one).

@pwalsh @roll can you push this to pypi - with a bump in version?

pwalsh commented 7 years ago

@rufuspollock the scraperwiki team are the maintainers of this repo. No one currently at Open Knowledge has access to this on PyPI. The most recent discussion on this between myself, @pudo and @StevenMaude confirmed this, as scraperwiki seem to be the main consumer of the package (as well as the CKAN codebase).

@StevenMaude as we discussed by mail a while back, you and your team have all needed rights on this repo. And, as the scraperwiki account has rights on pypi too, please feel free to go ahead and release a version.

For anyone else following, tabulator and even possibly goodtables, are in many ways the successors to messytables, with a sharper focus, and have been built for Python 3.x (with full Python 2.7.x support of course).

StevenMaude commented 7 years ago

@pwalsh Done, 0.15.2 on PyPI now.

Since our current work on databaker, which uses messytables, is soon coming to a close, we won't be actively working on messytables, but don't mind keeping half an eye on things to make sure that e.g. new releases get pushed out.

rufuspollock commented 7 years ago

@pwalsh - great and thanks for the clarification 😄 - was not aware of that change.

@StevenMaude great to have this live 👍

Tulsi97 commented 5 years ago

The problematic row in question contains a NaN. Remove the NaN and it will work.
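A minimal sketch of that workaround, assuming the rows are available as plain Python lists before type guessing. `blank_out_nan` is a hypothetical helper, not part of messytables.

```python
import math


def blank_out_nan(row):
    """Replace float NaN cells with None; leave every other cell untouched."""
    return [
        None if isinstance(cell, float) and math.isnan(cell) else cell
        for cell in row
    ]


print(blank_out_nan([1.5, float("nan"), "2017-01-01"]))
```

The isinstance check comes first because math.isnan() itself raises a TypeError on strings.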