protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
64.53k stars 15.36k forks source link

Extend of repeated field does not check types when using compiled version #5299

Open Flamefire opened 5 years ago

Flamefire commented 5 years ago

What version of protobuf and what language are you using? Version: /v3.6.1 Language: Python

What operating system (Linux, Windows, ...) and version? Linux Mint

What runtime / compiler are you using (e.g., python version or gcc version) Python 3.6.0

What did you do?

  1. Have a scalar field of in32 type
  2. Call extend(np.array([True], dtype=np.bool)) on it

What did you expect to see Error about type mismatch (bool is not int32) consistently

What did you see instead? No error on Linux with pip-installed versions Error when using Git-Version (installed with pip -e) or on windows.

This came up during a PR for ONNX which uses protobuf (https://github.com/onnx/onnx/pull/1540). The same code causes a failure on Windows, but not on any Linux version tested. I checked out the Github version to be able to debug this and there it errored. Using VSCode I found that the extend method is a "native" method when it does not error and a python method when it does.

TLDR: Sometimes the python RepeatedScalarFieldContainer is used where the type is checked on append (https://github.com/protocolbuffers/protobuf/blob/63d2f3bc80a870a09c6aea74025df40aad7dba04/python/google/protobuf/internal/containers.py#L255) and sometimes the native class is used which does not have this type check: https://github.com/protocolbuffers/protobuf/blob/63d2f3bc80a870a09c6aea74025df40aad7dba04/python/google/protobuf/pyext/repeated_scalar_container.cc#L344

This also has the side effect, that the functions extend and insert are not provided in the native case.

Question: When/How is decided what to use?
Bug: Why is the type not checked in the native case and an int32 happily accepts a numpy bool?

anandolee commented 5 years ago

All numbers.Integral are accepted in python protobuf setters for int32 field: https://github.com/protocolbuffers/protobuf/blob/f50a1f843e8c2599a92e9d6bd5324462a5233367/python/google/protobuf/internal/type_checkers.py#L130

I am not sure why windows cause failures on windows, but I'd assume it was not from the set to a protobuf repeated field?

Flamefire commented 5 years ago

I am not sure why windows cause failures on windows, but I'd assume it was not from the set to a protobuf repeated field?

A numpy bool is not a numbers.Integral so the failure is actually the expected behaviour. The bug I'm reporting is that it does not fail in some circumstances (e.g. on everything but windows).

I traced this down to the fact that there are 2 different implementations of the RepeatedScalarContainer: 1 in C++ and 1 in Python. The latter does check the type and hence (correctly) fails, but the C++ version does not and hence is wrong.

To reproduce simply download protobuf with pip on linux and assign a numpy bool array to a int32 repeated scalar field. This should work although it is wrong.

anandolee commented 5 years ago

Yes we do have pure python and cpp extension. Looks like the issue is not for different platform, it is different protobuf implementations.

For cpp extension, we do have type checks as well: https://github.com/protocolbuffers/protobuf/blob/46a48e49aa8357bbeee8040819a35e59880e329a/python/google/protobuf/pyext/message.cc#L679

It was surprise me that a Python bools are a subclass of int but numpy bools are not subclass of number type.

My understand is that all types are acceptable if it can be safely convert to the expected type. For example a bool field can accept integral: https://github.com/protocolbuffers/protobuf/blob/46a48e49aa8357bbeee8040819a35e59880e329a/python/google/protobuf/internal/type_checkers.py#L239

Change the pure python to accept numpy bool make more sense to me

Flamefire commented 5 years ago

There was a discussion about this: https://github.com/numpy/numpy/issues/3580

Essentially:

Note that there is also an example posted: Isinstance(np.int8(0), numbers.Integral) -> False which does no longer hold (it returns True as it should)

Furthermore: a=np.bool_(5); b=a+0 -> a==True && b == 1

Given this I suggest to change the python checker to: isinstance(t, numbers.Integral) or isinstance(t+0, numbers.Integral) where an exception should be caught and reported as False. This would match the C code: Either it is an Integral or it can be cast/used as one.

github-actions[bot] commented 1 month ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

Flamefire commented 1 month ago

Has this been adressed/fixed?