scrapy / itemloaders

Library to populate items using XPath and CSS with a convenient API
BSD 3-Clause "New" or "Revised" License
44 stars 16 forks source link

if the value is 0 (int) it will be set to None #73

Open sla-te opened 1 year ago

sla-te commented 1 year ago

https://github.com/scrapy/itemloaders/blob/68e8701432bd8bebb990668a8938145477f60d37/itemloaders/__init__.py#L205C25-L205C25 will not set the value to 0 in case 0 is being passed, instead the value will be set to NoneType.

VMRuiz commented 1 year ago

Hello @sla-te ,

Can you please share an example for how to reproduce your issue?

There is already one test for that situation your mentioning and it seems to be adding 0 as expected.

sla-te commented 1 year ago

@VMRuiz Well, looking at this test it appears to be working, on the other hand any value, that is (int) 0, that lands at https://github.com/scrapy/itemloaders/blob/68e8701432bd8bebb990668a8938145477f60d37/itemloaders/__init__.py#L205C25-L205C25 will be checked only via if 0, and that would always be false resulting in the value being empty.

I will extract the scenario where it happens in my logic and update this post with it.

VMRuiz commented 1 year ago

Values are passed wrapped inside an iterator value = arg_to_iter(value) . So at the conditional the value is checked as if [0] which is actually true.

sla-te commented 1 year ago

@VMRuiz Does self._process_input_value(field_name, value) not call the user defined input processor? As far as I understand it after value = arg_to_iter(value), the value is passed to self._process_input_value(field_name, value) and if the input processor would now extract and modify the value which may be "0", casts it to int and returns it if processed_value would be False resulting in the value being None instead of 0 (int).

I fixed that by just changing if processed_value to if processed_value is not None .

The test passes due to the InputProcessor not being customized.

VMRuiz commented 1 year ago

Ok, is this what you mean?

>>> from itemloaders import ItemLoader
>>> class TakeFirstItemLoader(ItemLoader):
...     default_input_processor = TakeFirst()
... 
>>> l = TakeFirstItemLoader(item=dict())
>>> l.add_value("field", [0])
>>> l.load_item()
{}
>>> l.add_value("field", [1])
>>> l.load_item()
{'field': [1]}

In that case, yes. It looks like the check could be problematic. I'm not sure which solution is better:

        processed_value = self._process_input_value(field_name, value)
        if processed_value is not None:
            self._values.setdefault(field_name, [])
            self._values[field_name] += arg_to_iter(processed_value)

or

        processed_value = arg_to_iter(self._process_input_value(field_name, value))
        if processed_value:
            self._values.setdefault(field_name, [])
            self._values[field_name] += processed_value

First one will ignore None input values and the second one will accept any result.

cc. @wRAR

sla-te commented 1 year ago

Yes, pretty much any combination, that would involve unwrapping the the iterable inside the InputProcessor could yield 0 (int), a class, with boolness set to False or even a bool, that is False. All these would be set to None.

divtiply commented 10 months ago

@VMRuiz, consider this

        processed_value = self._process_input_value(field_name, value)
        values = [v for v in arg_to_iter(processed_value) if v is not None]
        if values:
            self._values.setdefault(field_name, [])
            self._values[field_name] += values