scrapy / itemloaders

Library to populate items using XPath and CSS with a convenient API
BSD 3-Clause "New" or "Revised" License
44 stars 16 forks source link

Add support for set type #84

Open ava7 opened 4 months ago

ava7 commented 4 months ago

In the previous version (1.1.0), adding a set would return a list:

$ pip freeze | grep itemloaders
itemloaders==1.1.0

$ cat test_set.py 
from itemloaders import ItemLoader

test_set = {"foo", "bar"}

il = ItemLoader()
il.add_value("item_list", test_set)

print(il.load_item())

$ python ./test_set.py 
{'item_list': ['bar', 'foo']}

In version 1.2.0, a list o af set would be returned:

$ pip freeze | grep itemloaders
itemloaders==1.2.0

$ python ./test_set.py 
{'item_list': [{'bar', 'foo'}]}

This PR is aimed to keep the previous behaviour. Follow-up from https://github.com/scrapy/itemloaders/pull/51#issuecomment-2107173388

Gallaecio commented 4 months ago

I must say, when you said keeping set, I thought we were talking about changing the existing isinstance to include it, not to add a separate isinstance to convert it to a list. So I’m +0 on this, I’m not against the change but it feels weird that arg_to_iter converts a set to a list, feels like that should be done by the calling code, either before or after.

ava7 commented 4 months ago

@Gallaecio I honestly had the same initial idea, but then I double-checked how it used to work and started to doubt the whole idea. Ultimately though, I decided to make it backwards compatible and push it anyway, to have the conversation at least. Thinking about it again today, maybe it's better not to include the set at all as it might break existing code even more...

VMRuiz commented 4 months ago

Adding set to the list of classes in isinstance solves the same problem while keeping the code simple: https://github.com/scrapy/itemloaders/pull/85

I think it's good to keep backwards compatibility, otherwise, this could cause many problems. If someone wants to actually have a dict in that field they can use the Identity processor.

VMRuiz commented 4 months ago

@ava7 I didn't mind to steal your thunder here, I just created the PR to test my assumptions. Feel free to copy my solution here and close my PR.

ava7 commented 4 months ago

@VMRuiz It's alright mate, nothing to steal here, the more opinions the better :) The problem with that code is that it might not be backwards compatible against version 1.1.0, if that is what you meant. Weirdly enough, a set used to return a simple list, now a set would return a list of a set... I'm not sure which one is more confusing. Maybe the old behaviour should be treated as a bug, and proceed with your changes?

VMRuiz commented 4 months ago

I just checked that in 1.1.0 using default_output_processor or default_input_processor will not affect set being always converted into list, so it was impossible for an item to contain a set value.

Based on that, I propose to treat the old behavior as a bug and do not try to fix it. The downside to this is that someone that was relaying on the behavior may be surprised when he is not longer getting a list of values but a list of sets of values.