scrapinghub / scrapinghub-entrypoint-scrapy

Scrapy entrypoint for Scrapinghub job runner
BSD 3-Clause "New" or "Revised" License

Numbers in per-spider settings are handled as text #48

Closed shadow-ru closed 6 years ago

shadow-ru commented 6 years ago

When I assign a value to a variable in the spider settings through the web GUI, it's handled as text. Scrapinghub's support was unable to understand what's wrong with it.

https://img-fotki.yandex.ru/get/900241/15906415.0/0_24369a_19270170_XXL https://img-fotki.yandex.ru/get/877150/15906415.0/0_24369b_e45b1136_XXL

shadow-ru commented 6 years ago

Well, it seems that Scrapy's own variables are handled correctly (does Scrapy parse them itself?), but variables for third-party software components are not.

vshlapakov commented 6 years ago

Hello. It's true that all custom settings passed via the UI are handled as text, but this is by design, not a bug. In general it doesn't make sense to guess a variable's type, because a component that relies on a setting always knows exactly what type it expects and reads it with the corresponding Scrapy Settings method, such as getint or getbool (check the sources if you're interested). Your custom logic should follow the same principle.

In any case, this has nothing to do with the current repository, so I'm closing the issue.
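To illustrate the principle, here is a minimal sketch of a component that reads its own settings through typed getters. `PageLimiter` and the `PAGELIMITER_*` names are hypothetical, and `TextSettings` is only a stand-in that imitates the behaviour of Scrapy's `Settings.getint` and `Settings.getbool` so the snippet runs without Scrapy installed; in a real project you would call the same methods on `crawler.settings`.

```python
class TextSettings:
    """Hypothetical stand-in imitating Scrapy's typed Settings getters."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name, default=None):
        return self._values.get(name, default)

    def getint(self, name, default=0):
        # Works for both native ints and numeric strings.
        return int(self.get(name, default))

    def getbool(self, name, default=False):
        got = self.get(name, default)
        try:
            # Handles True/False, 1/0 and "1"/"0".
            return bool(int(got))
        except ValueError:
            if got in ("True", "true"):
                return True
            if got in ("False", "false"):
                return False
            raise ValueError(f"{name!r} is not a boolean: {got!r}")


class PageLimiter:
    """Hypothetical third-party component with its own settings."""

    def __init__(self, max_pages, enabled):
        self.max_pages = max_pages
        self.enabled = enabled

    @classmethod
    def from_settings(cls, settings):
        # The component knows which type it expects, so it asks for it.
        return cls(
            max_pages=settings.getint("PAGELIMITER_MAX_PAGES", 100),
            enabled=settings.getbool("PAGELIMITER_ENABLED", True),
        )


# Values as they would look when defined in settings.py (native types).
from_code = PageLimiter.from_settings(
    TextSettings({"PAGELIMITER_MAX_PAGES": 50, "PAGELIMITER_ENABLED": False})
)
# The same values as they would arrive from the web UI (plain text).
from_ui = PageLimiter.from_settings(
    TextSettings({"PAGELIMITER_MAX_PAGES": "50", "PAGELIMITER_ENABLED": "false"})
)
```

With this pattern the component ends up with the same typed values regardless of whether a setting came from code or from the UI.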

shadow-ru commented 6 years ago

Hi. It's up to you, but I should note that it only happens when I set variables through the web GUI.

If I use variables that I have set in code, everything works fine. However, things just stop working when I override the SAME variables through the web GUI. I think that's simply inconsistent. What's the point of a web interface if you're unable to quickly change spiders' behavior with it?
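A minimal sketch of the kind of breakage described here, assuming a hypothetical component that uses a setting's raw value without converting it (the `should_stop` helper and the values are made up for illustration):

```python
def should_stop(pages_seen, max_pages):
    """Hypothetical check that uses the raw setting value as-is."""
    return pages_seen >= max_pages


code_value = 50    # as defined in settings.py: a Python int
ui_value = "50"    # the same setting overridden in the web UI: text

# With the value from code, the comparison works as expected.
works_with_code = should_stop(60, code_value)

# With the value from the UI, comparing int and str raises TypeError.
failed_with_ui = False
try:
    should_stop(60, ui_value)
except TypeError:
    failed_with_ui = True

# Some operations do not fail at all and silently give a different result.
doubled = ui_value * 2  # string repetition, not arithmetic
```

The last line is the more dangerous case: nothing raises, but the result is the string `"5050"` rather than the number 100.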

vshlapakov commented 6 years ago

There's a grain of truth here, but I think you're not taking into account that settings.py is a pure Python file whose content is interpreted by Python at runtime. Settings in the web UI, on the other hand, are designed in a general-purpose style that is not tied to any specific programming language's syntax, because Scrapy Cloud supports more than just Python.

Technically I guess it's doable, but it would mean supporting full Python syntax (and not only Python) in the web UI, including dicts, custom classes (and therefore imports) and even conditional constructs, to provide the full consistency users would expect. It would also mean that we could apply the settings only at runtime and would have to intentionally limit the web UI's ability to validate the provided settings before running a spider, even for the most common settings.

So it's not as straightforward as it looks at first sight, it doesn't bring many benefits, and I do see why it was implemented this way. Perhaps we could improve our documentation to make the above clear enough. I appreciate you shedding more light on it.