scrapinghub / shublang

Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
BSD 3-Clause "New" or "Revised" License

Evaluation should stop if None is returned by any expression #62

Open VMRuiz opened 4 years ago

VMRuiz commented 4 years ago

The following expression, and every variation of it that I have tested, always produces a 'NoneType' object is not subscriptable error in the logs.

Expression: re_search("#([\d,]+)")|first|sub(",","")|int
Data: ['\n']

or

Expression: re_search("#([\d,]+)")|map(lambda x: x[0].replace(",", ""))|int
Data: ['\n']

They work with the following expected data:

Data = [
    "#667 in Books (",
    " #6 in",
    " #34 in",
    " #46 in",
]

Output =  [667,  6,  34,  46]

The reason is that at some point one of the methods returns None or [None], and the next method then fails when it tries to convert that value to int or to run a replacement or search over it.
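
To illustrate the failure mode outside shublang, here is a minimal sketch in plain Python. The helpers `search` and `first` are hypothetical stand-ins, not shublang's actual implementation; they only assume that a search step yields None when nothing matches.

```python
import re


def search(pattern, text):
    """Hypothetical stand-in for a search step: captured groups, or None on no match."""
    match = re.search(pattern, text)
    return match.groups() if match else None


def first(value):
    """Hypothetical stand-in for a 'first' step: take the first element."""
    return value[0]


# With matching input the chain works.
ok = first(search(r"#([\d,]+)", "#667 in Books ("))

# With non-matching input the search step returns None and the next step fails.
try:
    first(search(r"#([\d,]+)", "\n"))
except TypeError as exc:
    error_message = str(exc)
```

Here `error_message` carries the same text that shows up in the logs, because the step after the failed search tries to subscript None.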

I think it is reasonable to expect shublang to stop processing data once None is reached, since we can't always assume the input data will be 100% accurate.
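
A rough sketch of what that automatic behaviour could look like, again in plain Python rather than shublang internals. `run_pipeline` and `extract_rank` are hypothetical names, and the steps are ordinary callables standing in for the piped expressions.

```python
import re


def run_pipeline(value, steps):
    """Apply each step in order; return None as soon as the running value is None."""
    for step in steps:
        if value is None:
            return None
        value = step(value)
    return value


def extract_rank(text):
    """Hypothetical equivalent of: search for '#<digits>', strip commas, convert to int."""
    steps = [
        lambda s: re.search(r"#([\d,]+)", s),  # returns None when nothing matches
        lambda m: m.group(1),
        lambda s: s.replace(",", ""),
        int,
    ]
    return run_pipeline(text, steps)


data = ["#667 in Books (", " #6 in", " #34 in", " #46 in", "\n"]
results = [extract_rank(item) for item in data]
```

With this approach the bad input yields None instead of raising, so `results` would be `[667, 6, 34, 46, None]`.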

Alternatively, we could implement a stop-if-none method in case we want to set the break points manually.
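
A sketch of the manual variant, assuming the marker is just another step in the chain. `stop_if_none`, `StopPipeline` and `run_pipeline` are hypothetical names for illustration; none of them exists in shublang today.

```python
import re


class StopPipeline(Exception):
    """Raised by stop_if_none to end evaluation early."""


def stop_if_none(value):
    """Pass the value through unchanged, or abort the pipeline if it is None."""
    if value is None:
        raise StopPipeline()
    return value


def run_pipeline(value, steps):
    """Apply each step in order; a StopPipeline from any step makes the result None."""
    try:
        for step in steps:
            value = step(value)
    except StopPipeline:
        return None
    return value


steps = [
    lambda s: re.search(r"#([\d,]+)", s),  # returns None when nothing matches
    stop_if_none,                          # explicit break point
    lambda m: m.group(1).replace(",", ""),
    int,
]
guarded = [run_pipeline(item, steps) for item in ["#667 in Books (", "\n"]]
```

The difference from stopping automatically is that steps without the marker keep today's behaviour, so existing expressions would not change silently.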