Open BurnzZ opened 4 years ago
I agree with the need for simplicity/concision at the necessary pipes to extract the data in the provided example above. Given the current grammar, from the "logical" point of view makes sense to use the first
twice, but analyzing it from the user perspective, it could be weirdy, or at least, verbose.
Another thing that I found exploring this example is that the first
will fail if the re_search
returns None. Also, if we try to apply the float
to an empty value we will get an exception too because it expects an iterable. But I think that in these cases we can avoid breaking things evaluating the expressions inside a try/catch
(if an exception is thrown we return None), in the same way that the universal parser extractor does.
This test need not be merged into master but instead provides the necessary stimuli to see some pain points in
shublang
's usage.In particular we can see that:
The current
sanitize
functionality returns empty strings in its iterable. This presents the need to update it to prune out the empty strings, otherwise it would evaluate our test example as['', '', '', 'price: $123,823.00', '']
We need to do a double
first
, since the 1st one transforms[('123823.00',)]
into('123823.00',)
and the 2nd one transforms(123823.00,)
into123823.00
.The
float
functionality needs to be in between the doublefirst
since it only works on iterables.As we can see, we need to jump on a lot of hoops just to properly extract this type of data.
Ideally, we should have a way to extract the data in a very concise manner like:
re_search("(\d+\.\d{2}) | first_match