Open Julian-J-S opened 1 year ago
I'm not sure if this is easy to achieve.
When we evaluate multiple expressions, we should transform Vec<Result<Series, Err>> to Result<Vec<Series>, Err>.
The final result depends on the way we evaluate it:
- Sequential: evaluation stops when the first Err is triggered and returns that error (behavior of the standard library).
- Parallel: if any item is an Err, then all previous Ok items collected are discarded, and that error is returned. If there are multiple errors, the one returned is not deterministic (behavior of Rayon).

I don't think we should do that. I haven't seen an interpreter that keeps running once an error is encountered.
This would also add non-trivial complexity, and it assumes we can always continue after an error. That means we would have to change assumptions and internal state, making some optimizations impossible.
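To make the two collection behaviors discussed here concrete, below is a minimal Python sketch. It is not how Polars is implemented. It models each expression outcome as either a value or an exception instance, and contrasts a fail-fast collect with one that gathers every error. The names `collect_fail_fast` and `collect_all_errors` are hypothetical.

```python
from typing import Any, List, Tuple, Union

# Assumption for illustration: an outcome is a value or the exception raised.
Outcome = Union[Any, Exception]


def collect_fail_fast(outcomes: List[Outcome]) -> List[Any]:
    """Return all values, or raise the first error found in order."""
    values = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            raise outcome  # values collected so far are discarded
        values.append(outcome)
    return values


def collect_all_errors(
    outcomes: List[Outcome],
) -> Tuple[List[Any], List[Exception]]:
    """Split outcomes into values and errors without stopping early."""
    values = [o for o in outcomes if not isinstance(o, Exception)]
    errors = [o for o in outcomes if isinstance(o, Exception)]
    return values, errors


outcomes = [ValueError("bad i64"), 5, TypeError("bad f64")]
values, errors = collect_all_errors(outcomes)
print(values)  # [5]
print([type(e).__name__ for e in errors])  # ['ValueError', 'TypeError']
```

The fail-fast variant mirrors the sequential case: only the first error is ever reported, no matter how many outcomes failed.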
Maybe there is a small misunderstanding. Let me explain with an example:
df.with_columns(
    # BLOCK A
    calc1 = ...,
    calc2 = ...,
).with_columns(
    # BLOCK B
    calc3 = calc1...,
    ...
)
As far as I understand, the calculations in "BLOCK A" are independent of each other and run in parallel/concurrently. So it should be possible to gather all results (successes and failures) of that block and then either display all errors that are present or continue?
I do NOT expect the interpreter to run "BLOCK B" after an error in "BLOCK A". I would like to see ALL errors happening in a block ("BLOCK A") and then stop (if any error is present).
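The block-level behavior described here could look roughly like the following sketch in plain Python. It uses `concurrent.futures` rather than anything from Polars. `BlockError`, `run_block`, and the `calc1`/`calc2` functions are hypothetical stand-ins for the expressions in "BLOCK A".

```python
from concurrent.futures import ThreadPoolExecutor


class BlockError(Exception):
    """Hypothetical error that carries every failure from one block."""

    def __init__(self, errors):
        self.errors = errors  # list of (task name, exception) pairs
        super().__init__("; ".join(f"{name}: {err}" for name, err in errors))


def run_block(tasks):
    """Run independent tasks concurrently; raise BlockError if any failed."""
    results, errors = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            err = future.exception()  # blocks until this task has finished
            if err is None:
                results[name] = future.result()
            else:
                errors.append((name, err))
    if errors:
        raise BlockError(errors)
    return results


def calc1():
    return int("hello")  # raises ValueError


def calc2():
    return float("abc")  # raises ValueError


try:
    block_a = run_block({"calc1": calc1, "calc2": calc2})
    # BLOCK B would only start here, after BLOCK A fully succeeded.
except BlockError as exc:
    print(len(exc.errors))  # 2
```

Because the futures are inspected in submission order, the combined error lists failures deterministically even though the tasks themselves may finish in any order.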
Here is a common Python program that runs tasks concurrently and returns ALL results (errors + values):
import asyncio


async def foo():
    raise ValueError("Foo ValueError!!")


async def bar():
    return 5


async def baz():
    raise NotImplementedError()


async def main():
    # return_exceptions=True returns exceptions as results instead of raising
    results = await asyncio.gather(
        foo(),
        bar(),
        baz(),
        return_exceptions=True,
    )
    print(results)
    # [ValueError('Foo ValueError!!'), 5, NotImplementedError()]


asyncio.run(main())
Problem description
Problem
Error messages from parallel executions are not combined; only the first error message is displayed.
Reason
Imagine an ETL workflow where a CSV has 10 columns, 5 of which have corrupt values.
Current procedure:
Instead of:
Example
The individual error messages are:
str to i64 failed for value(s) ["hello", "world"]
str to f64 failed for value(s) ["def", "abc"]
str to date failed for value(s) ["2023-01-01", "2023-02-01"]
But what the user sees is actually only the first:
str to i64 failed for value(s) ["hello", "world"]
Desired result
A combination of the error messages into one.
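As a rough illustration of what a combined message could look like, here is a small Python sketch that joins the three messages from the example into one. The function name `combine_errors` and the output layout (a count header followed by a numbered list) are assumptions made for illustration, not an existing or proposed Polars format.

```python
def combine_errors(messages):
    """Join individual error messages into one multi-line message."""
    header = f"{len(messages)} expressions failed:"
    body = [f"  {i}. {msg}" for i, msg in enumerate(messages, start=1)]
    return "\n".join([header, *body])


messages = [
    'str to i64 failed for value(s) ["hello", "world"]',
    'str to f64 failed for value(s) ["def", "abc"]',
    'str to date failed for value(s) ["2023-01-01", "2023-02-01"]',
]
combined = combine_errors(messages)
print(combined)
```

With a message like this, all corrupt columns in the ETL scenario would surface in a single run instead of one per attempt.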