Closed: emmalexandria closed this 8 months ago
Thanks for the PR.
I think in almost all cases the execution time should be limited. Maybe there is a good default that could be used in the `parse` function. Since you have parsed a lot of Wikimedia text, do you know the average execution time?
Then we could have two functions customizing this behaviour: `parse_with_timeout` and `parse_without_timeout`.
I think it might be useful to have the parsed nodes and warnings still available even if the timeout occurred.
Either there could be a property in `crate::Output`, something like `errors`, or the return type could be `Result<Output, Error>`, with the error being something like `enum Error { TimedOut { execution_time: Duration, output: Output } }`.
Instead of `u128`, use `std::time::Duration`; that seems more appropriate.
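A minimal sketch of that return type could look like the following. The `Output` fields and the placeholder parsing logic here are illustrative assumptions, not the crate's actual API:

```rust
use std::time::Duration;

// Hypothetical stand-ins for the crate's real types.
#[derive(Debug, Default)]
pub struct Output {
    pub nodes: Vec<String>,
    pub warnings: Vec<String>,
}

#[derive(Debug)]
pub enum Error {
    // Carries the partial parse so callers still get nodes and warnings
    // even when the timeout was hit.
    TimedOut {
        execution_time: Duration,
        output: Output,
    },
}

pub fn parse_with_timeout(text: &str, timeout: Duration) -> Result<Output, Error> {
    // Placeholder logic: pretend one millisecond is spent per input byte,
    // so long inputs exceed the budget.
    let execution_time = Duration::from_millis(text.len() as u64);
    let output = Output {
        nodes: vec![text.to_string()],
        warnings: Vec::new(),
    };
    if execution_time > timeout {
        Err(Error::TimedOut { execution_time, output })
    } else {
        Ok(output)
    }
}
```

The point of the struct variant is that a caller who hits the timeout can still destructure the error and recover the partial `output`.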
What do you think?
That all sounds good to me. I'll add some quick and dirty code to my Wikipedia dump parser to output an average parsing time to a CSV file.
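A rough sketch of what such timing code might look like; the harness, the `time_parses` name, and the CSV columns are illustrative guesses, not code from the actual dump parser:

```rust
use std::fmt::Write as _; // for writeln! on a String
use std::time::Instant;

// Time a parse function over a batch of (title, text) articles and
// build CSV rows of (title, milliseconds) plus the average.
fn time_parses(articles: &[(&str, &str)], parse: impl Fn(&str)) -> (String, f64) {
    let mut csv = String::from("title,ms\n");
    let mut total_ms = 0.0;
    for &(title, text) in articles {
        let start = Instant::now();
        parse(text);
        let ms = start.elapsed().as_secs_f64() * 1000.0;
        total_ms += ms;
        let _ = writeln!(csv, "{},{:.3}", title, ms);
    }
    let avg = total_ms / articles.len() as f64;
    (csv, avg)
}
```

The resulting average would then inform the default timeout discussed above.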
I added those suggestions, minus a few things. I'm not good enough with Rust to understand the lifetimes involved with `Output`, so I didn't include it in the return value. If anyone knows the correct way to do so, I'll definitely add it.
In addition, I don't yet have a figure for the average time to parse an article, but I went with 5 seconds as a reasonable default. Once I have a good figure, I'll update the default.
Thanks
Solves #4.
I was thinking it's also possible to change the code of `configuration.rs` to the following, and then modify the code I put in `parse.rs` so that if the `max_ms` parameter is 0, no limit is applied. That would mean that people who don't need the timeout code won't be affected if it's removed in future. I'm happy to implement that in this fork if it seems like a good idea.
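As a hedged sketch of that zero-means-unlimited convention (the `Configuration` fields and the `exceeded` helper are assumptions for illustration, not the fork's actual code):

```rust
use std::time::{Duration, Instant};

// Hypothetical configuration holding the timeout in milliseconds;
// 0 means "no limit".
pub struct Configuration {
    pub max_ms: u64,
}

impl Configuration {
    pub fn timeout(&self) -> Option<Duration> {
        if self.max_ms == 0 {
            None // no limit applied
        } else {
            Some(Duration::from_millis(self.max_ms))
        }
    }
}

// In the parse loop, the check is simply skipped when there is no limit.
pub fn exceeded(config: &Configuration, start: Instant) -> bool {
    match config.timeout() {
        None => false,
        Some(limit) => start.elapsed() > limit,
    }
}
```

Using `Option<Duration>` internally keeps the hot loop free of magic-number comparisons while still exposing a single integer knob to callers.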