Closed zhiburt closed 4 years ago
@zhiburt I'm currently swamped with work from my workplace and can only look at this at the weekend. Very happy that you took my PR and added your adaptions yourselves, looking forward to review this!
Perhaps, as a draft, it's not as clean as It should be.
The idea is basically that read_util
function now only responsible for cleaning the internal buffer. The return value is determined by the Needle
parameter. And ultimately it splits the ReadUntil
to different types which implements Needle
so each of them can have different return type.
/// read until it finds a match by needle
/// returns the needle's interest
pub fn read_until<N: Needle + ?Sized>(&mut self, needle: &N) -> Result<N::Interest>;
/// `Match` is a structure which contains a start and end of
/// a distance which `read_util` should delete from buffer.
pub struct Match<T> {
begin: usize,
end: usize,
pub interest: T,
}
/// `Needle` trait has only one method `find` which return `Some`
/// if something was found `None` otherwise.
pub trait Needle {
type Interest;
fn find(&self, buffer: &str, eof: bool) -> Option<Match<Self::Interest>>;
}
By the way the ReadUntil
is still in need by us as we have to implement ReadUntil::Any
with new interface.
It allows us to have different return values for each Needle.
impl Needle for String {
/// return a content before the match
type Interest = String;
...
}
impl Needle for NBytes {
/// first N bytes
type Interest = String;
...
}
impl Needle for Regex {
/// content before the match and the match itself
type Interest = (String, String);
...
}
impl Needle for [ReadUntil] {
/// The copy of the buffer and an index of
/// a first needle by which we get successful match
type Interest = (String, usize);
...
}
@zhiburt I like the approach very much, your proposal regarding the API changes makes sense (e.g. matching a string only needs to return match before string) and then allows to return the index for Any. See my comments to your code above.
I also had a look at the API for pexpect and they only return the index of the match (which makes only sense if you provide a list of needles) or -1 if it didn't match. If you provide a regex then the matching string is stored in an attribute match
within the "process object", so bottom line I think your proposal is better as it utilizes rusts features with "conditional return types".
Some questions:
IMO this should first merged into a branch in order to more easily play around and be able add/change documentation and then I could release this together with the api changes for windows #11 in version 0.5
I hope this one can be removed. https://github.com/philippkeller/rexpect/blob/0212cd6ff069e5633ea8e6d4dad8212f88ef6464/src/reader.rs#L86
I consider it is drafty as the documentation is unchanged and all of a sudden a linter effected it.
Also I dislike a bit the fact of existence 'util' methods in ReadUntil
and the fact of coping the buffer in impl Needle for [ReadUntil]
I consider that not a crucial performance issue but ...
No, I didn't, I've considered to get a review first :)
I am not versed in git, but I bet have a different branch first not a bad idea
I created a new branch exp-and-windows
where I'd like to merge this into. Can you resolve the conflict with master? I think this is easier for you than for me. Then I'll merge it into the new branch. The idea would be that also the windows changes are merged in this branch.
Good evening @philippkeller,
I've investigated the former mentioned issue with the buffer and not very expandable ReadUntil
interface. And as I can state we could solve both issues at once by this proposal.
use reader::{ABInterest, NBytes, Regex, Until};
// currently it equals [ReadUntil::Nbytes(30), ReadUntil::String("Hi"), ReadUntil::Regex(Regex::("").unwrap())]
let until = Until(
NBytes(30),
Until("Hi".to_string(), Regex::new("").unwrap())
);
// we could wrap it by a function `or `
// NBytes(30).or("Hi".to_string())
// .or(Regex::new("").unwrap())
match p.exp(&until) {
Ok(ABInterest::A(s)) => {}
Ok(ABInterest::B(ABInterest::A(s))) => {}
Ok(ABInterest::B(ABInterest::B((s, regex)))) => {}
Err(e) => assert!(false, format!("got error: {}", e)),
}
So user is building the any
match type on its own. Until(a, b)
is style of a OR b
(This thoughts landed me to the question if there's any reason in a AND b
?).
Why it's much more expandable?
Because this PR provides an opportunity for user implement its own Needle
, but the array implementation [ReadUntil]
doesn't work with users implementations. With such implementation default Needle
s could be used with user defined ones in 'arrays'.
I don't think that user defined Needle
s is what is going to be used broadly but this is one option :)
Not sure if the names of type is expansionary enough :) And am certainly not sure if such interface is 'OK' but it reduce copying operations and force user to check all possible outcomes.
[ReadUntil]
pub struct Until<A, B>(pub A, pub B);
pub enum ABInterest<A, B> {
A(A),
B(B),
}
impl<A: Needle, B: Needle> Needle for Until<A, B> {
type Interest = ABInterest<A::Interest, B::Interest>;
fn find(&self, buffer: &str, eof: bool) -> Option<Match<Self::Interest>> {
self.0
.find(buffer, eof)
.and_then(
|Match {
begin,
end,
interest,
}| Some(Match::new(begin, end, ABInterest::A(interest))),
)
.or_else(|| {
self.1.find(buffer, eof).and_then(
|Match {
begin,
end,
interest,
}| {
Some(Match::new(begin, end, ABInterest::B(interest)))
},
)
})
}
}
Aha, so a user could implement ABInterest
his own, but we would of course implement the or
type for him to provide an exp_any
?
I was first fearing that this is bringing more complexity/abstractions into the code base. I lot sight a little bit the overview here :-) Is what you propose already possible with this PR or would it mean additional changes?
Aha, so a user could implement ABInterest his own, but we would of course implement the or type for him to provide an exp_any?
ABInterest
is a return type for any
matching (currently in the PR the type is (usize, String)
index and copy of the buffer).
Is what you propose already possible with this PR or would it mean additional changes?
The read_until
takes any type which implements Needle
. If we make the trait pub
the user will be able to pass its own types. (One more time I am not sure if it is a legit for the any user to have such option)
I was first fearing that this is bringing more complexity/abstractions into the code base.
I couldn't disagree, but from my perspective return a original Needle
result is much more flexible rather than index + buffer. Here is my arguments.
The proposal is only linked to old ReadUntil::Any
and current Needle for [ReadUntil]
interfaces.
As I've mentioned it returns index + copy of the buffer. (I decided to return copy since I was not sure that buffer will be unchanged until the next call of a indexed object).
I naturally think that the coping is only shows a smelt code here, but what is more importantly to get around rust type system there should be a util function to cast a ReadUntil
obj to Needle. So the code is looks not really well.
let until = vec![ReadUntil::NBytes(30), ReadUntil::String("Hi".to_string()), Regex::new("").unwrap()];
if let Ok(buffer, index)= = p.exp(until.as_slice()) {
let res = &until[index]
.string_needle()
.unwrap()
.find(&buffer, true)
.unwrap();
}
But there are a bit more here. As you can see I cast the indexed ReadUntil
object to specific one. If the order of the array will be changed the type system will not be able to recognize the error. It will cause an unwap
error. And more importantly we call find
the second time. (first time in exp
)
I propose to change the construction from [ReadUntil]
to new type Until
(the name is not descriptive ... ). Until
is a tuple of Needle
s (Until(Needle, Needle)
). The implementation will be changed to try to find the match by first element of the tuple if it's found than that is a return value, otherwise it check the second one. It will require to change the return type to ABInterest
(the same expansionary name here). The ABInterest
is present the result of the first element of the tuple or the result of the second element.
That literally a play around of rust type system to get able to remove ReadUntil
's util functions. And as a step forward give the any
logic be used with user defined Needle
s.
The code with the same logic as previous one.
// Until and ABInterest is renamed to UntilOR and ORInterest correspondingly (I guess if it is planer)
use reader::{ORInterest, NBytes, Regex, UntilOR};
let until = UntilOR(
NBytes(30),
UntilOR("Hi".to_string(), Regex::new("").unwrap())
);
match p.exp(&until) {
Ok(ORInterest::Left(s)) => {}
Ok(ORInterest::Right(ORInterest::Left(s))) => {}
Ok(ORInterest::Right(ORInterest::Right((s, regex)))) => {}
Err(e) => assert!(false, format!("got error: {}", e)),
}
If the order of parameters in UntilOR will be changed we will get an compiler error. And there's none of copying. And there's no second find
call. That's why I consider it is a win.
But yep I am certainly not confident to say that it is a good way to follow as might it is slightly too complicated interface. Just wanted to have some comments according to it :)
Frankly I'm a bit lost, maybe because I'm not into rust that much as you are :-) But your arguments sound compelling. Less copying is certainly good and your ideas sound like the underlying code is very flexible.
What I'm not sure of is the changes it means to the api. So easiest for me would be if you could solve the conflicts and then I could merge it into the branch to play around, would that be possible?
@philippkeller I hope I resolved the conflicts. And I slightly clean the PR by removing a linter affected code. Also I did a couple of minor changes there.
The idea came to my mind today acording to my former proposal. We might are able to create a macros to hide the complexity of underling types in order to clean API. I've created a scratch of it but I am truly not versed in macros so I succeded only in creating a macros with constant number of parameters. But I guess the expandable macro can be implemented as well.
Example: the same any
logic without macros and with.
match find {
OrInterest::Lhs(OrInterest::Lhs(first)) => assert!(false),
OrInterest::Lhs(OrInterest::Rhs(second)) => assert_eq!("Hello ".to_string(), second),
OrInterest::Rhs(third)) => assert!(false),
}
until!(find, {
first => { assert!(false) },
second => { assert_eq!("Hello ".to_string(), second) },
third => { assert!(false) },
});
@zhiburt so I merged your PR into the new branch and started to play around. This PR is already closed, so not sure where to put this discussion best, so I just put it here :-)
Early feedback: I find the exp_any interface too complicated right now. I was hoping that the Until
and the OrInterest
would be in the "inner workings" of the module, and would not be exposed in the api. What about exp_any would only accept "or" and has a vector of needles again as input parameter and gives back the index of which needle matched plus maybe string before and string matching?
Sorry for no more thorough feedback, I wanted to give an early feedback as soon as I had a time window :-) hope you don't take it personal
Additionally: did you see the windows branch? Maybe we can have some "general chat" on where this repo should be going, the way we work together, etc. Not sure what would be the best way to chat about this, maybe Telegram or Slack would serve this? But I'm not using in any code related communities so not sure what is the best option.
Early feedback: I find the exp_any interface too complicated right now.
As I said I totally agree with a such position :).
I was hoping that the Until and the OrInterest would be in the "inner workings" of the module, and would not be exposed in the api.
I too. As I currently understand that it can be hidden only by macros.
What about exp_any would only accept "or" and has a vector of needles again as input parameter and gives back the index of which needle matched plus maybe string before and string matching?
There's an issue with rust type system I guess here, so it is kind of problematic bu I'll check it.
Hi @philippkeller, In the #22 I put an example in what way may be changed the interface of
ReadUntil
. And today I went through the use cases ofexp
function and noticed that in all cases there's no need for all the datafind
function gives so I went slightly further and check what can be done.The main idea behind this PR is to provide a cleaner calls of
read_until
function. The example when we match string and return the buffer before and the string itself but we know the string already since we did a match by that so the second part is always needless.A bunch of examples
The resolution of #14 could looks like
we have an index of the ReadUntil element by which we get successful match, and the copy of the buffer on which it be called so if the user want he can repeat the match on the buffer.
This is a draft and it's affected by a linter which I am sorry about.
All tests are passed.