rust-cli / rexpect

.github/workflows/ci.yml
https://docs.rs/rexpect
MIT License
328 stars 56 forks source link

draft of a possible solution to the #14 issue and changes in public interface #24

Closed zhiburt closed 4 years ago

zhiburt commented 4 years ago

Hi @philippkeller, In the #22 I put an example in what way may be changed the interface of ReadUntil. And today I went through the use cases of exp function and noticed that in all cases there's no need for all the data find function gives so I went slightly further and check what can be done.

The main idea behind this PR is to provide a cleaner calls of read_until function. The example when we match string and return the buffer before and the string itself but we know the string already since we did a match by that so the second part is always needless.

A bunch of examples

let mut p = spawn("cat", Some(1000)).expect("cannot run cat");
p.send_line("lorem ipsum dolor sit amet")?;
assert_eq!("lorem ipsum dolor sit ", p.exp_string("amet")?);
let f = io::Cursor::new("2014-03-15");
let mut r = NBReader::new(f, None);
let re = Regex::new(r"-\d{2}-").unwrap();
assert_eq!(
    ("2014".to_string(), "-03-".to_string()),
    r.read_until(&re).expect("regex doesn't match")
);
let f = io::Cursor::new("abcdef");
let mut r = NBReader::new(f, None);
assert_eq!(
    "ab".to_string(),
     r.read_until(&NBytes(2)).expect("2 bytes")
);

The resolution of #14 could looks like

we have an index of the ReadUntil element by which we get successful match, and the copy of the buffer on which it be called so if the user want he can repeat the match on the buffer.

let until = vec![ReadUntil::NBytes(30), ReadUntil::String("Hi".to_string())];
if let Ok(buffer, index)= = p.exp(until.as_slice()) {
   let res = &until[index]
      .clone()
      .string_needle()
      .unwrap()
      .find(&buffer, true)
      .unwrap();
}

This is a draft and it's affected by a linter which I am sorry about.

All tests are passed.

philippkeller commented 4 years ago

@zhiburt I'm currently swamped with work from my workplace and can only look at this at the weekend. Very happy that you took my PR and added your adaptions yourselves, looking forward to review this!

zhiburt commented 4 years ago

Perhaps, as a draft, it's not as clean as It should be.

The idea is basically that read_util function now only responsible for cleaning the internal buffer. The return value is determined by the Needle parameter. And ultimately it splits the ReadUntil to different types which implements Needle so each of them can have different return type.

/// read until it finds a match by needle
/// returns the needle's interest
pub fn read_until<N: Needle + ?Sized>(&mut self, needle: &N) -> Result<N::Interest>;

/// `Match` is a structure which contains a start and end of
/// a distance which `read_util` should delete from buffer.
pub struct Match<T> {
    begin: usize,
    end: usize,
    pub interest: T,
}

/// `Needle` trait has only one method `find` which return `Some`
/// if something was found `None` otherwise. 
pub trait Needle {
    type Interest;

    fn find(&self, buffer: &str, eof: bool) -> Option<Match<Self::Interest>>;
}

By the way the ReadUntil is still in need by us as we have to implement ReadUntil::Any with new interface.

It allows us to have different return values for each Needle.

impl Needle for String {
    /// return a content before the match
    type Interest = String;
    ...
}

impl Needle for NBytes {
    /// first N bytes
    type Interest = String;
    ...
}

impl Needle for Regex {
    /// content before the match and the match itself
    type Interest = (String, String);
    ...
}

impl Needle for [ReadUntil] {
    /// The copy of the buffer and an index of
    /// a first needle by which we get successful match
    type Interest = (String, usize);
    ...
}
philippkeller commented 4 years ago

@zhiburt I like the approach very much, your proposal regarding the API changes makes sense (e.g. matching a string only needs to return match before string) and then allows to return the index for Any. See my comments to your code above.

I also had a look at the API for pexpect and they only return the index of the match (which makes only sense if you provide a list of needles) or -1 if it didn't match. If you provide a regex then the matching string is stored in an attribute match within the "process object", so bottom line I think your proposal is better as it utilizes rusts features with "conditional return types".

Some questions:

  1. there's now two find functions which looks like redundant code
  2. you mention that this draft is still very drafty, what makes you think that?
  3. did you look into the merge conflict?

IMO this should first merged into a branch in order to more easily play around and be able add/change documentation and then I could release this together with the api changes for windows #11 in version 0.5

zhiburt commented 4 years ago
  1. I hope this one can be removed. https://github.com/philippkeller/rexpect/blob/0212cd6ff069e5633ea8e6d4dad8212f88ef6464/src/reader.rs#L86

  2. I consider it is drafty as the documentation is unchanged and all of a sudden a linter effected it. Also I dislike a bit the fact of existence 'util' methods in ReadUntil and the fact of coping the buffer in impl Needle for [ReadUntil] I consider that not a crucial performance issue but ...

  3. No, I didn't, I've considered to get a review first :)

I am not versed in git, but I bet have a different branch first not a bad idea

philippkeller commented 4 years ago

I created a new branch exp-and-windows where I'd like to merge this into. Can you resolve the conflict with master? I think this is easier for you than for me. Then I'll merge it into the new branch. The idea would be that also the windows changes are merged in this branch.

zhiburt commented 4 years ago

Good evening @philippkeller,

I've investigated the former mentioned issue with the buffer and not very expandable ReadUntil interface. And as I can state we could solve both issues at once by this proposal.

use reader::{ABInterest, NBytes, Regex, Until};

// currently it equals [ReadUntil::Nbytes(30), ReadUntil::String("Hi"), ReadUntil::Regex(Regex::("").unwrap())]
let until = Until(
    NBytes(30),
    Until("Hi".to_string(), Regex::new("").unwrap())
);
// we could wrap it by a function `or `
// NBytes(30).or("Hi".to_string())
//           .or(Regex::new("").unwrap())

match p.exp(&until) {
   Ok(ABInterest::A(s)) => {}
   Ok(ABInterest::B(ABInterest::A(s))) => {}
   Ok(ABInterest::B(ABInterest::B((s, regex)))) => {}
   Err(e) => assert!(false, format!("got error: {}", e)),
}

So user is building the any match type on its own. Until(a, b) is style of a OR b (This thoughts landed me to the question if there's any reason in a AND b?).

Why it's much more expandable? Because this PR provides an opportunity for user implement its own Needle, but the array implementation [ReadUntil] doesn't work with users implementations. With such implementation default Needles could be used with user defined ones in 'arrays'.

I don't think that user defined Needles is what is going to be used broadly but this is one option :)

Not sure if the names of type is expansionary enough :) And am certainly not sure if such interface is 'OK' but it reduce copying operations and force user to check all possible outcomes.

re implementation of [ReadUntil]

pub struct Until<A, B>(pub A, pub B);

pub enum ABInterest<A, B> {
    A(A),
    B(B),
}

impl<A: Needle, B: Needle> Needle for Until<A, B> {
    type Interest = ABInterest<A::Interest, B::Interest>;

    fn find(&self, buffer: &str, eof: bool) -> Option<Match<Self::Interest>> {
        self.0
            .find(buffer, eof)
            .and_then(
                |Match {
                     begin,
                     end,
                     interest,
                 }| Some(Match::new(begin, end, ABInterest::A(interest))),
            )
            .or_else(|| {
                self.1.find(buffer, eof).and_then(
                    |Match {
                         begin,
                         end,
                         interest,
                     }| {
                        Some(Match::new(begin, end, ABInterest::B(interest)))
                    },
                )
            })
    }
}
philippkeller commented 4 years ago

Aha, so a user could implement ABInterest his own, but we would of course implement the or type for him to provide an exp_any?

I was first fearing that this is bringing more complexity/abstractions into the code base. I lot sight a little bit the overview here :-) Is what you propose already possible with this PR or would it mean additional changes?

zhiburt commented 4 years ago

Aha, so a user could implement ABInterest his own, but we would of course implement the or type for him to provide an exp_any?

ABInterest is a return type for any matching (currently in the PR the type is (usize, String) index and copy of the buffer).

Is what you propose already possible with this PR or would it mean additional changes?

The read_until takes any type which implements Needle. If we make the trait pub the user will be able to pass its own types. (One more time I am not sure if it is a legit for the any user to have such option)

I was first fearing that this is bringing more complexity/abstractions into the code base.

I couldn't disagree, but from my perspective return a original Needle result is much more flexible rather than index + buffer. Here is my arguments.

The proposal is only linked to old ReadUntil::Any and current Needle for [ReadUntil] interfaces.

As I've mentioned it returns index + copy of the buffer. (I decided to return copy since I was not sure that buffer will be unchanged until the next call of a indexed object). I naturally think that the coping is only shows a smelt code here, but what is more importantly to get around rust type system there should be a util function to cast a ReadUntil obj to Needle. So the code is looks not really well.

let until = vec![ReadUntil::NBytes(30), ReadUntil::String("Hi".to_string()), Regex::new("").unwrap()];
if let Ok(buffer, index)= = p.exp(until.as_slice()) {
   let res = &until[index]
      .string_needle()
      .unwrap()
      .find(&buffer, true)
      .unwrap();
}

But there are a bit more here. As you can see I cast the indexed ReadUntil object to specific one. If the order of the array will be changed the type system will not be able to recognize the error. It will cause an unwap error. And more importantly we call find the second time. (first time in exp)

I propose to change the construction from [ReadUntil] to new type Until (the name is not descriptive ... ). Until is a tuple of Needles (Until(Needle, Needle)). The implementation will be changed to try to find the match by first element of the tuple if it's found than that is a return value, otherwise it check the second one. It will require to change the return type to ABInterest (the same expansionary name here). The ABInterest is present the result of the first element of the tuple or the result of the second element.

That literally a play around of rust type system to get able to remove ReadUntil's util functions. And as a step forward give the any logic be used with user defined Needles.

The code with the same logic as previous one.

// Until and ABInterest is renamed to UntilOR and ORInterest correspondingly (I guess if it is planer)
use reader::{ORInterest, NBytes, Regex, UntilOR};

let until = UntilOR(
    NBytes(30),
    UntilOR("Hi".to_string(), Regex::new("").unwrap())
);

match p.exp(&until) {
   Ok(ORInterest::Left(s)) => {}
   Ok(ORInterest::Right(ORInterest::Left(s))) => {}
   Ok(ORInterest::Right(ORInterest::Right((s, regex)))) => {}
   Err(e) => assert!(false, format!("got error: {}", e)),
}

If the order of parameters in UntilOR will be changed we will get an compiler error. And there's none of copying. And there's no second find call. That's why I consider it is a win.

But yep I am certainly not confident to say that it is a good way to follow as might it is slightly too complicated interface. Just wanted to have some comments according to it :)

philippkeller commented 4 years ago

Frankly I'm a bit lost, maybe because I'm not into rust that much as you are :-) But your arguments sound compelling. Less copying is certainly good and your ideas sound like the underlying code is very flexible.

What I'm not sure of is the changes it means to the api. So easiest for me would be if you could solve the conflicts and then I could merge it into the branch to play around, would that be possible?

zhiburt commented 4 years ago

@philippkeller I hope I resolved the conflicts. And I slightly clean the PR by removing a linter affected code. Also I did a couple of minor changes there.


The idea came to my mind today acording to my former proposal. We might are able to create a macros to hide the complexity of underling types in order to clean API. I've created a scratch of it but I am truly not versed in macros so I succeded only in creating a macros with constant number of parameters. But I guess the expandable macro can be implemented as well.

Example: the same any logic without macros and with.

match find {
    OrInterest::Lhs(OrInterest::Lhs(first)) => assert!(false),
    OrInterest::Lhs(OrInterest::Rhs(second)) => assert_eq!("Hello ".to_string(), second),
    OrInterest::Rhs(third)) => assert!(false),
}
until!(find, {
     first => { assert!(false) },
     second => { assert_eq!("Hello ".to_string(), second) },
     third => { assert!(false) },
});
philippkeller commented 4 years ago

@zhiburt so I merged your PR into the new branch and started to play around. This PR is already closed, so not sure where to put this discussion best, so I just put it here :-)

Early feedback: I find the exp_any interface too complicated right now. I was hoping that the Until and the OrInterest would be in the "inner workings" of the module, and would not be exposed in the api. What about exp_any would only accept "or" and has a vector of needles again as input parameter and gives back the index of which needle matched plus maybe string before and string matching?

Sorry for no more thorough feedback, I wanted to give an early feedback as soon as I had a time window :-) hope you don't take it personal

Additionally: did you see the windows branch? Maybe we can have some "general chat" on where this repo should be going, the way we work together, etc. Not sure what would be the best way to chat about this, maybe Telegram or Slack would serve this? But I'm not using in any code related communities so not sure what is the best option.

zhiburt commented 4 years ago

Early feedback: I find the exp_any interface too complicated right now.

As I said I totally agree with a such position :).

I was hoping that the Until and the OrInterest would be in the "inner workings" of the module, and would not be exposed in the api.

I too. As I currently understand that it can be hidden only by macros.

What about exp_any would only accept "or" and has a vector of needles again as input parameter and gives back the index of which needle matched plus maybe string before and string matching?

There's an issue with rust type system I guess here, so it is kind of problematic bu I'll check it.