Open Rolv-Apneseth opened 1 month ago
Comparing Rolv-Apneseth:refactor-v3
(c7d342c) with v3
(a7247e4)
✅ 6
untouched benchmarks
Hi @Rolv-Apneseth! I have quickly looked at your PR, it looks very good! I will give it a better look later, but let me answer your first questions.
One thing for sure is the
String
values inapi/check.rs
- which ones exactly would you like me to make&str
orCow<'_, str>
?
Well, I think &str
should be enough, because we will never need to mutate (i.e., overwrite) the text strings, so keeping a reference should be fine. However, If I remember well, Clap makes this hard as you don't get reference to string read from the terminal, but owned string. So parsing text (either pure text or data annotation) from the terminal will required to be able to pass owned data, I fear. Using borrowed data is only useful for when reading files, as we may want to avoid allocating the full content of the files, and read by chunks instead.
So in conclusion, Cow<'_, str
> may be the only solution (apart from String
).
Also, for
cli/mod.rs
, I could potentially remove the need for the manual call ofcmd.execute
for each subcommand using the enum_dispatch crate if you're OK with adding a dependency? Fine as it is anyway though in my opinion, as there aren't many subcommands.
We can give it a try :-)
Dependencies should be fine, especially as we can keep it under the cli
feature.
I have quickly looked at your PR, it looks very good! I
Great!
Well, I think
&str
should be enough, because we will never need to mutate (i.e., overwrite) the text strings, so keeping a reference should be fine. However, If I remember well, Clap makes this hard as you don't get reference to string read from the terminal, but owned string. So parsing text (either pure text or data annotation) from the terminal will required to be able to pass owned data, I fear. Using borrowed data is only useful for when reading files, as we may want to avoid allocating the full content of the files, and read by chunks instead.
Sorry, I meant which things in check.rs
you actually wanted converted, but from this I'm assuming it's the Request.text
? But I am confused how this would be achieved. Do we not need to allocate a String
no matter what the source is? As for reading the file by chunks, do you want me to just make this use a BufReader
and create and await requests one at a time instead of the way it is currently?
let text = std::fs::read_to_string(filename)?;
let requests = request
.clone()
.with_text(text.clone())
.split(self.max_length, self.split_pattern.as_str());
let response = server_client.check_multiple_and_join(requests).await?;
And for the text.clone()
caused by server_client.check_multiple_and_join
, I could make that function just return a ResponseWithContext
instead of having it convert to Response
automatically like it's doing, so we can pass the owned string in and get it back out of the response.
We can give it a try :-)
I've added it in there - I think it fits nicely.
I could make that function just return a
ResponseWithContext
instead of having it convert toResponse
automatically like it's doing, so we can pass the owned string in and get it back out of the response.
I've pushed some changes there for this, avoiding the need to clone the input text.
Sorry, I meant which things in
check.rs
you actually wanted converted, but from this I'm assuming it's theRequest.text
?
Yes, Request.text
and Request.data
's inner fields. Mainly because one could create a vector of data annotations from a stream of Markdown tokens, which are usually &str
of the initial source.
But I am confused how this would be achieved.
I think using Cow<'_, str>
should make the trick.
Do we not need to allocate a
String
no matter what the source is?
Yes and no. Yes because reqwest
will have to allocate a string for the HTML request, but no because we don't actually need the user to provide us an owned String
. We don't care because reqwest
will allocate anyway. So it's good to provide flexibility.
As for reading the file by chunks, do you want me to just make this use a
BufReader
and create and await requests one at a time instead of the way it is currently?let text = std::fs::read_to_string(filename)?; let requests = request .clone() .with_text(text.clone()) .split(self.max_length, self.split_pattern.as_str()); let response = server_client.check_multiple_and_join(requests).await?;
Yes, this would be great. The split-text logic isn't really trivial to solve, because we want to avoid splitting text where it would break the meaning of a sentence, but also we cannot have long-running text either. I think working on a better split-logic can be a next step. For the moment, we can just use BufReader
and read the whole file, and feed the server with multiple requests (if text is too large). We could also read the file by chunks, but I think this is a fair assumption to assume that any file we want to check should fit completely in the memory.
And for the
text.clone()
caused byserver_client.check_multiple_and_join
, I could make that function just return aResponseWithContext
instead of having it convert toResponse
automatically like it's doing, so we can pass the owned string in and get it back out of the response.
Yes, the end-goal would be to remove the complex logic from Server
and perform multiple requests merging outside, so the server only has one method per HTTP request, nothing more.
Hey again. Had a lot of trouble with lifetimes and I could only get 'static
to work - not sure I love the whole solution, but if you havea look at the latest commit, is that what you had in mind?
Hello @Rolv-Apneseth, rapidly looking at your commit, I think you just forgot to add the lifetime parameter in Request -> Request<'source>
(I would prefer using 'source
over anonymous lifetime names like 'a
). This should explain why you could only use 'static
.
However, it makes little sense to have reference in responses, because reqwest::RequestBuilder::send
returns an owned Response
. We can keep owned String
in that case :-)
This will replicate a similar behavior to that of reqwest
. E.g., the builder takes a reference to be constructed pub fn json<T: Serialize + ?Sized>(self, json: &T) -> RequestBuilder
, but the response returns an owned struct: pub async fn json<T: DeserializeOwned>(self) -> Result<T>
.
I can have a go again but something was complaining about needing to live longer than 'static
. As for Response
, that makes sense.
I can have a go again but something was complaining about needing to live longer than
'static
. As forResponse
, that makes sense.
No issue. If you can't make it compile, just push with 'source
, and I will try to fix it myself :-)
So this code:
#[cfg_attr(feature = "cli", derive(Args))]
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Hash)]
#[serde(rename_all = "camelCase")]
#[non_exhaustive]
pub struct Request<'source> {
/// The text to be checked. This or 'data' is required.
#[cfg_attr(
feature = "cli",
clap(short = 't', long, conflicts_with = "data", allow_hyphen_values(true))
)]
#[serde(skip_serializing_if = "Option::is_none")]
pub text: Option<Cow<'source, str>>,
Leads to this error:
error: lifetime may not live long enough
--> src/api/check.rs:398:15
|
391 | pub struct Request<'source> {
| ------- lifetime `'source` defined here
...
398 | pub text: Option<Cow<'source, str>>,
| ^^^^^^ requires that `'source` must outlive `'static`
error: lifetime may not live long enough
--> src/api/check.rs:392:5
|
391 | pub struct Request<'source> {
| ------- lifetime `'source` defined here
392 | /// The text to be checked. This or 'data' is required.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'source` must outlive `'static`
|
= note: this error originates in the macro `clap::value_parser` (in Nightly builds, run with -Z macro-backtrace for more info)
So I don't think it will work as it is. Should I take out clap
from here and have a separate struct for the request arguments?
^ requires that
'source
must outlive'static
Oh yeah, I remember why I struggled with &str
back then ^^'
Creating a separate struct would work, but that means duplicating a lot of the code. I wonder if there is still a way to allow 'source
to take 'static
in the case of App::parse
, but not in general... This isn't as easy as I imagined
I wonder if there is still a way to allow
'source
to take'static
in the case ofApp::parse
, but not in general
Not that I can think of anyway. I'd say the options are to either stick with String
, use 'static
, or a different struct. For the different struct, could an impl From
keep fields in sync at least?
I wonder if there is still a way to allow
'source
to take'static
in the case ofApp::parse
, but not in generalNot that I can think of anyway. I'd say the options are to either stick with
String
, use'static
, or a different struct. For the different struct, could animpl From
keep fields in sync at least?
Ok so some news: at the moment, it seems impossible to do that in one struct, see https://github.com/clap-rs/clap/issues/5773 and the related discussion, so I suggest we duplicated the structure: to original Request
, with Cow<'source, str>
, and the clap-compatible RequestCli
, with String
. Maybe we can simply avoid duplicating code with a macro rule. If you don't see how to do that, I think I can put have some time during the weekend for this.
I think it would have to be a procedural macro right? Can't see how to do that with a declarative one.
I have a bit of time today so I'll have a look but feel free to do it if you have a good idea of how it should be done
I actually started some work on your branch today, I will hopefully have something working this afternoon :)
Oh nice, good luck
Ok so it compiles with --no-default-features
(without CLI), but I fear it isn't possible to conditionally compile the #[deriver(Parser)]
based on lifetime $lt == 'static
.
I am looking a possible crates that might provide useful tools to avoid rewriting most of the code.
Unfortunately, I haven't been able to put more work on this PR this weekend, sorry for that.
I fear more and more that the copy/paste option would be the only way of solving this.
Hey - been busy moving this weekend. If I have any time in the coming days I can have a look for any crates that could solve this problem - if not, yes, copy paste might have to do for now
Hey - been busy moving this weekend. If I have any time in the coming days I can have a look for any crates that could solve this problem - if not, yes, copy paste might have to do for now
No issue! Yeah, maybe let's copy and paste (keeping the 'source
lifetime for the api/click.rs
and use String
in cli/checks.rs
). The "easiest" solution would be to fix the upstream issue, i.e., do a PR to Clap's repository, but that I will only look at after this month,
Yeah couldn't find any crates for this situation.
I do think just copying the struct is better but to provide another option, what do you think of something like this:
pub struct Request<'source> {
// HACK: keep compiler happy when enabling the cli feature flag
#[cfg(feature = "cli")]
#[clap(skip)]
_lifetime: PhantomData<&'source ()>,
/// (...docs)
#[cfg(feature = "cli")]
#[clap(short = 't', long, conflicts_with = "data", allow_hyphen_values(true))]
#[serde(skip_serializing_if = "Option::is_none")]
pub text: Option<Cow<'static, str>>,
/// (...docs)
#[cfg(not(feature = "cli"))]
#[serde(skip_serializing_if = "Option::is_none")]
pub text: Option<Cow<'source, str>>,
/// (...docs)
#[cfg(feature = "cli")]
#[clap(short = 'd', long, conflicts_with = "text")]
#[serde(skip_serializing_if = "Option::is_none")]
pub data: Option<Data<'static>>,
/// (...docs)
#[cfg(not(feature = "cli"))]
#[serde(skip_serializing_if = "Option::is_none")]
pub data: Option<Data<'source>>,
As for the upstream issue - I had a look but I believe it's beyond my skills. Needs quite a bit of familiarity with how their derive macros work, and the author seems to agree having assigned it that E-hard
tag. If you want to hold this off until you have a look at making a PR yourself though, that would be fine by me.
e struct is better but to provide another option,
This is not a wrong idea, but the issue is that compiling with cli
will then enforce 'static
lifetime everywhere. Which is something we want to avoid even in the CLI, when we read text from files with ltrs check FILES...
This is an edge case, but we don't want to have to keep a static lifetime reference to the content of each file, where we don't need it. So copying the struct is the only we to both have Clap happy and still be able to use non-static lifetime when desired.
As for the upstream issue - I had a look but I believe it's beyond my skills. Needs quite a bit of familiarity with how their derive macros work, and the author seems to agree having assigned it that E-hard tag. If you want to hold this off until you have a look at making a PR yourself though, that would be fine by me.
I agree, this is not easy, especially as it involves proc macros!
So, it's not ideal, and I had to use the lifetime crate, but what do you think of the latest commit?
Perhaps it's better to leave those fields as String
until clap
allows us to avoid all the duplication?
Continuation of #117.
As discussed, some refactors. You will probably want some more done here and that's OK, just let me know what to do.
One thing for sure is the
String
values inapi/check.rs
- which ones exactly would you like me to make&str
orCow<'_, str>
?Also, for
cli/mod.rs
, I could potentially remove the need for the manual call ofcmd.execute
for each subcommand using the enum_dispatch crate if you're OK with adding a dependency? Fine as it is anyway though in my opinion, as there aren't many subcommands.