Remove 'source lifetime

mitsuhiko commented 2 years ago

Today the engine holds &'source str all over the place in the instructions and by extension in the templates. This is a design pillar that I originally inherited from the initial API that was inspired by tinytemplate. It's a pretty clever setup but it has the disadvantage that the API cannot deal with dynamic templates well.

The solution to this problem so far has been the source feature which introduces a Source type which lets the engine lie about lifetimes through the use of self-cell and memo-map.

There is a alternative that could be evaluated where these borrows to the source string are not taking place through the references but string handles. That would likely require significant changes in the lexer and potentially the parser as well, but the changes to the instructions are probably quite limited.

The way this could work is that the instructions that currently hold an &'source str would instead hold something like a StrHandle which looks something like this:

struct StrHandle {
    offset: u32,
    len: u32,
    #[cfg(debug_assertions)]
    instructions_id: u64,
}

impl StrHandle {
    pub fn as_str(&self, instr: &Instructions) -> &str {
        debug_assert_eq!(instr.id, self.instructions_id);
        &instr.source[self.offset as usize..self.offset as usize + self.len as usize]
    }
}

Obviously one complexity with this approach is that at no point the handles must be separated from the instructions they are contained in. It also comes at least through the safe interface with additional cost as Rust will require a lot more bounds checks. unsafe would work here, but actually making a safe interface where someone does not change the source string underneath the instructions accidentally could be quite tricky.

The benefit of doing this would be that the engine ends up with less dependencies, overall less unsafe code and a nicer API.

mitsuhiko commented 2 years ago

One other thing is that the engine has a few places where that basic model does not work well:

Token::Str holds a Cow so that it can store unescaped strings
for internal loop usage Instruction::Lookup and Instruction::GetAttr also get static strings at times
Internal maps often hold &'source str and with that change they would need to hold a String instead since we could not adjacently borrow

I think a model that could work is to have a StrCache<'source> which is used everywhere where currently &'source str is used to hold the source and then a few StrHandle given out by the StrCache<'source>. For as long as also for release builds an internal ID is given out that could be made safe without incurring bounds checks. For the few places where a &'static str is placed there, I think the handle could be allowed to store references to static strings or specific instructions are used instead. For the the Cow case that only exists in the Token but never in Instruction so that problem is much smaller in comparison. In the instructions it's already converted into a Value which clones.

mitsuhiko commented 1 year ago

I made some small changes internally that address some of these issues. In d53160d3ff7f54e11696d804708a7e3a7e4b5c2b I removed static strings added to instructions for loop internals and in 6e60023395ad3b6fe20aec9b843a65dda8978c54 I removed the Cow from the string token.

mitsuhiko commented 1 year ago

I had back out one of these due to performance regressions.

mitsuhiko commented 1 year ago

I have a new way now in which I think I want to go about this. The 'source lifetime is quite useful for multiple scenarios so it does not need outright removing. What actually however is annoying is that you don't have a convenient way to add owned strings into the environment. This runs against a bunch of lifetime issues, but that is already true for the Source feature too.

I think what I might be doing for MiniJinja 1.0 and to close off this issue is the following:

When the source feature is enabled, the Engine always holds a Source object internally to replace the template map
Environment::add_template stays and borrows
Environment::add_owned_template (name TDB) is does what today is env.source_mut().add_template(...)
Hide the Source object internally.

Since after you can add owned templates there is not a lot of reason to actually expose the Source object. The main of the Source object which at that point is internal is to set a loader:

Before:

let mut env = Environment::new();
env.set_source(Source::with_loader(|name| {
    if name == "layout.html" {
        Ok(Some("...".into()))
    } else {
        Ok(None)
    }
}));

After:

let mut env = Environment::new();
env.set_loader(|name| {
    if name == "layout.html" {
        Ok(Some("...".into()))
    } else {
        Ok(None)
    }
});

And for owned templates:

Before:

let mut env = Environment::new();
let mut source = Source::new();
source.add_template("index.html", "...").unwrap();
env.set_source(source);

After:

let mut env = Environment::new();
env.add_owned_template("index.html", "...").unwrap();

mitsuhiko / minijinja

Remove 'source lifetime #135