Open mattnenterprise opened 1 year ago
Hi,
It seems like jaq
is using 0.9
. The 1.0
release is a from-scratch rewrite of chumsky. Do you experience the same issue with 1.0
?
jaq
generates lots of compile errors when I try to upgrade from 0.9
to 1.0
, and I'm not knowledgeable enough about the code base to upgrade them all.
I was able to reproduce the memory issue with the most current commit on main 6bb31a5f070e1289a11ea4a0193c7ccb236c81fa. I did so with the following code.
use chumsky::prelude::*;
#[derive(Debug, PartialEq)]
enum Chain {
End,
Link(char, Box<Chain>),
}
fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
let mut chain = Recursive::declare();
chain.define(just::<_, _, extra::Err<Simple<char>>>('+')
.then(chain.clone())
.map(|(c, chain)| Chain::Link(c, Box::new(chain)))
.or_not()
.map(|chain| chain.unwrap_or(Chain::End)));
chain
}
fn main() {
for _n in 1..100_000_000 {
parser();
}
}
I actually based this example on an example in the recursive documentation here
So I installed miri to get more info about the memory leak. Both of the following were ran on main 6bb31a5f070e1289a11ea4a0193c7ccb236c81fa
Here is the example code I ran with miri.
use chumsky::prelude::*;
#[derive(Debug, PartialEq)]
enum Chain {
End,
Link(char, Box<Chain>),
}
fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
let mut chain = Recursive::declare();
chain.define(just::<_, _, extra::Err<Simple<char>>>('+')
.then(chain.clone())
.map(|(c, chain)| Chain::Link(c, Box::new(chain)))
.or_not()
.map(|chain| chain.unwrap_or(Chain::End)));
chain
}
fn main() {
parser();
}
Here is the output from miri when running this example.
error: memory leaked: alloc819 (Rust heap, size: 32, align: 8), allocated here:
--> /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9
|
98 | __rust_alloc(layout.size(), layout.align())
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: inside `std::alloc::alloc` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9: 98:52
= note: inside `std::alloc::Global::alloc_impl` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:181:73: 181:86
= note: inside `<std::alloc::Global as std::alloc::Allocator>::allocate` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:241:9: 241:39
= note: inside `alloc::alloc::exchange_malloc` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:330:11: 330:34
= note: inside `std::boxed::Box::<std::rc::RcBox<chumsky::recursive::Indirect<'_, '_, &str, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>>>::new` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:217:9: 217:20
= note: inside `std::rc::Rc::<chumsky::recursive::Indirect<'_, '_, &str, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>>::new` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/rc.rs:397:27: 397:94
note: inside `chumsky::recursive::Recursive::<chumsky::recursive::Indirect<'_, '_, &str, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>>::declare`
--> /home/mmccoy/code/chumsky/src/recursive.rs:128:42
|
128 | inner: RecursiveInner::Owned(RefC::new(Indirect {
| __________________________________________^
129 | | inner: OnceCell::new(),
130 | | })),
| |______________^
note: inside `parser::<'_>`
--> examples/memory_leak.rs:10:21
|
10 | let mut chain = Recursive::declare();
| ^^^^^^^^^^^^^^^^^^^^
note: inside `main`
--> examples/memory_leak.rs:22:5
|
22 | parser();
| ^^^^^^^^
error: memory leaked: alloc881 (Rust heap, size: 24, align: 8), allocated here:
--> /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9
|
98 | __rust_alloc(layout.size(), layout.align())
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: inside `std::alloc::alloc` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9: 98:52
= note: inside `std::alloc::Global::alloc_impl` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:181:73: 181:86
= note: inside `<std::alloc::Global as std::alloc::Allocator>::allocate` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:241:9: 241:39
= note: inside `alloc::alloc::exchange_malloc` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:330:11: 330:34
= note: inside `std::boxed::Box::<chumsky::combinator::Map<chumsky::combinator::OrNot<chumsky::combinator::Map<chumsky::combinator::Then<chumsky::primitive::Just<char, &str, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, chumsky::recursive::Recursive<chumsky::recursive::Indirect<'_, '_, &str, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>>, char, Chain, chumsky::extra::Full<chumsky::error::Simple<'_, char>, (), ()>>, (char, Chain), [closure@examples/memory_leak.rs:14:14: 14:26]>>, std::option::Option<Chain>, [closure@examples/memory_leak.rs:16:14: 16:21]>>::new` at /home/mmccoy/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:217:9: 217:20
note: inside `parser::<'_>`
--> examples/memory_leak.rs:12:5
|
12 | / chain.define(just::<_, _, extra::Err<Simple<char>>>('+')
13 | | .then(chain.clone())
14 | | .map(|(c, chain)| Chain::Link(c, Box::new(chain)))
15 | | .or_not()
16 | | .map(|chain| chain.unwrap_or(Chain::End)));
| |__________________________________________________^
note: inside `main`
--> examples/memory_leak.rs:22:5
|
22 | parser();
| ^^^^^^^^
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
note: the evaluated program leaked memory, pass `-Zmiri-ignore-leaks` to disable this check
error: aborting due to 2 previous errors
If I define the parser with the recursive
function like so there is no memory leak reported by miri.
use chumsky::prelude::*;
#[derive(Debug, PartialEq)]
enum Chain {
End,
Link(char, Box<Chain>),
}
fn parser<'a>() -> impl Parser<'a, &'a str, Chain, extra::Err<Simple<'a, char>>> {
recursive(|chain|
just::<_, _, extra::Err<Simple<char>>>('+')
.then(chain.clone())
.map(|(c, chain)| Chain::Link(c, Box::new(chain)))
.or_not()
.map(|chain| chain.unwrap_or(Chain::End))
)
}
fn main() {
parser();
}
I feel like I encountered this issue before and determined it would be pretty hard to fix, since clones of a Recursive::declare
, which is strong, are themselves strong, and changing this could make it easy to accidentally drop your recursive parser:
let expr = Recursive::declare();
// Current Behavior: Strong clone - and since you now have an `Rc` containing itself, it leaks
let a = foo(expr.clone());
expr.define(a);
// If the clone was weak instead: Haha, whoops, you drop the original `expr` here and since the clone is weak it panics when you use the parser
expr.clone()
This feels like it's effectively equivalent to full-on garbage collection in the general case, which is not really possible to solve in Rust while hiding it at an API level.
Honestly, the easiest answer may be a weak_clone
method that gets you a version of the parser to use inside itself.
Give me a couple hours (:sweat_smile:), but I think I have a solution for this!
Alright, so this is meant to be a rough-draft of a solution, but I feel like it helps to solve the original problem, and prevents the user from accidentally calling the wrong clone (weak_clone
), which I feel would make it too easy to accidentally (and silently) create a memory leak.
Hey! I've also encountered a memory leak with jaq because of this. I'm wondering if we can solve this use case by restricting and extending Chomsky's interface for recursive functions?
Within jaq there is some code which creates two mutually recursive parsers. The code is roughly
fn filter() -> impl Parser<Token, Spanned<Filter>, Error = Simple<Token>> + Clone {
let parser1 = Recursive::declare();
let parser2 = Recursive::declare();
parser1.define(transform1(parser1.clone(), parser2.clone()));
parser2.define(transform2(parser1.clone(), parser2.clone()));
// At this point parser1 has incremented the strong reference count of parser2, and parser1
// has incremented the strong reference count of parser2, so none of them will ever be dropped.
return parser1
}
The issue is that parser1.clone()
and parser2.clone()
increment the strong counts of the OnceCell
s of the parsers. So if we change the code to use weak references when cloning instead we should not have any trouble. But not so quick. Let's say parser2.clone()
in the above instead incremented the weak count of the OnceCell
of parser2
. Then when we get to the end of the filter
function we drop parser2
, since there is only one strong reference to it. When we then try to use parser1
we get a panic, as the weak reference to parser2
is no longer valid.
This suggests that parser1
and parser2
can only be dropped together. Maybe we can change the API to enforce this?
Let's add a guarding struct EitherParser
which ensures that the parsers are always dropped together, and let's add recursive_bind
of the following form:
#[allow(missing_docs)]
pub fn recursive_bind<
'a,
I: Clone,
O1,
P1: Parser<I, O1, Error=E> + 'a,
P2,
R: Into<EitherParser<P1, P2>>,
F: FnOnce(Recursive<'a, I, O1, E>) -> R,
E: Error<I>,
>(
f: F,
) -> EitherParser<Recursive<'a, I, O1, E>, P2> {
let mut left_parser = Recursive::declare();
let either = f(
Recursive(match &left_parser.0 {
RecursiveInner::Owned(x) => RecursiveInner::Unowned(Rc::downgrade(x)),
RecursiveInner::Unowned(_) => unreachable!(),
})
).into();
left_parser.define(either.0);
EitherParser(left_parser, either.1)
}
#[allow(missing_docs)]
#[derive(Clone)]
pub struct EitherParser<P1, P2>(P1, P2);
impl <P1, P2> From<(P1, P2)> for EitherParser<P1, P2> {
fn from(value: (P1, P2)) -> Self {
EitherParser(value.0, value.1)
}
}
impl <I, O, P1, P2> Parser<I, O> for EitherParser<P1, P2>
where
I: Clone,
P1: Parser<I, O>,
{
type Error = P1::Error;
fn parse_inner<D: Debugger>(&self, debugger: &mut D, stream: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error>
where
Self: Sized
{
self.0.parse_inner(debugger, stream)
}
fn parse_inner_verbose(&self, d: &mut Verbose, s: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error> {
self.0.parse_inner_verbose(d, s)
}
fn parse_inner_silent(&self, d: &mut Silent, s: &mut StreamOf<I, Self::Error>) -> PResult<I, O, Self::Error> {
self.0.parse_inner_silent(d, s)
}
}
impl <P1, P2> EitherParser<P1, P2> {
fn flip(self) -> EitherParser<P2, P1> {
EitherParser(self.1, self.0)
}
fn left(&self) -> &P1 {
&self.0
}
fn right(&self) -> &P2 {
&self.1
}
}
Then we can define recursive
and recursive_2
as
pub fn recursive<
'a,
I: Clone,
O,
P: Parser<I, O, Error = E> + 'a,
F: FnOnce(Recursive<'a, I, O, E>) -> P,
E: Error<I>,
>(
f: F,
) -> Recursive<'a, I, O, E> {
recursive_bind(|recursive| (f(recursive), ())).0
}
#[allow(missing_docs)]
pub fn recursive_2<
'a,
I: Clone,
O1,
O2,
P1: Parser<I, O1, Error=E> + 'a,
P2: Parser<I, O2, Error=E> + 'a,
R: Into<EitherParser<P1, P2>>,
F: FnOnce(Recursive<'a, I, O1, E>, Recursive<'a, I, O2, E>) -> R,
E: Error<I>,
>(
f: F,
) -> EitherParser<Recursive<'a, I, O1, E>, Recursive<'a, I, O2, E>> {
recursive_bind(|left| recursive_bind(|right| f(left, right).into().flip()).flip())
}
...
We should also make Recursive::declare
non-public, to prevent accidents with the API.
I've tested a modified version of jaq with chomsky 0.9.3 and this function, and the memory leak of jaq was gone (loop { jaq_parse::defs() }
was sufficient to reproduce).
The
recursive
implementation leaks memory if it references itself and the parser definition is defined using thedefine
function. See the following example that creates millions of parsers, but the memory is never released.I originally found this issue in the jaq filter parser here.