Open samuelcolvin opened 1 year ago
okay, I have this working using lazy_static!
and Mutex
, not sure on the performance impact of the Mutex
, but otherwise that's working for me.
@justinwilaby I see you added some labels, does that mean you'd be happy to support this?
If so, what would your preferred approach be?
It would also be useful to extend Encode
to provide methods to get target
and content
without encoding and decoding them - would you be okay to accept this change? It shouldn't have any affect in the wasm case since the methods wouldn't be used and should be compiled out.
@samuelcolvin - Yes. I believe these changes would be beneficial for sure.
I like the idea of implementing the IntoIterator
trait to SAXParser
but this may become complicated when building an aggregator for the various nodes encountered when processing a call to write
. If we do implement IntoIterator
, what exactly is iterated? Tags, attributes, text nodes or all the above?
I have toyed with the idea of using IntoIterator
as super trait of Encode
with the XML root treated as an Encode
type. From there, it is possible to go further and provide implementors a way to walk the entire XML tree conditionally in a similar way that Acorn Walk does for a JavaScript AST.
The major problem to overcome is avoiding the memory hit of storing all entities in the aggregator which may be a side effect of iterating especially when a very large XML document is fully loaded before iteration begins.
With this said, in the short term, updating the signature of Encode
seems to be the most practical. For wasm targets, keep the event handler as-is. For C bindings or a crate, just return a pointer to Encode
.
Thanks for the response.
For me an iterator that yields Event
s would be fine - that's what I'm working with at the moment.
I'd also be happy with a next()
or get_event()
method on SAXParser
without a formal iterator trait.
What I currently have is
lazy_static! {
static ref TEMPLATE_BUILDER: Mutex<TemplateBuilder> = Mutex::new(TemplateBuilder::new());
}
fn event_handler(event: Event, data: &dyn Encode<Vec<u8>>) {
TEMPLATE_BUILDER.lock().unwrap().event_handler(event, data);
}
fn parse_sax(data: &PyBytes) -> PyResult<PyObject> {
let mut p = SAXParser::new(event_handler);
p.events =
Event::Text as u32 |
Event::Doctype as u32 |
Event::Comment as u32 |
Event::OpenTagStart as u32 |
Event::Attribute as u32 |
Event::CloseTag as u32 |
Event::Cdata as u32;
let b = data.as_bytes();
p.write(b);
p.identity();
let mut t = TEMPLATE_BUILDER.lock().unwrap();
// get the node
t.reset();
}
it would be nice to get rid of the lazy_static
if possible.
Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
Hi, I'm using sax-wasm from rust (it's great 🙏), I'm currently using git submodules, which is working fine, but clearly a proper rust crate would be ideal. (To be clear - a rust crate would be good long term, but I don't actually need it right now)
Secondly, I'm having some trouble setting up a "safe" way to collect nodes of an xml document in rust.
The problem is
The only way I can see to mutate an object in a function which matches this signature is to use a
static mut
and thenunsafe
, which isn't ideal.Maybe I'm missing something obvious, but it would be great to be able to process "event/data"s and mutate a struct without resorting to unsafe.
Describe the solution you'd like A clear and concise description of what you want to happen.
I guess either:
Iter
interface toSAXParser
EventListener
Let me know if the request is unclear and I can add specific examples.