kuchiki-rs / kuchiki

(朽木) HTML/XML tree manipulation library for Rust
MIT License

How to replace an element with new element defined as string? #60

Closed toinbis closed 1 year ago

toinbis commented 5 years ago

Hi,

I have the following working code:

extern crate kuchiki;
use kuchiki::traits::*;

fn main() {        
    let html = "
    <html>
        <head></head>
        <body>
            <p class='foo'>Hello, world!</p>
            <p class='foo'>I love HTML</p>
        </body>
    </html>";

    let document = kuchiki::parse_html().one(html);
    let paragraph = document.select("p").unwrap().collect::<Vec<_>>();

    for element in paragraph {
        let new_p_element = "<p class='newp'>Hello, from loved HTML</p>";
        element.as_node().detach()
    }

    println!("{}", document.to_string());
}

Instead of detaching (removing) the p elements, I'd like to replace them with the element defined in new_p_element. How would I achieve something like element.as_node().replace(&new_p_element), but with code that actually compiles?

Thanks!

andrewbanchich commented 4 years ago

I'm also struggling with this @SimonSapin. I've been trying to traverse and replace Text nodes but can't seem to overwrite NodeRefs with something like node = new_node.

toinbis commented 4 years ago

Hi @andrewbanchich, I have managed to put together a working sample. Give me some time; I'll post it here within the next few hours.

SimonSapin commented 4 years ago

This sounds similar to https://github.com/kuchiki-rs/kuchiki/issues/62. Except that new_p_element here is a string, so you’d need to parse it first.

Code like node = new_node only assigns to a local variable and does not mutate the tree.
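
The difference between rebinding a variable and mutating the tree can be sketched with a toy tree built only from the standard library. This is not kuchiki's actual data layout: the Node struct and the helpers node, child_labels, rebind_local and replace_in_tree are hypothetical names invented for illustration. In kuchiki itself, the mutation would go through NodeRef methods such as insert_after and detach.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy node (an assumption for illustration, not kuchiki's real layout):
/// a label plus a shared, mutable list of children.
pub struct Node {
    pub label: String,
    pub children: RefCell<Vec<Rc<Node>>>,
}

/// Create a childless node with the given label.
pub fn node(label: &str) -> Rc<Node> {
    Rc::new(Node {
        label: label.to_owned(),
        children: RefCell::new(Vec::new()),
    })
}

/// Labels of the direct children, in order.
pub fn child_labels(parent: &Node) -> Vec<String> {
    parent.children.borrow().iter().map(|c| c.label.clone()).collect()
}

/// Rebinding: `handle` is a local variable, so the tree never sees the change.
/// Returns "<label before> -> <label after>" for the local handle only.
pub fn rebind_local(parent: &Node, new_node: Rc<Node>) -> String {
    let mut handle = Rc::clone(&parent.children.borrow()[0]);
    let before = handle.label.clone();
    handle = new_node; // only the local binding now points elsewhere
    format!("{} -> {}", before, handle.label)
}

/// Mutation: write the new node into the parent's child list.
pub fn replace_in_tree(parent: &Node, index: usize, new_node: Rc<Node>) {
    parent.children.borrow_mut()[index] = new_node;
}
```

After rebind_local the parent still lists its old child, because only the local handle changed; after replace_in_tree the parent lists the new one.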

andrewbanchich commented 4 years ago

> Code like node = new_node only assigns to a local variable and does not mutate the tree.

That's the issue I'm having. There's no way to swap one node with another? #62 describes wrapping one element in another but what if we want to just mutate or replace one element without wrapping it in anything?

SimonSapin commented 4 years ago

I wrote similar, not identical. Please read my comment there and adjust the steps for what you’re trying to do.

More generally, please have a look at the methods on https://docs.rs/kuchiki/0.7.3/kuchiki/struct.NodeRef.html and other parts of the API and consider how you can combine them.

toinbis commented 4 years ago

Hi Andrew,

Here is an example of how to swap one element with another. Source code for main.rs:

use html5ever::{interface::QualName, local_name, namespace_url, ns};
use kuchiki::{traits::*, Attribute, ExpandedName, NodeRef};

pub fn make() -> String {    
    let text = "
    <html>
        <head></head>
        <body>
            <p class='foo'>Hello, world!</p>
            <p class='foo'>I love HTML</p>
        </body>
    </html>";

    let document = kuchiki::parse_html().one(text);
    let paragraph = document.select("p").unwrap().collect::<Vec<_>>();

    for element in paragraph {
        // Build the replacement <p class="newp"> element.
        let par = NodeRef::new_element(
            QualName::new(None, ns!(html), local_name!("p")),
            Some((
                ExpandedName::new("", "class"),
                Attribute {
                    prefix: None,
                    value: "newp".to_owned(),
                },
            )),
        );

        par.append(NodeRef::new_text("My new text"));

        // Insert the new element right after the old one, then remove the old one.
        element.as_node().insert_after(par);
        element.as_node().detach();
    }

    document.to_string()
}

pub fn main() {
     println!("{}", make())
}

My Cargo.toml is as follows:

[package]
name = "kuchikidemo4"
version = "0.1.0"
authors = [""]
edition = "2018"

[features]
stdweb = [ "instant/stdweb" ]

[dependencies]
html5ever = "0.23.0"
kuchiki = "0.7.3"
markup5ever = "0.8.1"

The output of cargo run is:

$ cargo run
   Compiling kuchikidemo4 v0.1.0 (<..>/rust_projects/kuchikidemo4)
    Finished dev [unoptimized + debuginfo] target(s) in 2.71s
     Running `target/debug/kuchikidemo4`
<html><head></head>
        <body>
            <p class="newp">My new text</p>
            <p class="newp">My new text</p>

    </body></html>

Please let me know whether you manage to compile the above code successfully, or if you have any questions.

andrewbanchich commented 4 years ago

Thanks @toinbis! Here is an example of what I'm trying to get working:

use html5ever::{interface::QualName, namespace_url, ns, LocalName};
use kuchiki::{traits::*, NodeRef, iter::NodeEdge, NodeData};

pub fn main() {
    let html = "
    <html>
        <head></head>
        <body>
            <p class='foo'>Hello, world!</p>
            <p class='foo'>I love HTML.</p>
        </body>
    </html>";

    let doc = kuchiki::parse_html().one(html);

    doc.traverse().for_each(|node| {
        if let NodeEdge::Start(node) = node {
            // if it's text, look for some content and wrap any matches with an element
            if let NodeData::Text(text) = node.data() {
                let mut new_nodes = Vec::new();

                new_nodes.push(NodeRef::new_text("I "));

                // add match
                let wrapper = NodeRef::new_element(
                    QualName::new(None, ns!(html), LocalName::from("data-contains-love")),
                    None,
                );

                wrapper.append(NodeRef::new_text("love"));

                new_nodes.push(wrapper);
                new_nodes.push(NodeRef::new_text(" HTML."));

                match node.next_sibling() {
                    Some(sibling) => {
                        new_nodes.into_iter().for_each(|n| {
                            sibling.insert_before(n);
                        })
                    }
                    None => {
                        let parent = node.parent().unwrap();
                        new_nodes.into_iter().for_each(|n| {
                            parent.append(n);
                        })
                    }
                }

                node.detach();
            }
        }
    });

    dbg!(doc.to_string());
}

I can't create a new parent element because I don't know ahead of time what the HTML will look like. The code runs, but it doesn't seem to detach the current node.

This is what I get as a result:

<html><head></head>I <data-contains-love>love</data-contains-love> HTML.<body>\n            <p class=\"foo\">Hello, world!</p>\n
<p class=\"foo\">I love HTML.</p>\n\n
</body></html>

Any thoughts on what the issue is?

Thanks!
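
One possible reading of the output above (an assumption, not confirmed anywhere in this thread) is that the new nodes appear only once, right after <head>, because the traversal ended as soon as the first text node was detached from inside the closure. The sketch below models that hazard with a toy sibling list from the standard library only. The Node struct and the functions chain, detach, visit_while_detaching and snapshot_then_detach are hypothetical and do not reproduce kuchiki's iterator. The second function mirrors what the earlier working example does when it collects the selection into a Vec before looping.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy sibling list (illustration only): each node knows just its next sibling.
pub struct Node {
    pub text: String,
    pub next: RefCell<Option<Rc<Node>>>,
}

/// Build a chain of siblings from the given texts and return the first one.
pub fn chain(texts: &[&str]) -> Option<Rc<Node>> {
    let mut head = None;
    for text in texts.iter().rev() {
        head = Some(Rc::new(Node {
            text: (*text).to_owned(),
            next: RefCell::new(head),
        }));
    }
    head
}

/// "Detach" in this toy simply cuts the node's link to its next sibling.
pub fn detach(node: &Node) {
    node.next.borrow_mut().take();
}

/// Follow live links while detaching: the walk loses its way after one node.
pub fn visit_while_detaching(first: Option<Rc<Node>>) -> Vec<String> {
    let mut visited = Vec::new();
    let mut current = first;
    while let Some(node) = current {
        visited.push(node.text.clone());
        detach(&node); // mutation happens in the middle of the walk
        current = node.next.borrow().clone(); // the link is already gone
    }
    visited
}

/// Snapshot the handles first, then mutate: every node is still reached.
pub fn snapshot_then_detach(first: Option<Rc<Node>>) -> Vec<String> {
    let mut nodes = Vec::new();
    let mut current = first;
    while let Some(node) = current {
        current = node.next.borrow().clone();
        nodes.push(node);
    }
    let mut visited = Vec::new();
    for node in &nodes {
        visited.push(node.text.clone());
        detach(node); // safe: iteration runs over the snapshot, not the links
    }
    visited
}
```

With a three-node chain, the first function visits only the first node, while the second visits all three.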

toinbis commented 4 years ago

Hi, @andrewbanchich - I guess you might be interested in checking out https://github.com/cloudflare/lol-html which was released today (more info https://blog.cloudflare.com/html-parsing-1/).

andrewbanchich commented 4 years ago

Thanks! I ended up rewriting my code to be a recursive function that just reconstructs the entire tree from scratch and it's working now.

This looks excellent though!
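
The recursive function itself was not posted, so the following is only a minimal sketch of the general idea: return freshly built nodes instead of editing the tree in place. It uses a toy Node enum from the standard library rather than kuchiki's types, and the names wrap_matches, rebuild and render are hypothetical.

```rust
/// Toy tree (illustration only, not kuchiki's node types).
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Split `text` around each occurrence of `needle`, wrapping every match
/// in an element named `wrapper`.
fn wrap_matches(text: &str, needle: &str, wrapper: &str) -> Vec<Node> {
    if needle.is_empty() {
        // An empty needle would match forever; leave the text untouched.
        return vec![Node::Text(text.to_owned())];
    }
    let mut out = Vec::new();
    let mut rest = text;
    while let Some(pos) = rest.find(needle) {
        if pos > 0 {
            out.push(Node::Text(rest[..pos].to_owned()));
        }
        out.push(Node::Element {
            name: wrapper.to_owned(),
            children: vec![Node::Text(needle.to_owned())],
        });
        rest = &rest[pos + needle.len()..];
    }
    if !rest.is_empty() {
        out.push(Node::Text(rest.to_owned()));
    }
    out
}

/// Rebuild the tree recursively. A text node may expand into several nodes,
/// so each call returns a list that the parent splices into its children.
pub fn rebuild(node: &Node, needle: &str, wrapper: &str) -> Vec<Node> {
    match node {
        Node::Text(text) => wrap_matches(text, needle, wrapper),
        Node::Element { name, children } => vec![Node::Element {
            name: name.clone(),
            children: children
                .iter()
                .flat_map(|child| rebuild(child, needle, wrapper))
                .collect(),
        }],
    }
}

/// Serialize the toy tree (no escaping or attributes; enough for a demo).
pub fn render(node: &Node) -> String {
    match node {
        Node::Text(text) => text.clone(),
        Node::Element { name, children } => {
            let inner: String = children.iter().map(render).collect();
            format!("<{0}>{1}</{0}>", name, inner)
        }
    }
}
```

Because nothing is mutated while it is being walked, this approach avoids having to reason about what a traversal does when the current node is removed, at the cost of copying the tree.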

Ygg01 commented 4 years ago

@andrewbanchich @toinbis Is this related to #64? Would closing that PR solve this issue?

andrewbanchich commented 4 years ago

@Ygg01 Yep! If you think my PR is a good solution for this then definitely.

SimonSapin commented 1 year ago

I will soon archive this repository and make it read-only, so this issue will not be addressed: https://github.com/kuchiki-rs/kuchiki#archived