zexa / de_hypertext

Ergonomics of serde_json for parsing html.
Apache License 2.0

Vec<T> types should support meta attribute filter #3

Open zexa opened 1 month ago

zexa commented 1 month ago
#[derive(de_hypertext::Deserialize)]
struct Target {
    #[de_hypertext(selector = "div")]
    items: Vec<Listing>
}

#[derive(de_hypertext::Deserialize)]
struct Listing {
    #[de_hypertext(selector = "li")]
    name: String,
}

Parsing Target with this code fails entirely if even one listing does not contain an li element.

The current workaround is to make these fields Option and then filter out the listings where the field is None:

#[derive(de_hypertext::Deserialize)]
struct Target {
    #[de_hypertext(selector = "div")]
    items: Vec<Listing>
}

#[derive(de_hypertext::Deserialize)]
struct Listing {
    #[de_hypertext(selector = "li")]
    name: Option<String>,
}

// Drop every listing whose `name` could not be parsed.
fn post_process(target: Target) -> Target {
    Target {
        items: target
            .items
            .into_iter()
            .filter(|listing| listing.name.is_some())
            .collect(),
    }
}

However, this is tedious. Instead, we could change the generated code to skip the elements that failed to parse and return everything else when the field carries an attribute such as filter:

#[derive(de_hypertext::Deserialize)]
struct Target {
    #[de_hypertext(selector = "div", filter)]
    items: Vec<Listing>
}

#[derive(de_hypertext::Deserialize)]
struct Listing {
    #[de_hypertext(selector = "li")]
    name: String,
}
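
To make the proposed semantics concrete, here is a minimal sketch of the difference between the current behavior and what a filter attribute might generate. It does not use de_hypertext at all: parse_listing, parse_strict and parse_filtered are hypothetical stand-ins, and the string matching is only a placeholder for real selector logic.

```rust
// Hypothetical stand-in for the derived `Listing`; not the crate's generated code.
#[derive(Debug, PartialEq)]
struct Listing {
    name: String,
}

// Hypothetical per-element parser: succeeds only if the fragment has an <li>...</li>.
fn parse_listing(fragment: &str) -> Result<Listing, String> {
    let (_, rest) = fragment
        .split_once("<li>")
        .ok_or_else(|| format!("no <li> in {fragment:?}"))?;
    let (name, _) = rest
        .split_once("</li>")
        .ok_or_else(|| format!("unclosed <li> in {fragment:?}"))?;
    Ok(Listing {
        name: name.trim().to_string(),
    })
}

// Current behavior (assumed): one failing element fails the whole Vec.
fn parse_strict(fragments: &[&str]) -> Result<Vec<Listing>, String> {
    fragments.iter().map(|f| parse_listing(f)).collect()
}

// Proposed `filter` behavior (assumed): failing elements are silently skipped.
fn parse_filtered(fragments: &[&str]) -> Vec<Listing> {
    fragments
        .iter()
        .filter_map(|f| parse_listing(f).ok())
        .collect()
}
```

With three div fragments of which the middle one has no li, parse_strict returns an Err, while parse_filtered returns the two listings that did parse.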

zexa commented 3 weeks ago

Filtering, as the post_process function above does, might not be the only way to handle these scenarios. Some users might want to log the error; others might want to repair the broken result. We need an API that covers these use cases as well.
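
One possible shape for such an API, sketched in plain Rust rather than as actual de_hypertext code: collect a Result per element and pass each failure to a user-supplied handler, which can log it, drop it, or return a repaired value. The names collect_with and ElementError are hypothetical, and the String error type is an assumption standing in for whatever error the crate really produces.

```rust
// Hypothetical stand-in for the derived `Listing`.
#[derive(Debug, PartialEq)]
struct Listing {
    name: String,
}

// Hypothetical description of a single element that failed to parse.
#[derive(Debug, PartialEq)]
struct ElementError {
    index: usize,   // position of the element among the selector matches
    reason: String, // assumed error payload
}

// Keep every successfully parsed element. For each failure, ask the handler:
// `None` drops the element, `Some(value)` substitutes a repaired one.
fn collect_with<T, F>(results: Vec<Result<T, String>>, mut on_error: F) -> Vec<T>
where
    F: FnMut(ElementError) -> Option<T>,
{
    results
        .into_iter()
        .enumerate()
        .filter_map(|(index, result)| match result {
            Ok(item) => Some(item),
            Err(reason) => on_error(ElementError { index, reason }),
        })
        .collect()
}
```

A handler that always returns None reproduces the plain filter behavior; one that pushes the error into a log before returning None covers logging; one that returns Some(placeholder) covers repair. How a handler like this would be named in a derive attribute is left open here.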