tim-harding / kanjidic_utilities

Utilities for the EDRDG KANJIDIC2 dictionary files

`Kanjidic::try_from(&str)` fails to parse kanjidic2.xml downloaded from edrdg.org #2

Open crumblingstatue opened 6 months ago

crumblingstatue commented 6 months ago
use kanjidic_parser::kanjidic::Kanjidic;

fn main() {
    let string = std::fs::read_to_string("/tmp/kanjidic2.xml").unwrap();
    dbg!(Kanjidic::try_from(&string[..]));
}

Output:

[src/main.rs:5:5] Kanjidic::try_from(&string[..]) = Err(
    Xml(
        DtdDetected,
    ),
)
tim-harding commented 6 months ago

It's been a while since I worked on this project, but the problems I had at the time were:

  1. The XML format they use has a nonstandard schema that didn't work with the parsing library I used
  2. Some of the data are invalid according to their schema

These shouldn't be too difficult to address, but you're right, their XML files require a few modifications to work with this library, which are manually applied to the version bundled with this repo. I'll look into fixing this when I get a chance.
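
Until that is fixed, one possible workaround is to preprocess the downloaded file yourself. The `DtdDetected` error suggests the underlying XML parser rejects any document type declaration, so the sketch below strips the `<!DOCTYPE ...>` block from the text before parsing. `strip_doctype` is a hypothetical helper, not part of this library, and it is a heuristic: it assumes an internal DTD subset, if present, ends at the first `]>`. It does not address the second problem above (data that is invalid according to the schema), so parsing may still fail afterwards.

```rust
/// Hypothetical helper (not part of kanjidic_parser): remove the first
/// `<!DOCTYPE ...>` declaration from an XML string.
///
/// Heuristic: if the declaration has an internal subset (`[ ... ]`),
/// it is assumed to end at the first `]>` after the opening `[`.
fn strip_doctype(xml: &str) -> String {
    let start = match xml.find("<!DOCTYPE") {
        Some(start) => start,
        None => return xml.to_owned(), // nothing to strip
    };
    let rest = &xml[start..];

    // Offsets (relative to `rest`) of the first `>` and the first `[`.
    let first_gt = rest.find('>');
    let first_bracket = rest.find('[');

    let end = match (first_bracket, first_gt) {
        // `[` comes before any `>`: internal subset, closed by `]>`.
        (Some(b), Some(g)) if b < g => rest[b..].find("]>").map(|i| b + i + 2),
        // No internal subset: the declaration ends at the first `>`.
        (_, Some(g)) => Some(g + 1),
        // Unterminated declaration.
        _ => None,
    };

    match end {
        Some(end) => format!("{}{}", &xml[..start], &rest[end..]),
        None => xml.to_owned(), // leave malformed input untouched
    }
}
```

With the reproduction above, the call would become `Kanjidic::try_from(&strip_doctype(&string)[..])`. Whether the result then parses cleanly depends on the remaining data issues.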