schierlm / BibleMultiConverter

Converter written in Java to convert between different Bible program formats

Improved USFM validator/parser #26

Open schierlm opened 5 years ago

schierlm commented 5 years ago

As noted in #22, the current USFM importer can give unclear error messages when parsing malformed USFM files.

It would be great to have a validator module (similar to XML validators) that can parse a USFM file and report exactly where (file name, line number) each validation error occurs. It would probably also need some kind of machine-readable description of the available tags and their parameter types.
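A minimal sketch of what such a validator could look like, assuming a small hypothetical tag table (`KNOWN`) that maps a handful of USFM markers to the parameter type they expect. This is not the full USFM specification and not existing BibleMultiConverter code; the class, record, and enum names are all illustrative. It scans line by line so every problem carries a file name and line number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a line-aware USFM marker checker (subset of markers only). */
public class UsfmValidatorSketch {

    /** Parameter type a marker expects right after it (simplified assumption). */
    enum Param { NONE, NUMBER, TEXT }

    // Hypothetical machine-readable tag description covering only a few markers.
    static final Map<String, Param> KNOWN = Map.of(
            "id", Param.TEXT,
            "c", Param.NUMBER,
            "v", Param.NUMBER,
            "p", Param.NONE,
            "q1", Param.NONE,
            "s1", Param.TEXT);

    // A backslash, the marker name (closing markers like \f* are captured whole,
    // so they are reported as unknown in this subset), then text up to the next backslash.
    static final Pattern MARKER = Pattern.compile("\\\\([A-Za-z0-9]+\\*?)([^\\\\]*)");

    record Problem(String file, int line, String message) {
        @Override public String toString() { return file + ":" + line + ": " + message; }
    }

    static List<Problem> validate(String fileName, String usfm) {
        List<Problem> problems = new ArrayList<>();
        String[] lines = usfm.split("\\R", -1); // split on any line break
        for (int i = 0; i < lines.length; i++) {
            int lineNo = i + 1; // report 1-based line numbers
            Matcher m = MARKER.matcher(lines[i]);
            while (m.find()) {
                String tag = m.group(1);
                String rest = m.group(2).trim();
                Param expected = KNOWN.get(tag);
                if (expected == null) {
                    problems.add(new Problem(fileName, lineNo, "unknown marker \\" + tag));
                } else if (expected == Param.NUMBER && !rest.matches("\\d+.*")) {
                    problems.add(new Problem(fileName, lineNo, "\\" + tag + " needs a number"));
                } else if (expected == Param.TEXT && rest.isEmpty()) {
                    problems.add(new Problem(fileName, lineNo, "\\" + tag + " needs text"));
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String sample = "\\id GEN\n\\c 1\n\\p\n\\v one In the beginning\n\\xyz oops\n";
        for (Problem p : validate("GEN.usfm", sample)) {
            System.out.println(p);
        }
    }
}
```

Running `main` prints one line per problem (`GEN.usfm:4: \v needs a number` and `GEN.usfm:5: unknown marker \xyz`). A real implementation would load the tag table from a data file rather than hard-coding it, and would also track nesting of character markers.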

alerque commented 5 years ago

In my CI jobs that parse Bibles in active development, I use u2o to convert the USFM to OSIS, which is an XML dialect that can be validated against its spec. This doesn't give detailed information such as source line numbers, but it does give me a heads-up when people start messing around with invalid syntax.
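The XML-validation step of that workflow can be sketched with the standard `javax.xml.validation` API. This is a hedged illustration, not part of u2o or BibleMultiConverter: `DEMO_XSD` is a tiny stand-in schema invented for the demo, and a real CI job would load the published OSIS XSD instead. Note that the reported line numbers refer to the generated OSIS file, not the original USFM source, which is exactly the limitation described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Sketch: validate XML (e.g. OSIS output) against an XSD, collecting line-numbered errors. */
public class OsisSchemaCheck {

    // Hypothetical stand-in schema: a <verse> element that requires an osisID attribute.
    static final String DEMO_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='verse'><xs:complexType><xs:simpleContent>"
            + "<xs:extension base='xs:string'>"
            + "<xs:attribute name='osisID' type='xs:string' use='required'/>"
            + "</xs:extension></xs:simpleContent></xs:complexType></xs:element>"
            + "</xs:schema>";

    static List<String> validate(String xsd, String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Validator validator = factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
            // Collect recoverable schema errors instead of stopping at the first one.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) {
                    errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // malformed XML: parsing cannot continue
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
        } catch (SAXException | IOException e) {
            errors.add("validation failed: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        validate(DEMO_XSD, "<verse>\nIn the beginning</verse>").forEach(System.out::println);
    }
}
```

Collecting errors in an `ErrorHandler` rather than letting the first one throw means a single CI run can list every schema violation in the file.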