usnistgov / F4DE

Framework for Detection Evaluation (F4DE) : set of evaluation tools for detection evaluations and for specific NIST-coordinated evaluations

handling XML entities #3

Open jtrmal opened 6 years ago

jtrmal commented 6 years ago

Hi, the content of kwlist.xml is

  <kw kwid="KW">
    <kwtext>&lt;WORD&gt; word second_word</kwtext>
  </kw>

and the RTTM file is

 LEXEME utt 1 0 0.39 <WORD> <NA> <NA> <NA>
 LEXEME utt 1 0.39 0.15 word <NA> <NA> <NA>
 NON-LEX utt 1 0.54 0.05 <eps> <NA> <NA> <NA>
 LEXEME utt 1 0.59 0.17 second_word <NA> <NA> <NA>

With these inputs, the alignment procedure does not map the keyword to the transcript (there is no entry in alignment.csv). However, when I manually edit the RTTM file so that it contains the entity-encoded form

 LEXEME utt 1 0 0.39 &lt;WORD&gt; <NA> <NA> <NA>
 LEXEME utt 1 0.39 0.15 word <NA> <NA> <NA>
 NON-LEX utt 1 0.54 0.05 <eps> <NA> <NA> <NA>
 LEXEME utt 1 0.59 0.17 second_word <NA> <NA> <NA>

the mapping is created as expected.

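I would expect the XML entities (apos, lt, gt, quot and amp) to be decoded/normalized when kwlist.xml is read, because the XML specification requires these characters (at least < and &) to appear in their encoded form inside the file. In other words, it is not up to the user how these strings are written there, so the RTTM side should not have to carry the encoded form to get a match.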
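
To illustrate the behaviour I would expect, here is a minimal Python sketch (not F4DE code; the function names `keyword_tokens` and `rttm_lexemes` are made up for this example). It assumes the token is the sixth whitespace-separated field of each LEXEME line, as in the sample above. A standard XML parser decodes the entities while reading, so the keyword tokens then compare equal to the unmodified RTTM tokens:

```python
import xml.etree.ElementTree as ET

# The keyword entry and the original (unedited) RTTM lines from this issue.
KWLIST_SNIPPET = """<kw kwid="KW">
  <kwtext>&lt;WORD&gt; word second_word</kwtext>
</kw>"""

RTTM_SNIPPET = """LEXEME utt 1 0 0.39 <WORD> <NA> <NA> <NA>
LEXEME utt 1 0.39 0.15 word <NA> <NA> <NA>
NON-LEX utt 1 0.54 0.05 <eps> <NA> <NA> <NA>
LEXEME utt 1 0.59 0.17 second_word <NA> <NA> <NA>"""


def keyword_tokens(kw_xml):
    """Hypothetical helper: parse one <kw> element and split its text.

    ElementTree decodes &lt;, &gt;, &amp;, &apos; and &quot; on parsing,
    so the returned tokens are already in their literal form.
    """
    root = ET.fromstring(kw_xml)
    return root.findtext("kwtext").split()


def rttm_lexemes(rttm_text):
    """Hypothetical helper: collect the token field of every LEXEME line.

    Assumption: the token is the sixth field, as in the sample lines.
    """
    tokens = []
    for line in rttm_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "LEXEME":
            tokens.append(fields[5])
    return tokens


kw = keyword_tokens(KWLIST_SNIPPET)
lex = rttm_lexemes(RTTM_SNIPPET)
print(kw)
print(lex)
print(kw == lex)
```

With decoding applied on the XML side, the comparison succeeds without touching the RTTM file, which is the outcome I was expecting from the alignment step.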