lukehollis / iip-word-lists

Python utility for creating word lists from epidoc files

new approach making everything a word #14

Open emylonas opened 3 years ago

emylonas commented 3 years ago
  1. Follow the existing code down to line 148 (deletes <space>).
  2. Kludge the XML by replacing every space inside a start tag with a character not used elsewhere (• perhaps?), possibly with a substitute function.
  3. Change all remaining spaces to </w> <w>.
  4. Add a <w> to the beginning of the string.
  5. Add a </w> to the end of the string.
  6. Change every bullet back to a space so that the element tags are valid again.
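
Steps 2 to 6 could be sketched roughly as follows. This is only an illustration, not the repository's code: the names `wrap_words`, `PLACEHOLDER`, and `START_TAG` are hypothetical, and the regular expression assumes that no attribute value contains a `>` character.

```python
import re

# Hypothetical placeholder; assumed not to occur anywhere in the source text.
PLACEHOLDER = "\u2022"  # the bullet character

# Matches start tags and empty-element tags, but not end tags, comments,
# or processing instructions. Assumes no ">" inside attribute values.
START_TAG = re.compile(r"<[^/!?][^>]*>")


def wrap_words(xml: str) -> str:
    """Wrap every space-separated chunk of `xml` in <w>...</w>."""
    if PLACEHOLDER in xml:
        raise ValueError("placeholder character already occurs in the input")
    # Step 2: hide the spaces inside start tags behind the placeholder.
    protected = START_TAG.sub(
        lambda m: m.group(0).replace(" ", PLACEHOLDER), xml
    )
    # Step 3: every remaining space is treated as a word boundary.
    wrapped = protected.replace(" ", "</w> <w>")
    # Steps 4 and 5: open the first word and close the last one.
    wrapped = "<w>" + wrapped + "</w>"
    # Step 6: turn the placeholders back into spaces.
    return wrapped.replace(PLACEHOLDER, " ")


print(wrap_words('in <name type="place">Judaea</name> here'))
# <w>in</w> <w><name type="place">Judaea</name></w> <w>here</w>
```

Note that this sketch does not guarantee well-formed output: an element whose content spans several words ends up straddling `<w>` boundaries, and runs of consecutive spaces produce empty `<w></w>` pairs.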

2nd task: change <name att="something" att2="#something #else"> to <name•att="something"•att2="#something•#else">
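
One possible way to do this substitution is `re.sub` with a replacement function. The helper names `protect_tag_spaces` and `restore_tag_spaces` are hypothetical, and the pattern assumes that attribute values never contain `>`.

```python
import re


def protect_tag_spaces(xml: str, placeholder: str = "\u2022") -> str:
    """Replace spaces inside start tags with `placeholder` (a bullet by default)."""
    # The pattern skips end tags, comments, and processing instructions.
    return re.sub(
        r"<[^/!?][^>]*>",
        lambda m: m.group(0).replace(" ", placeholder),
        xml,
    )


def restore_tag_spaces(xml: str, placeholder: str = "\u2022") -> str:
    """Undo protect_tag_spaces; assumes the placeholder occurs nowhere else."""
    return xml.replace(placeholder, " ")


tag = '<name att="something" att2="#something #else">'
print(protect_tag_spaces(tag))
# <name•att="something"•att2="#something•#else">
```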