hltdi / guampa

the collaborative translation website
GNU General Public License v3.0

script to take a document, preprocess it, and put it in the db #9

Open alexrudnick opened 11 years ago

alexrudnick commented 11 years ago

We have all the components. Probably use Tika to process PDFs and Word docs?

We still need to figure out how to find paragraph breaks in the extracted text...

That last part depends on #8.
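
For illustration, here is a rough sketch of what the last two steps could look like, assuming the text has already been extracted to a plain string (for example by Tika) and that paragraphs are separated by blank lines. The function names `split_paragraphs` and `store_document`, and the `documents` / `paragraphs` tables, are hypothetical and are not taken from the guampa codebase; real PDFs may well need a smarter paragraph heuristic than this.

```python
import re
import sqlite3


def split_paragraphs(text):
    """Split plain text into paragraphs on runs of blank lines (an assumption)."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is a newline, optional whitespace, then another newline.
    chunks = re.split(r"\n\s*\n", text)
    # Rejoin hard-wrapped lines with single spaces and drop empty chunks.
    paragraphs = [" ".join(chunk.split()) for chunk in chunks]
    return [p for p in paragraphs if p]


def store_document(conn, title, text):
    """Insert a document and its paragraphs; return the new document id.

    The schema here is hypothetical, made up for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(id INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paragraphs "
        "(docid INTEGER, position INTEGER, text TEXT)"
    )
    cur = conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))
    docid = cur.lastrowid
    conn.executemany(
        "INSERT INTO paragraphs (docid, position, text) VALUES (?, ?, ?)",
        [(docid, i, p) for i, p in enumerate(split_paragraphs(text))],
    )
    conn.commit()
    return docid
```

Usage would be something like `store_document(sqlite3.connect("guampa.db"), "My document", extracted_text)`, where the database filename and `extracted_text` are placeholders. Keeping the splitter as its own function means the paragraph heuristic can be swapped out once #8 is settled.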