usnationalarchives / OPAProd

Tracking enhancements to OPAProd

Extract ASCII text (plus other formats) #58

Open DominicBM opened 9 years ago

DominicBM commented 9 years ago

We need to investigate whether the actual data contained within certain electronic record formats can be extracted in the same way as file metadata. This could be used to enhance search and API output. In particular, can the plain text of an ASCII file be extracted? What about the text in other file types, such as Word and Excel files?
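
As a rough illustration of what this kind of extraction could involve, here is a minimal Python sketch using only the standard library. It is not tied to how OPAProd processes files: the function names `extract_ascii_text` and `extract_docx_text` are hypothetical, and the Word case assumes the newer zip-based `.docx` format, where the body text lives in `word/document.xml`. Older binary `.doc` and `.xls` files would need a different parser and are not covered.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_ascii_text(data: bytes) -> str:
    """Decode bytes as ASCII (hypothetical helper).

    Bytes outside the ASCII range are replaced with U+FFFD instead of
    raising an error, so a mislabeled file still yields searchable text.
    """
    return data.decode("ascii", errors="replace")


def extract_docx_text(data: bytes) -> str:
    """Return the paragraph text of a .docx file (hypothetical helper).

    A .docx file is a zip archive; each <w:p> element is a paragraph and
    each <w:t> element inside it holds a run of text.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

The extracted string could then be indexed alongside the existing file metadata. Excel's `.xlsx` format is also a zip archive of XML parts, so a similar approach may apply there, though cell text is spread across more than one part and would need extra handling.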