superrcoop / weconnec

The project addresses the need for a central platform that allows users, primarily students, to share resources among themselves.
https://weconnec.xyz
GNU General Public License v3.0

Indexing PDFs with PyLucene #6

Open superrcoop opened 6 years ago

superrcoop commented 6 years ago

As a security measure, accepting only PDF documents and/or images will prevent users from uploading files containing malicious scripts (macros) to our database.
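One way to enforce the PDF/image-only rule is to check each upload's leading bytes (its file signature) instead of trusting the filename extension or the MIME type the client sends. This is a minimal sketch, assuming uploads are written to disk before validation; `ALLOWED_SIGNATURES`, `detect_allowed_type`, and `is_allowed_upload` are hypothetical names, not part of the weconnec codebase.

```python
from typing import Optional

# Hypothetical allowlist: file signature ("magic bytes") -> MIME type.
ALLOWED_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def detect_allowed_type(header: bytes) -> Optional[str]:
    """Return the MIME type if the header matches an allowed signature, else None."""
    for signature, mime in ALLOWED_SIGNATURES.items():
        if header.startswith(signature):
            return mime
    return None


def is_allowed_upload(path: str) -> bool:
    """Read only the first few bytes of the saved upload and check its signature."""
    with open(path, "rb") as f:
        header = f.read(16)  # long enough for every signature above
    return detect_allowed_type(header) is not None
```

Because the check looks at content rather than the name, a macro-enabled Office file renamed to `.pdf` is still rejected, since it starts with a ZIP (`PK\x03\x04`) or OLE signature.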

https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167
https://github.com/pdfminer/pdfminer.six
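Before a PDF can be indexed, its text has to be extracted. A minimal sketch using pdfminer.six's `pdfminer.high_level.extract_text`; the helper names `normalize_whitespace` and `extract_pdf_text` are hypothetical, and collapsing whitespace is an assumed preprocessing choice, not something the issue specifies.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace; pdfminer output contains many line breaks and
    a form feed (\\x0c) at each page boundary."""
    return re.sub(r"\s+", " ", text).strip()


def extract_pdf_text(path: str) -> str:
    """Return the plain text of a PDF, ready to hand to an indexer."""
    # Imported lazily so this module still loads where pdfminer.six is not installed.
    from pdfminer.high_level import extract_text

    return normalize_whitespace(extract_text(path))
```

For example, `extract_pdf_text("notes.pdf")` returns one whitespace-normalized string for the whole document (the file name is illustrative).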

superrcoop commented 6 years ago

http://lucene.apache.org/pylucene/