Closed: Nationwidechildrens closed this issue 5 years ago.
I can see some data about PDFs in the Elasticsearch gopa-index and gopa-snapshot indices.

    GET /gopa-index/_search
    {
      "query": {
        "multi_match": {
          "query": ".pdf",
          "fields": ["snapshot.ext", "snapshot.url"]
        }
      }
    }

This returns about 2391 hits. But if I search for any of the metadata in a PDF, I don't get any PDFs in the results. I have even tried using words from the filename, and the results still do not show any PDFs.
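To reproduce the check above outside the Kibana console, here is a minimal Python sketch that builds the same multi_match body and POSTs it to `/gopa-index/_search` using only the standard library. It assumes Elasticsearch is reachable at `http://localhost:9200`. The helper names `build_pdf_query`, `search` and `total_hits` are hypothetical and are not part of gopa.

```python
import json
import urllib.request

# Assumption: Elasticsearch listens on localhost:9200; adjust for your cluster.
ES_URL = "http://localhost:9200"


def build_pdf_query(text=".pdf", fields=("snapshot.ext", "snapshot.url")):
    """Hypothetical helper: return the multi_match request body from the issue."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
            }
        }
    }


def search(index, body, base_url=ES_URL):
    """Hypothetical helper: POST a search body to <base_url>/<index>/_search."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def total_hits(result):
    """Read hits.total, which is a plain int before Elasticsearch 7.0
    and an object like {"value": ..., "relation": ...} from 7.0 on."""
    total = result["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total
```

Usage: `total_hits(search("gopa-index", build_pdf_query()))` returns the hit count. Passing a word from a PDF's title or filename as `text` is how you would test whether PDF content has been indexed.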
Hi @Nationwidechildrens, PDFs are currently not processed, but it is definitely doable.
@Nationwidechildrens the parse_pdf joint has been pushed to master; feel free to try it out.