It would be nice to have access to the /rmeta/text endpoint.
It retrieves the metadata as well as the text content.
This matters because when Tesseract is installed, Tika uses it for OCR. Retrieving the metadata of one file I have here (an image) takes 76 s. Retrieving the text ALSO takes 76 s.
(OCR languages: nld+eng+fra+deu+chi_sim+chi_tra)
A curl test confirms this.
However, if I use /rmeta/text, I get all the same information within that same 76 s, instead of spending 76 s on /meta and another 76 s on /tika.
So it would be very nice to have some way to call /rmeta.
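As a rough illustration of what such a call could look like from Python, here is a minimal sketch using only the standard library. It mirrors the curl tests below: a PUT of the raw file bytes to /rmeta/text with the X-Tika-OCRLanguage header. The helper names `build_rmeta_request` and `rmeta_text` are hypothetical, and the server URL and language list are just the values from the tests, not a proposed client API.

```python
import json
import urllib.request


def build_rmeta_request(data, server="http://localhost:9998",
                        ocr_languages="nld+eng+fra+deu+chi_sim+chi_tra"):
    """Build (but do not send) a PUT of raw file bytes to /rmeta/text.

    Hypothetical helper: same verb, path and OCR header as the curl tests.
    """
    return urllib.request.Request(
        url=server.rstrip("/") + "/rmeta/text",
        data=data,
        method="PUT",  # curl -T uploads with PUT
        headers={
            "X-Tika-OCRLanguage": ocr_languages,
            "Accept": "application/json",
        },
    )


def rmeta_text(path, **kwargs):
    """Upload one file and return the decoded JSON list of metadata dicts.

    Hypothetical helper: one round trip yields metadata and text together.
    """
    with open(path, "rb") as f:
        req = build_rmeta_request(f.read(), **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be something like `docs = rmeta_text("/path/to/file")`; building the request separately keeps the upload logic easy to inspect without a running server.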
Tests:
time curl -T /root/gmail_backup/mails/938/368/133/922/520/1598520922133368938.03 http://localhost:9998/tika --header "X-Tika-OCRLanguage: nld+eng+fra+deu+chi_sim+chi_tra"
real 1m16.375s
user 0m0.008s
sys 0m0.008s
time curl -T /root/gmail_backup/mails/938/368/133/922/520/1598520922133368938.03 http://localhost:9998/meta --header "X-Tika-OCRLanguage: nld+eng+fra+deu+chi_sim+chi_tra"
real 1m16.356s
user 0m0.008s
sys 0m0.004s
time curl -T /root/gmail_backup/mails/938/368/133/922/520/1598520922133368938.03 http://localhost:9998/rmeta/text --header "X-Tika-OCRLanguage: nld+eng+fra+deu+chi_sim+chi_tra"
real 1m16.247s
user 0m0.012s
sys 0m0.004s
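The /rmeta/text response can then be split back into the two things /meta and /tika return separately. The sketch below assumes the usual recursive-metadata layout: a JSON list with one object per document, the container file first, and the extracted text stored under the `X-TIKA:content` key. The function name `split_rmeta` is hypothetical, and the sample payload in the usage line is illustrative, not real output from the file tested above.

```python
import json

# Key under which the recursive metadata output is assumed to hold extracted text.
CONTENT_KEY = "X-TIKA:content"


def split_rmeta(payload):
    """Split a /rmeta/text JSON payload into (metadata, text).

    Hypothetical helper. Only the first entry (assumed to be the container
    document) is used; entries for embedded documents are ignored here.
    """
    docs = json.loads(payload)
    container = dict(docs[0])  # copy so the parsed list is left untouched
    text = container.pop(CONTENT_KEY, "")
    return container, text
```

For example, `split_rmeta('[{"Content-Type": "image/png", "X-TIKA:content": "hello"}]')` gives `({"Content-Type": "image/png"}, "hello")`, i.e. the /meta and /tika results from one 76 s request.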
Related to this, I'll open an improvement request ;-)