microsoft / AzureSearch_JFK_Files

This repo contains the sample code of the Azure Search and Cognitive Services used to provide insights and analysis around the JFK Files.
MIT License

Search is not working with Arabic PDF in Azure #101

Closed: ReshmiGitHub closed this issue 3 years ago

ReshmiGitHub commented 4 years ago

Please help me create a site like https://jfkfiles2.azurewebsites.net/ for Arabic PDFs.

Careyjmac commented 3 years ago

The shared link is not the one associated with this repository, so I'm closing this issue. For what it's worth, this demo should in theory work with Arabic PDFs as well if you make a few small changes (untested, so I can't be sure this will work, and it might need some other changes as well):

  1. Change this line to use "generateNormalizedImagePerPage" to ensure that your PDFs are converted to images and can therefore be OCR'd.
  2. Change this line to use "OcrSkillLanguage.Ar" so that the OCR is performed in Arabic.
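
For anyone configuring the service through the REST API rather than the .NET code in this repository, the two changes above correspond roughly to the JSON fragments sketched below. This is a minimal, untested sketch: the helper names `build_indexer_parameters` and `build_ocr_skill` are hypothetical, and the `text` output target name is an assumption rather than something taken from this repo. The property names (`imageAction`, `defaultLanguageCode`, `/document/normalized_images/*`) follow the Azure Cognitive Search indexer and OCR skill definitions.

```python
import json


def build_indexer_parameters():
    """Indexer parameters that render each PDF page to an image (step 1)."""
    return {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            # Convert every PDF page to a normalized image so OCR can read it.
            "imageAction": "generateNormalizedImagePerPage",
        }
    }


def build_ocr_skill(language_code="ar"):
    """OCR skill definition with Arabic as the default language (step 2)."""
    return {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": "/document/normalized_images/*",
        # REST equivalent of OcrSkillLanguage.Ar in the .NET SDK.
        "defaultLanguageCode": language_code,
        "detectOrientation": True,
        "inputs": [
            {"name": "image", "source": "/document/normalized_images/*"}
        ],
        # "text" as the target name is an assumption for illustration.
        "outputs": [{"name": "text", "targetName": "text"}],
    }


if __name__ == "__main__":
    # Print both fragments so they can be pasted into an indexer / skillset body.
    print(json.dumps(build_indexer_parameters(), indent=2))
    print(json.dumps(build_ocr_skill(), indent=2, ensure_ascii=False))
```

The fragments only cover the two settings mentioned above; the rest of the indexer and skillset definitions (data source, target index, field mappings, and any downstream skills that may also need a language setting) would still have to match your own deployment.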