umbraco / UmbracoExamine.PDF

PDF indexing support in UmbracoExamine

Only index published PDF documents (i.e. documents not in the recycle bin). #40

Open justin-nevitech opened 1 year ago

justin-nevitech commented 1 year ago

Problem

If a PDF document is deleted and moved to the recycle bin, it is correctly removed from the index, but if the index is then rebuilt, the document is included again. The rebuild indexes documents regardless of whether they are in the recycle bin. I have made some changes to ignore PDF documents in the recycle bin, using code similar to Umbraco's built-in ContentValueSetValidator and UmbracoContentIndex.
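
To illustrate the idea, here is a minimal sketch of that kind of check, written in Python for brevity rather than as the actual change. It assumes each media item carries a comma-separated ancestor path and that the media recycle bin root has the ID -21; the function names, the dictionary shape, and the `published_only` flag are hypothetical and are not part of the Umbraco or Examine API.

```python
# Simplified, hypothetical model of the recycle-bin check; not the Umbraco API.

RECYCLE_BIN_MEDIA_ID = -21  # assumed ID of the media recycle bin root


def is_in_recycle_bin(path: str, recycle_bin_id: int = RECYCLE_BIN_MEDIA_ID) -> bool:
    """Return True if the comma-separated ancestor path passes through the recycle bin."""
    return str(recycle_bin_id) in path.split(",")


def items_to_index(items, published_only=True):
    """Return the items a rebuild should index.

    With published_only=True, items whose path runs through the recycle bin
    are skipped; with published_only=False, every item is kept.
    """
    return [
        item
        for item in items
        if not (published_only and is_in_recycle_bin(item["path"]))
    ]


# Example data: one PDF in the media library, one in the recycle bin.
media = [
    {"id": 1234, "path": "-1,1234"},
    {"id": 1235, "path": "-1,-21,1235"},
]
indexed_ids = [item["id"] for item in items_to_index(media)]
```

Applying the same filter during a rebuild as during a delete is what keeps the two code paths consistent; a flag such as the hypothetical `published_only` would allow the behaviour to be switched off.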

Steps to Reproduce

  1. Add a PDF document to the media library
  2. Use the Examine tab to search the contents of the PDF document - it should return a matching result
  3. Delete the document so it goes into the recycle bin
  4. Use the Examine tab to search the contents of the PDF document - it should not return a matching result
  5. Rebuild the PDF index
  6. Use the Examine tab to search the contents of the PDF document - it should not return a matching result, but it does

I have assumed that the PDF index is likely to be used externally, so PDF documents in the recycle bin should not be included. If you want the reverse by default and think this should be activated by a config setting, please let me know.

bergmania commented 1 year ago

Hi @justin-nevitech

Thanks for the PR. From the title it sounds like a good addition. We will see if we can find time in one of the upcoming sprints to review and potentially release this.