If a PDF document is deleted and sits in the recycle bin, it is correctly removed from the index, but it is included again when the index is rebuilt. The rebuild indexes documents regardless of whether they are in the recycle bin. I have changed the index to ignore PDF documents in the recycle bin, using code similar to Umbraco's built-in ContentValueSetValidator and UmbracoContentIndex.
Steps to Reproduce
1. Add a PDF document to the media library.
2. Use the Examine tab to search the contents of the PDF document. It should return a matching result.
3. Delete the document so it goes into the recycle bin.
4. Use the Examine tab to search the contents of the PDF document. It should not return a matching result.
5. Rebuild the PDF index.
6. Use the Examine tab to search the contents of the PDF document. It should not return a matching result, but it does.
I have assumed that the PDF index is likely to be used externally, so PDF documents in the recycle bin should not be included. If you would prefer the reverse by default and think this behaviour should be activated by a config setting, please let me know.
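To make the intended behaviour concrete, here is a minimal, language-neutral sketch of the check in Python. It is not the C# code in this PR. It assumes the indexed item carries a comma-separated node path and that the media recycle bin has node id -21. The names `is_in_recycle_bin`, `should_index`, and the `include_recycle_bin` flag are hypothetical and stand in for the possible config setting mentioned above.

```python
# Hypothetical sketch only: these names are illustrative, not Umbraco APIs.

# Assumption: the media recycle bin is the node with id -21.
MEDIA_RECYCLE_BIN_ID = "-21"


def is_in_recycle_bin(path: str, recycle_bin_id: str = MEDIA_RECYCLE_BIN_ID) -> bool:
    """Return True if the comma-separated node path passes through the recycle bin."""
    # Compare whole path segments so that an id such as -210 does not match -21.
    segments = [segment.strip() for segment in path.split(",")]
    return recycle_bin_id in segments


def should_index(path: str, include_recycle_bin: bool = False) -> bool:
    """Decide whether a PDF with the given node path belongs in the index."""
    if include_recycle_bin:
        # Hypothetical opt-in: index everything, including deleted documents.
        return True
    return not is_in_recycle_bin(path)
```

With the default of `include_recycle_bin=False`, a document at path `-1,-21,1234` is skipped on rebuild, while one at `-1,1234` is indexed as before.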
Thanks for the PR. From the title it sounds like a good addition. We will see if we can find time in one of the upcoming sprints to review and potentially release this.