Closed by kanerv 3 years ago
Since I cannot get the module downloaded to run our stemmer, I decided to attempt the wildcard searches.
Our program now understands the easiest wildcard type (hous*). It generates a list of matching terms from the documents and then runs a separate search with each one. This is maybe not the ideal way to do it, since I think it would be nicer to include the whole set of wildcard matches in a single search, but I'm hopelessly late with this week's assignment. Soooo, I'm running out of time due to the technical difficulties I've had this week, and I'm sorry for that. I'm honestly dying to see the stemmer! 😢
Note that our program does not understand multi-word searches, so if you search for 'anarch*', for example, the program won't find headers or snippets to print for 'anarcho-syndicalists' or even 'anarchism's'.
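For anyone reading along, here is a minimal sketch of the expansion step described above. It is not the code from this PR: the function name `expand_prefix_wildcard` and the toy vocabulary are made up for illustration, and the `OR` join at the end assumes the search function accepts boolean operators, which may not hold for every version of the program.

```python
def expand_prefix_wildcard(query, vocabulary):
    """Return the vocabulary terms matching a trailing-wildcard query.

    Hypothetical helper: a query without a trailing '*' is returned unchanged.
    """
    if not query.endswith("*"):
        return [query]
    prefix = query[:-1]  # drop the trailing '*'
    return sorted(term for term in vocabulary if term.startswith(prefix))


# Toy vocabulary standing in for the terms indexed from the documents.
vocabulary = {"house", "houses", "housing", "horse", "mouse"}

matches = expand_prefix_wildcard("hous*", vocabulary)
print(matches)

# Assumption: the search function understands OR, so all matches
# could go into one query instead of one search per term.
combined_query = " OR ".join(matches)
print(combined_query)
```

Searching once with the combined query would avoid printing a separate result list per matching term.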
These are details that can be improved, but for now, the very very basic form of wildcard search should work.
No need to fret, Kanerva, this looks really cool! Is the missing multi-word search the reason that searching for anarch* gives "Search term not found"? For example, for anarchist feminism, it finds the term but has no index entry for it: Query: 'anarcha-femin_s' Search term not found. No Matching doc.
I realised that our program won't find any matches even if you search for an exact match with hyphenated words. I think the issue must be inside the test_query(query) function and not because of the lack of a multi-word search function. I wonder if other groups have similar issues...
I added the regex we created for the week 4 program to fix the hyphenated-word issue in the older project as well. I think I'll leave the wildcards as they are here. I know the solution in the week 4 program is smarter, but as I said, we already have that code up to date in a more recent program.
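The regex itself is not shown in this thread, so the pattern below is only a guess at the kind of fix involved: a token pattern that keeps hyphens and apostrophes inside a word, so that the indexed term and the query term are split the same way. The names `TOKEN_RE` and `tokenize` are hypothetical.

```python
import re

# Assumed pattern: a run of word characters, optionally followed by
# more runs joined by a single hyphen or apostrophe.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*")


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping hyphenated words whole."""
    return TOKEN_RE.findall(text.lower())


text = "Anarcho-syndicalists and anarchism's critics"

# A plain \w+ pattern breaks the hyphenated word into two tokens.
print(re.findall(r"\w+", text.lower()))

# The assumed pattern keeps it as one token.
print(tokenize(text))
```

If the index is built with one pattern and the query is checked with another, an exact search for a hyphenated word can fail even though the word is in the documents.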
d. Wildcard searches: Let the users search on incomplete terms, such as hous* (easiest) or *ing (similar to previous case) or h*ing (hardest). Read Chapter 3 of the book to learn more about this topic.
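As a rough sketch of how all three cases in the assignment text could be handled with one function, the wildcard can be translated into a regular expression and matched against every term in the vocabulary. The function names and the toy vocabulary are made up for illustration, and this is a plain linear scan over the terms rather than an index structure built for wildcard lookup.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a wildcard pattern into a compiled regex, with '*' matching any run of characters."""
    # Escape each literal piece so characters like '.' or '-' are matched literally.
    parts = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(parts))


def match_wildcard(pattern, vocabulary):
    """Return the sorted vocabulary terms that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return sorted(term for term in vocabulary if regex.fullmatch(term))


# Toy vocabulary standing in for the indexed terms.
vocabulary = {"house", "housing", "hiking", "ring", "hinge"}

print(match_wildcard("hous*", vocabulary))  # trailing wildcard
print(match_wildcard("*ing", vocabulary))   # leading wildcard
print(match_wildcard("h*ing", vocabulary))  # wildcard in the middle
```

Using `fullmatch` matters here: with `search` or `match`, a pattern like h*ing would also accept terms that merely contain or begin with a matching stretch.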