thomjur / PyCollocation

Python module to do simple collocation analysis of a corpus.
GNU General Public License v3.0

Searching with wildcards? #5

Closed trutzig89182 closed 2 years ago

trutzig89182 commented 2 years ago

Should we include the possibility to search with wildcards? This should be possible by defining the search_word as a regex object and checking whether it matches each word x in the current sentence.
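A minimal sketch of that idea, assuming the corpus is already tokenized into sentences (lists of words). The function name `count_collocates`, its `window` parameter, and the sample sentences are hypothetical and not taken from PyCollocation's actual code:

```python
import re
from collections import Counter


def count_collocates(sentences, search_pattern, window=2):
    """Count the words found within `window` tokens of any word that
    fully matches `search_pattern` (hypothetical helper, not the module's API)."""
    pattern = re.compile(search_pattern)
    counts = Counter()
    for sentence in sentences:
        for i, word in enumerate(sentence):
            # fullmatch ensures the whole token matches, not just a prefix
            if pattern.fullmatch(word):
                left = sentence[max(0, i - window):i]
                right = sentence[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


# Illustrative, made-up sentences
sentences = [
    ["die", "Datenschützer", "warnen", "vor", "Risiken"],
    ["viele", "Datenschützerinnen", "kritisieren", "das", "Gesetz"],
]
result = count_collocates(sentences, r"Datenschützer\w*", window=1)
```

With `window=1`, only the direct left and right neighbours of each matching word are counted. Using `fullmatch` rather than `search` keeps a pattern from matching in the middle of an unrelated token.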

I had the impression that you don’t see regex as a core functionality. Is it perhaps problematic for the search of collocations?

In my case I look for words associated with „Datenschützer“ and „Datenschützer*in(en)“, so searching for „Datenschützer*“ could make things easier.
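One way such a wildcard term could be turned into a regex is sketched below. The helper `wildcard_to_regex` is a hypothetical name, and expanding `*` to `\S*` (any run of non-whitespace characters) is an assumption chosen so that forms containing a gender star also match:

```python
import re


def wildcard_to_regex(term):
    """Translate a search term with `*` wildcards into a compiled regex.
    Hypothetical helper: every `*` becomes `\\S*`, everything else is literal."""
    escaped = re.escape(term)  # "Datenschützer*" -> "Datenschützer\*"
    return re.compile(escaped.replace(r"\*", r"\S*"))


pattern = wildcard_to_regex("Datenschützer*")
words = [
    "Datenschützer",
    "Datenschützerin",
    "Datenschützerinnen",
    "Datenschutz",
    "Datenschützer*innen",
]
matches = [w for w in words if pattern.fullmatch(w)]
```

Here `matches` contains every word except „Datenschutz“. Escaping the term first means that other regex metacharacters in a search word (such as `.` or `(`) are treated literally, so only `*` acts as a wildcard.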

thomjur commented 2 years ago

No, you are right, I think it is an important feature! It's just that I didn't think about it immediately since I am oftentimes working with lemmatized texts (where regex might still make sense). Do you want to implement regex search terms? Maybe we can state in the issues who is currently working on a feature implementation. Or better: who would like to start.

trutzig89182 commented 2 years ago

I have been working on something, as I tried to advance my little project a bit. I will have a second look at it tomorrow and make a pull request then. But the tests passed, so it should be OK.

I used the re package for now and will add it to the readme file. Perhaps it would make sense to create a requirements.txt file so that dependencies can be installed more quickly via pip.

trutzig89182 commented 2 years ago

No need to add re to the requirements, as it is part of the Python standard library.