This PR removes the redundant elements from the result returned by calling a `Delemmatizer` object. Without this fix, the tests `test_delemmatizes_lemmas` and `test_delemmatizes_non_lemmas` can fail if they are run twice, because the second run returns results that contain duplicate elements. For example, the second run of `test_delemmatizes_lemmas` fails with:
```
    def test_delemmatizes_lemmas():
>       assert dl("look") == [
            "looked",
            "looking",
            "looks",
            "look",
        ], "should delemmatize lemmas"
E       AssertionError: should delemmatize lemmas
E       assert ['looked', 'l...look', 'look'] == ['looked', 'l...ooks', 'look']
E         Left contains one more item: 'look'
E         Use -v to get the full diff
```
In the error message above, `look` appears twice in the returned result.
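One plausible way for a second call to return an extra `look` is sketched here. It assumes, hypothetically, that `delems` is a list stored in a shared lemma-to-inflections table and returned by reference, so every call's `append` mutates the stored list. `_INFLECTIONS` and `delemmatize_buggy` are illustrative names, not the project's actual code.

```python
# Hypothetical lemma -> inflected-forms table (an assumption for illustration,
# not the real Delemmatizer's data source).
_INFLECTIONS: dict[str, list[str]] = {
    "look": ["looked", "looking", "looks"],
}


def delemmatize_buggy(word: str) -> list[str]:
    # The stored list is returned by reference, so appending to it mutates
    # the shared table instead of a fresh copy.
    delems = _INFLECTIONS.get(word, [])
    # Unconditional append: each call adds `word` again, so a second call
    # for "look" returns a list containing "look" twice.
    delems.append(word)
    return delems
```

Under this assumption, the first call for `"look"` returns the four expected forms, and the second returns the same list with an extra `"look"` at the end, matching the "Left contains one more item" failure.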
This PR fixes this kind of issue: instead of appending `word` directly to `delems`, it first checks whether `word` already exists in `delems` and appends it only if it does not, so the returned result contains no redundant elements.
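A minimal sketch of the membership check described above, using a hypothetical shared table `_INFLECTIONS` and function `delemmatize` in place of the real `Delemmatizer` internals; only `word` and `delems` come from this PR.

```python
# Hypothetical lemma -> inflected-forms table standing in for the real lookup.
_INFLECTIONS: dict[str, list[str]] = {
    "look": ["looked", "looking", "looks"],
}


def delemmatize(word: str) -> list[str]:
    delems = _INFLECTIONS.get(word, [])
    # Append `word` only if it is not already present, so repeated calls
    # do not accumulate duplicate entries.
    if word not in delems:
        delems.append(word)
    return delems
```

The `in` test is a linear scan of `delems`, which is cheap for the handful of forms a lemma typically has. In this sketch the shared list is still mutated once, on the first call; copying it first (for example `delems = list(_INFLECTIONS.get(word, []))`) would be an alternative that avoids touching shared state at all.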