jmsv / ety-python

A Python module to discover the etymology of words
http://ety-python.rtfd.io
MIT License
144 stars 18 forks

Circular origin reference #20

Closed alxwrd closed 6 years ago

alxwrd commented 6 years ago

There's currently at least one circular reference in etymwn-relety.json.

ety.origins("software", recursive=True)

will eventually fail with a recursion error because software -> soft, -ware and -ware -> software.
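To make the loop concrete, here is a minimal sketch that walks a toy mapping and reports the first chain that comes back to its starting point. The `ORIGINS` dict and the `find_cycle` helper are hypothetical stand-ins for illustration; they do not reflect the real schema of etymwn-relety.json or any function in ety.

```python
# Toy stand-in for the etymology data (assumed shape, not the real schema).
ORIGINS = {
    "software": ["soft", "-ware"],
    "-ware": ["software"],
    "soft": [],
}


def find_cycle(word, origins, path=()):
    """Return the first chain of words that loops back on itself, or None."""
    if word in path:
        # The word is already on the current chain, so we have come full circle.
        return path[path.index(word):] + (word,)
    for parent in origins.get(word, []):
        cycle = find_cycle(parent, origins, path + (word,))
        if cycle is not None:
            return cycle
    return None


cycle = find_cycle("software", ORIGINS)
print(" -> ".join(cycle))
```

Because the current chain is passed down as `path`, the walk stops as soon as a word repeats instead of descending forever.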

alxwrd commented 6 years ago

I'm going to run a job to try and discover all these problems with the data.

Hidden as no longer relevant

```python
import sys
import ety

sys.setrecursionlimit(20)

total = len(ety.data.etyms)


def find():
    results = []
    errors = []
    for count, word in enumerate(ety.data.etyms):
        try:
            _ = ety.Word(word["a_word"], word["a_lang"]).origins()
            print("{}/{}".format(count, total), end="\r")
        except RecursionError:
            # Record the word that triggered the error, not the previous result.
            results.append(word)
        except Exception as e:
            errors.append({"error": e, "word": word})
    return results, errors
```

```python
>>> import find_circulars
>>> results, errors = find_circulars.find()
4094/473433
```


I'll update here once it's done.

alxwrd commented 6 years ago

I've just realised this isn't actually recursion because it's the child's .origins() that's being called.

The issue is that the results are appended to the result list, so the chain "software" -> "-ware" -> "software" will just keep growing that list.
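A rough sketch of that failure mode, assuming a flat result list that is extended with each entry's own origins. This is not ety's actual code: `ORIGINS`, `expand`, and the `max_items` cap are all hypothetical, and the cap exists only so the demo terminates.

```python
# Toy stand-in for the etymology data (assumed shape, not the real schema).
ORIGINS = {
    "software": ["soft", "-ware"],
    "-ware": ["software"],
    "soft": [],
}


def expand(word, origins, max_items=10):
    """Flatten origins by appending each entry's own origins to one list.

    max_items is a safety cap added for this demo only; without it the
    loop would never finish on circular data.
    """
    result = list(origins.get(word, []))
    i = 0
    while i < len(result) and len(result) < max_items:
        # Each visited entry pushes its origins onto the end of the same list.
        result.extend(origins.get(result[i], []))
        i += 1
    return result


chain = expand("software", ORIGINS)
print(chain)
```

No call stack is involved here, so the recursion limit never triggers; the list simply keeps acquiring "software", "soft", and "-ware" again and again until something stops it.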

I have a fix in mind, I'll submit a PR later.

jmsv commented 6 years ago

Hmm, could the solution be as simple as only adding a Word to a branch if it hasn't already appeared in that branch?
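One way that idea might look, as a sketch over toy data rather than ety's real `Word` objects: carry the current branch down the traversal and skip any word already on it. `ORIGINS` and `origins_tree` are hypothetical names used only for illustration.

```python
# Toy stand-in for the etymology data (assumed shape, not the real schema).
ORIGINS = {
    "software": ["soft", "-ware"],
    "-ware": ["software"],
    "soft": [],
}


def origins_tree(word, origins, branch=()):
    """Build a nested {word: subtree} dict, skipping words already on this branch."""
    branch = branch + (word,)
    tree = {}
    for parent in origins.get(word, []):
        if parent in branch:
            # Already appeared on this branch, so adding it again would loop.
            continue
        tree[parent] = origins_tree(parent, origins, branch)
    return tree


tree = origins_tree("software", ORIGINS)
print(tree)
```

Tracking the branch rather than a single global set means a word can still appear under two different parents; only a repeat within the same line of descent is dropped.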