suyashb95 / WiktionaryParser

A Python Wiktionary Parser
MIT License
359 stars, 92 forks

still get some empty elements #74

Closed tbm closed 3 years ago

tbm commented 3 years ago

Even after commit af23b50e41dae I get some empty fields.

Example:

[{'definitions': [{'examples': [],
                   'partOfSpeech': 'noun',
                   'relatedWords': [],
                   'text': ['Krismasi\xa0(n class, no plural)', 'Christmas']}],
  'etymology': 'From English Christmas.\n',
  'pronunciations': {'audio': [], 'text': []}}]

Test case:

#!/usr/bin/env python3

from pprint import pprint

from wiktionaryparser import WiktionaryParser

parser = WiktionaryParser()
parser.set_default_language('swahili')
word = parser.fetch("Krismasi")

pprint(word)

suyashb95 commented 3 years ago

@tbm Thank you for reporting this! I haven't released a new version of the parser with the recent commits, so I just wanted to make sure that you're running from source and not from pip, because the pip release would be outdated. Could you clone the source and run python setup.py install from there?

tbm commented 3 years ago

Yes, this is with GitHub master.

suyashb95 commented 3 years ago

Interesting, I'll have a look

suyashb95 commented 3 years ago

@tbm #72 was supposed to fix cases where the output contains data for the wrong language. In your case, the output seems correct since you're setting the language to Swahili. I don't think this is an issue with the parser, but let me know if you're facing trouble with other words/languages.

tbm commented 3 years ago

Right, but my question/bug is that the value I get contains empty elements, e.g.:

  'pronunciations': {'audio': [], 'text': []}}]

Why is pronunciations there when it wasn't set in the entry on Wiktionary?

suyashb95 commented 3 years ago

@tbm The output schema is standard across all words/languages. A lot of words don't have pronunciations or related words, so those values turn up as empty objects/lists even though the keys are still included in the JSON. Maybe I should add a note to the readme about this.

tbm commented 3 years ago

Oh, ok, thanks for the clarification. I was just surprised to see empty values. I was expecting pronunciations to simply not exist when there's no pronunciation on Wiktionary.

Documenting this might be a good idea, thanks.
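
For anyone who would rather have the empty keys dropped, here is a minimal sketch of post-processing the result on the caller's side. Note that prune_empty and is_empty are hypothetical helpers written for this thread, not part of WiktionaryParser, and the sample data is just the output quoted above (so nothing is fetched over the network). It assumes "empty" means None or a zero-length list, dict, or string.

```python
from pprint import pprint


def is_empty(value):
    """True for None and for zero-length lists, dicts and strings."""
    return value is None or (isinstance(value, (list, dict, str)) and len(value) == 0)


def prune_empty(value):
    """Recursively drop empty values from nested lists and dicts.

    Hypothetical helper, not part of WiktionaryParser. A container that
    becomes empty after pruning its children is dropped as well.
    """
    if isinstance(value, dict):
        pruned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in pruned.items() if not is_empty(item)}
    if isinstance(value, list):
        pruned = [prune_empty(item) for item in value]
        return [item for item in pruned if not is_empty(item)]
    return value


# Sample copied from the output quoted earlier in this thread.
word = [{'definitions': [{'examples': [],
                          'partOfSpeech': 'noun',
                          'relatedWords': [],
                          'text': ['Krismasi\xa0(n class, no plural)', 'Christmas']}],
         'etymology': 'From English Christmas.\n',
         'pronunciations': {'audio': [], 'text': []}}]

cleaned = prune_empty(word)
pprint(cleaned)
```

The trade-off is that the pruned result no longer has a fixed shape, so lookups such as entry.get('pronunciations') are safer than entry['pronunciations'] afterwards.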