Open liarig opened 5 years ago
Thanks for your issue. The behaviour you describe is expected: the tool interprets everything that is not a common English word as a unit. Do you have a proposal to improve this behaviour? Maybe one could disregard all units where the same unit appears twice. But sometimes this is wanted, e.g. km², which could be written as km*km.
Thank you for your response. I think that the case where the same unit appears more than once should be considered only if this unit may be multidimensional (as in your example: length, squared). Otherwise it may be disregarded.
Interpreting different abbreviations written together as a compound measure may lead to mistakes.
>>> parser.parse('a gin')
[Quantity(1, "Unit(name="gram inch", entity=Entity("unknown"), uri=None)")]
only if this unit may be multidimensional
On what basis would this then be decided? I can only imagine storing, for every unit, whether multidimensional cases exist or not, which sounds to me like a huge overhead that is prone to errors.
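To make the trade-off concrete, here is a minimal sketch of what such a per-unit lookup might look like. Everything in it is an assumption for illustration: `REPEATABLE_ENTITIES`, `UNIT_ENTITY` and `is_plausible_compound` are hypothetical names, not part of the tool, and the small tables stand in for data that would have to be maintained for every unit.

```python
from collections import Counter

# Hypothetical: entities for which repeating a unit is meaningful
# (e.g. length * length = area). Not taken from the tool's data.
REPEATABLE_ENTITIES = {"length", "time"}

# Hypothetical unit -> entity table, for illustration only.
UNIT_ENTITY = {"km": "length", "m": "length", "in": "length",
               "s": "time", "g": "mass", "kg": "mass"}


def is_plausible_compound(units):
    """Return False if a unit repeats and its entity is not repeatable."""
    counts = Counter(units)
    for unit, n in counts.items():
        # Units missing from the table map to None and are rejected
        # when repeated.
        if n > 1 and UNIT_ENTITY.get(unit) not in REPEATABLE_ENTITIES:
            return False
    return True
```

Under these assumptions `["km", "km"]` is accepted and `["g", "g"]` is rejected, but `["g", "in"]` still passes, so a check like this would not catch the `'a gin'` case on its own.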
Interpreting different abbreviations written together as a compound measure may lead to mistakes.
Currently, the 10,000 most common words of the English language are excluded from being treated as possible units. If you find additional common words (ideally a whole list of them) or have a better idea for filtering, I'd be glad to integrate them.
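As a rough illustration of how extra words could plug into that kind of filter, here is a sketch under stated assumptions: `COMMON_WORDS` is a tiny hypothetical stand-in for the real word list, and `could_be_unit` and its `extra_words` parameter are made-up names, not the tool's API. Lowercasing the token is a simplification, since real unit symbols are case-sensitive.

```python
# Hypothetical stand-in for the list of common English words;
# the real list has about 10,000 entries.
COMMON_WORDS = {"a", "the", "in", "is", "me", "am"}


def could_be_unit(token, extra_words=frozenset()):
    """Return True if the token is not filtered out as a common word."""
    # Simplification: compare case-insensitively.
    word = token.lower()
    return word not in COMMON_WORDS and word not in extra_words
```

With this sketch, `could_be_unit("gin")` is true by default and becomes false once `"gin"` is passed in `extra_words`, which is the kind of extension a contributed word list would provide.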
Actually, this is in some form a duplicate of #35.
Describe the bug
The parser connects abbreviations together, which doesn't make sense.
Expected behavior