scrapinghub / number-parser

Parse numbers written in natural language
BSD 3-Clause "New" or "Revised" License
108 stars 23 forks source link

Support for simple fractions #59

Closed Manish-210 closed 3 years ago

Manish-210 commented 3 years ago

We can add support for the simple fractions as well like three-fourth, half, etc. Example:

parse("I have a three-fourth cup of glass.") I have a 3/4 cup of glass.

parse_number("half") 1/2

Gallaecio commented 3 years ago

I would actually expect parse_number("half") to return Decimal('0.5'). Maybe we could have a separate parse_fraction for 1/2?

Manish-210 commented 3 years ago

parse_fraction seems good, I'll give a try to implement it.

Manish-210 commented 3 years ago

Some points to notice for the parse_fraction function :

  1. fractions have several ways to spell. For example: 3/4 as three fourths, three over fourth, three divided by fourth, or simply three by fourth. In which the most commonly used is three fourths and I guess it is sufficient to implement a function for these types now.

  2. after the number 3 the pattern is denominator with s, like three thirds ( 3/3), five fourths (5/4) but for denominator with 1 spells wholes and with 2 spells halves like four wholes ( 4/1), two halves (2/2).

  3. A small detail: if the numerator is greater than one then the denominator is plural otherwise it is singular like one whole (1/1), one half (1/2) which is obvious.

Manish-210 commented 3 years ago

I have written the parse_fraction function given below

` def parse_fraction(input_string, language=None):

"""Converts a single number written in fraction to a numeric type"""
if not input_string.strip():
    return None

if language is None:
    language = _valid_tokens_by_language(input_string)

if language != 'en':
    raise ValueError(f'"{language}" is not a supported language for parsing fraction')

fraction_separators = ["divided by", "over", "by", "/"]

for separator in fraction_separators:
    position_of_separator = input_string.find(separator)

    if position_of_separator == -1:
        continue

    string_before_separator = input_string[:position_of_separator]
    string_after_separator = input_string[position_of_separator + len(separator):]

    number_before_separator = parse_number(string_before_separator, language)
    number_after_separator = parse_number(string_after_separator, language)

    if number_after_separator == None or number_after_separator == None:
        return None

    return str(number_before_separator) + '/' + str(number_after_separator)

return None

` Few things to mention:

  1. I have added the fraction_separator in this function which should be at some other place and then we have to import it so that it can work for all languages, It is only working for the English language currently.
  2. Fractions of other types like three fourths 3/4 ( fractions that end with 's' are not converted )
  3. Should I add test data also? Of course, there can be many improvements possible, let me know any details before I send a Pull Request.
Gallaecio commented 3 years ago

I think it’s OK to provide an imperfect initial implementation, it can be improved over time based on feedback.

And yes, I think it would be best to include tests.

Manish-210 commented 3 years ago

Yeah, Added the test data and sent a Pull request for this. Should I mark this as an improvement or close this issue?

Gallaecio commented 3 years ago

I’ve updated the description of https://github.com/scrapinghub/number-parser/pull/60 so that it closes this issue automatically once it is merged.