Add currency tests - Githubissues

ivsanro1 commented 2 years ago

Hello,

Since we might add some features for fuzzy search, especially when the currency is not found (see https://github.com/scrapinghub/price-parser/issues/28#issuecomment-1274568097), we think it'd be nice to have some evaluation script and dataset that we can use to compare the quality of the extractions (thanks for the idea @lopuhin !).

The purpose of the evaluation is to compare the currency extraction quality of Price.fromstring. If the functionality of this function is extended via PR (e.g. we perform a fuzzy search if the currency is not found), we should be able to see an improvement in the metrics.

I decided not to add it as a test because the evaluation metrics can be interpreted in several ways.

For now, the metrics are simply global accuracy and accuracy per symbol (ground truth) for quality evaluation, and total extraction time and extraction time per sample for performance evaluation. These metrics can be easily extended, as the script is quite simple (evaluation/evaluate_currency_extraction.py).

If in the future we add other evaluation metrics or datasets, this script can be improved and generalized (e.g. adding argparse). For now I decided to keep it simple.

The dataset (dataset_eval.json) can also be easily extended to include more cases.

Here's a sample of the dataset_eval.json:

{
    "string": "14.00 SGD / Each",
    "currency": "SGD"
}

The evaluation script will extract the currency from string using Price.fromstring and compare it with currency. A correct extraction happens when both are equal.

With the current dataset and price_parser versions, this is the output of the evaluation:

./evaluation/evaluate_currency_extraction.py 
-----------------------------------
symbol (target)      acc      support   
-----------------------------------
           None      ‎1.0     13        
              $      ‎1.0     12        
             $U      ‎1.0     1         
             $b      ‎0.0     1         
           .د.ب      ‎0.0     1         
            AED      ‎1.0     1         
           AU $      ‎0.0     1         
            AU$      ‎1.0     1         
            AUD      ‎1.0     1         
             Ar      ‎0.0     1         
            B/.      ‎1.0     1         
             BD      ‎1.0     1         
            BZ$      ‎1.0     1         
             Br      ‎1.0     2         
             Bs      ‎1.0     1         
             C$      ‎1.0     2         
            CA$      ‎1.0     1         
           CAD$      ‎0.0     1         
           CAN$      ‎0.0     1         
            CFA      ‎1.0     1         
           CUC$      ‎1.0     1         
              D      ‎0.0     1         
            DKK      ‎1.0     1         
             DT      ‎1.0     1         
             Db      ‎1.0     1         
              E      ‎0.0     1         
            FBu      ‎1.0     1         
           FCFA      ‎1.0     1         
             FG      ‎1.0     1         
            FRw      ‎0.0     1         
            Fdj      ‎1.0     1         
             Ft      ‎1.0     1         
              G      ‎0.0     1         
            GEL      ‎1.0     1         
             Gs      ‎0.0     1         
             J$      ‎1.0     1         
              K      ‎0.0     1         
             KD      ‎1.0     1         
             KM      ‎1.0     1         
            KMF      ‎1.0     1         
            KSh      ‎0.0     1         
             Kr      ‎0.0     1         
             Kč      ‎1.0     1         
              L      ‎0.0     2         
             LD      ‎1.0     1         
            LEI      ‎1.0     1         
            Lek      ‎1.0     1         
             MK      ‎1.0     1         
           MOP$      ‎1.0     1         
             MT      ‎0.0     1         
            NT$      ‎1.0     1         
            Nfk      ‎1.0     1         
            Nu.      ‎1.0     1         
              P      ‎0.0     1         
              Q      ‎0.0     1         
              R      ‎0.0     1         
             R$      ‎1.0     2         
            RD$      ‎1.0     1         
             RF      ‎0.0     1         
             RM      ‎1.0     1         
             Rf      ‎1.0     1         
             Rp      ‎1.0     1         
             R₣      ‎0.0     1         
            S/.      ‎1.0     1         
            SGD      ‎1.0     1         
            SPL      ‎0.0     1         
             T$      ‎1.0     1         
             TL      ‎1.0     1         
            TSh      ‎1.0     1         
            TT$      ‎1.0     1         
             UM      ‎1.0     1         
            US$      ‎1.0     1         
            USD      ‎1.0     1         
            USh      ‎1.0     1         
             VT      ‎1.0     1         
             Z$      ‎1.0     1         
             ZK      ‎1.0     1         
          franc      ‎0.0     1         
         francs      ‎0.0     1         
             kn      ‎1.0     1         
             kr      ‎1.0     1         
            lei      ‎1.0     1         
           null      ‎0.0     1         
              q      ‎0.0     1         
            yen      ‎0.0     1         
             zl      ‎0.0     1         
             zł      ‎1.0     1         
              ¢      ‎0.0     1         
              £      ‎1.0     2         
              ƒ      ‎1.0     2         
             ЅM      ‎0.0     1         
            Дин      ‎0.0     1         
            ден      ‎0.0     1         
             лв      ‎1.0     1         
           руб.      ‎1.0     1         
              ֏      ‎1.0     1         
              ؋      ‎1.0     1         
           ج.س.      ‎0.0     1         
             د.      ‎0.0     1         
            د.إ      ‎0.0     2         
            د.ت      ‎0.0     1         
            د.ك      ‎0.0     1         
             دج      ‎0.0     1         
            ع.د      ‎0.0     1         
            ل.د      ‎0.0     1         
          نافكا      ‎0.0     1         
              ৳      ‎1.0     1         
              ლ      ‎0.0     1         
             ብር      ‎0.0     1         
            ናቕፋ      ‎0.0     1         
              ៛      ‎1.0     1         
              ₡      ‎1.0     1         
              ₣      ‎1.0     1         
              ₦      ‎1.0     1         
              ₨      ‎1.0     1         
              ₪      ‎1.0     1         
              ₫      ‎1.0     1         
              €      ‎1.0     5         
              ₭      ‎1.0     1         
              ₮      ‎1.0     1         
              ₱      ‎1.0     1         
              ₴      ‎1.0     1         
              ₹      ‎1.0     1         
              ₺      ‎1.0     1         
              ₼      ‎1.0     1         
              ₽      ‎1.0     1         
              ₿      ‎1.0     1         
              元      ‎0.0     1         
              ﷼      ‎1.0     1         

Global accuracy: 0.7117

Total processing time: 3.03 ms
Processing time per sample: 0.018604 ms

I'm open to suggestions

kmike commented 2 years ago

hey @ivsanro1! It seems in most cases the support is 1. What do you think about considering those as test cases instead? It seems these examples are very similar to the existing test cases we have (https://github.com/scrapinghub/price-parser/blob/master/tests/test_price_parsing.py), but with more variety for currency.

kmike commented 2 years ago

My main worry about these things is the workflow. If we go with an evaluation approach, we should figure out how contributors should be using it. Probably we should run it on CI. But it's less clear to me what to do with the results.

ivsanro1 commented 2 years ago

What do you think about considering those as test cases instead? It seems these examples are very similar to the existing test cases we have

Hey @kmike , thanks for the suggestion! I also thought about this, but how we'd handle the failing cases? Should we add them in a different test and mark is as xfail? I think it would be nice to easily add cases that we know we fail until they're fixed.

If we go with an evaluation approach, we should figure out how contributors should be using it

Good point. I also think putting it in CI would be the best option if we decide to go with the evaluation and not move it to tests. My original idea was to use it merely as info, or just check that the global accuracy didn't go down with every contribution - but it might be too overkill. However, I like more the tests idea, and maybe I'll just add the failing cases in a different test marked as xfail until they get fixed. What do you think?

kmike commented 2 years ago

However, I like more the tests idea, and maybe I'll just add the failing cases in a different test marked as xfail until they get fixed. What do you think?

This sounds good! That's exactly how the current testing suite is working. In the current testing suite each example is a separate test (via pytest.mark.parametrize); the ones which are failing are marked as xfail.

The existing testing suite was actually created in a similar way - a few hundred example input strings from a dataset were annotated and converted to tests. Then, there was a round of improving the parsing rules, and after that another set of examples was added, to control the overfitting (mostly to get an impression, that's not very scientific).

kmike commented 2 years ago

Unlike standard ML problems, I'm not too worried about overfitting here. I.e. it seems to be fine to fix xfail tests one-by-one, as in most cases it'd just mean adding more aliases for currencies. It still could be good to do a second round afterwards, e.g. to confirm that the accuracy improves after the first round of fixes.

kmike commented 2 years ago

By the way, in the current testing suite there is a lot of overlapping or similar examples; that's because many websites show prices in a similar way. So, it's more like a dataset, not like a hand-crafted testing suite, with a few exceptions. This allows to make trade-off decisions: it could be the case that a rule may fix some examples and break others; we can use the test count to help us decide which way to go in such situations.

ivsanro1 commented 2 years ago

That's exactly how the current testing suite is working. In the current testing suite each example is a separate test (via pytest.mark.parametrize); the ones which are failing are marked as xfail.

I see! Thanks for the explanation :+1: Then, what I'll do instead of adding this evaluation script is to add all the examples as tests, and those that fail add them in PRICE_PARSING_EXAMPLES_XFAIL until they're fixed. I like more this method since it's easier to document in the tests each of the cases.

Please feel free to close the PR if you consider it appropiate

Gallaecio commented 2 years ago

You can also implement the new approach in this PR if you want, instead of creating a separate one.

ivsanro1 commented 2 years ago

Those were the last tests I wanted to add for this PR. I was also thinking to use this PR to implement these cases and then move the tests from xfail to pass, but it might take some time. Should we merge it and add the cases to price-parser in another PR @kmike @Gallaecio ?

Gallaecio commented 2 years ago

Should we merge it and add the cases to price-parser in another PR?

Sounds good to me.

kmike commented 2 years ago

Before merging, could you please ensure that all tests pass @ivsanro1? I think some of them fail currently.

ivsanro1 commented 2 years ago

Before merging, could you please ensure that all tests pass @ivsanro1?

Yes, of course! I made the changes in 7529b2e52ceec51fe5d98b65ab8223cfcfad25be

These mypy errors were showing:

price_parser/_currencies.py:30: error: syntax error in type comment
price_parser/_currencies.py:1731: error: syntax error in type comment
price_parser/_currencies.py:1732: error: syntax error in type comment
price_parser/_currencies.py:1733: error: syntax error in type comment

I didn't see any particular problems with these lines. I saw we could update mypy from ver 0.761 to 0.982, and after updating it, it showed a couple of type errors in the tests I added (now fixed).

Also, I added the -v flag to mypy in tox.ini so it would report the errors (it wasn't showing them before)

Now everything is passing:

Success: no issues found in 5 source files
_________________________________________________________________________ summary _________________________________________________________________________
  py36: commands succeeded
  py37: commands succeeded
  py38: commands succeeded
  mypy: commands succeeded
  congratulations :)

Gallaecio commented 2 years ago

It may make sense to look into addressing https://github.com/scrapinghub/price-parser/issues/52 before merging here.

kmike commented 2 years ago

Thanks @ivsanro1!

scrapinghub / price-parser

Add currency tests #53