Open rthaenert opened 4 years ago
This would be a great feature to have. Note that it's a bit tricky to implement, as the relation between currency symbols and currency ISO codes is one-to-many (one symbol can stand for several codes), so we'll need to use another attribute, such as country, to determine whether $ means USD, AUD, HKD, SGD or something else.
Yes, there are many cases in which this mapping would result in more than one currency code.
Maybe it's a good idea to provide all matching currency codes, with the first element always being a major currency code (the most used ones, basically the ones outlined in https://en.wikipedia.org/wiki/Currency_pair)?
Like this:
€ => [ "EUR" ]
$ => [ "USD", "AUD", "CAD", ...]
AU$ => [ "AUD" ]
[...]
To decide between the different $'s, the existing currency hint could be reused to get a precise mapping; for all cases in which it's still unclear, the list with all possible values should be good enough.
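A minimal sketch of that idea, purely for illustration. The table `SYMBOL_TO_CODES`, the function `currency_codes` and the way the hint is interpreted (as a symbol or an ISO code) are all assumptions made here, not part of price-parser:

```python
# Hypothetical symbol table: each symbol maps to its candidate ISO 4217
# codes, with the most widely used currency listed first.
SYMBOL_TO_CODES = {
    "€": ["EUR"],
    "$": ["USD", "AUD", "CAD"],
    "AU$": ["AUD"],
}


def currency_codes(symbol, hint=None):
    """Return candidate ISO codes for a symbol, narrowed by an optional hint."""
    candidates = SYMBOL_TO_CODES.get(symbol, [])
    if hint is not None:
        # The hint may be a known symbol ("AU$") or already an ISO code ("CAD").
        hinted = SYMBOL_TO_CODES.get(hint, [hint])
        narrowed = [code for code in candidates if code in hinted]
        if narrowed:
            return narrowed
    # No hint, or the hint did not help: return every candidate.
    return list(candidates)


print(currency_codes("$"))              # all candidates, major code first
print(currency_codes("$", hint="AU$"))  # narrowed by the hint
```

A caller that does not care can take the first element; a caller with more context can pass a hint or filter the list itself.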
What do you think?
That's an interesting option which I didn't consider before. That would mean that the caller which has more info regarding the context would be able to select the best variant. And the caller which does not care much could take all or first. So it seems that this approach can work well. :+1: Also this looks quite future-proof to me.
An alternative/complementary approach would be Dateparser’s, where users pass a locale to the parser, and the parser returns a value based on the specified locale.
@Gallaecio @lopuhin Is there any update regarding this feature? If it's in development, would love to contribute. :)
I don’t think there is anyone working on it at the moment.
As @Gallaecio suggested, it would be nicer if every locale were able to redefine currency symbols.
If no one is working on this, I can start working on this issue.
There’s no pull request open so far, so feel free to go ahead.
FWIW List of circulating currencies: https://en.wikipedia.org/wiki/List_of_circulating_currencies and the support of currencies and locales in Babel: http://babel.pocoo.org/en/latest/api/numbers.html
There's a current implementation of this that I could add via PR.
This implementation works as follows:
1. [...] ($, US$), it makes a fuzzy search (using python-Levenshtein) to select the best matching currency in a database.
2. [...] for $ we'd have ['USD', 'CAD', 'AUD', ...], but for US$ we'd only have ['USD'] as candidates.
3. [...]
Steps 1 and 2 could be added to price-parser, and it would not require further input from the user, i.e. it would not change the API:
>>> Price.fromstring('1200 $')
Price(amount=Decimal('1200'), currency='$', currency_codes=['USD', 'CAD', 'AUD', ...])
>>> Price.fromstring('1200 US$')
Price(amount=Decimal('1200'), currency='US$', currency_codes=['USD'])
Step 3 is a little trickier, as it would require more input from the user.
Some examples of what the API could look like:
# `hint_text` is intended mainly for use with plain HTML
>>> Price.fromstring('1200 $', hint_text='<html><body>... currency="USD"...</body></html>')
Price(amount=Decimal('1200'), currency='$', currency_codes=['USD'])
>>> Price.fromstring('1200 $', hint_url='www.example.ca')
Price(amount=Decimal('1200'), currency='$', currency_codes=['CAD'])
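For illustration only, one way a URL hint could narrow the candidates is by looking at the country-code top-level domain. The mapping `TLD_TO_CODE` and the helper `narrow_by_url` are hypothetical names used for this sketch; nothing like them exists in price-parser:

```python
from urllib.parse import urlparse

# Hypothetical mapping from country-code top-level domains to ISO 4217 codes.
TLD_TO_CODE = {"ca": "CAD", "au": "AUD", "us": "USD", "hk": "HKD", "sg": "SGD"}


def narrow_by_url(candidates, hint_url):
    """Keep only the candidate that matches the URL's country TLD, if any."""
    # urlparse only fills in the hostname when the URL has a scheme or "//".
    if "//" not in hint_url:
        hint_url = "//" + hint_url
    hostname = urlparse(hint_url).hostname or ""
    tld = hostname.rsplit(".", 1)[-1]
    code = TLD_TO_CODE.get(tld)
    if code in candidates:
        return [code]
    # Generic TLD or unknown country: leave the list untouched.
    return list(candidates)


print(narrow_by_url(["USD", "CAD", "AUD"], "www.example.ca"))
```

A generic domain such as .com carries no country information, so in that case the full candidate list is returned unchanged.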
However, in my opinion, this is beyond the scope of price-parser. I'd go for integrating only 1 and 2, and users would have their own way of selecting from the candidates list, as @lopuhin pointed out, since they'd have more context about their problem.
Additionally, I wanted to point out that price-parser sometimes does not find the currency, especially when it's not "standard". Here are some examples:
>>> Price.fromstring('1200 SFr') # SFr is Swiss Franc. Currency code: CHF
Price(amount=Decimal('1200'), currency=None)
>>> Price.fromstring('1200 kz') # "kz" is Angolan Kwanza. Currency code: AOA
Price(amount=Decimal('1200'), currency=None)
>>> Price.fromstring('دينار 1000') # "دينار" is Bahraini dinar. Currency code: BHD
Price(amount=Decimal('1000'), currency=None)
>>> Price.fromstring('1000 BTC') # "BTC" is Bitcoin. Currency code: BTC, although not part of ISO 4217, but widely adopted
Price(amount=Decimal('1000'), currency=None)
So, unfortunately, the fuzzy search won't be as useful as it could be: it's intended for less standard currencies and for finding currencies in a more robust way, but price-parser does not extract those in the first place. Its drawback is obvious: it can find wrong matches, especially because we don't use a similarity threshold to reject "far matches" that should not be used.
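To make the threshold point concrete, here is a small sketch that uses the standard library's `difflib` in place of python-Levenshtein (a different similarity measure, swapped in only to keep the example dependency-free). The `KNOWN` table, the `fuzzy_codes` name and the 0.8 cutoff are assumptions chosen for illustration:

```python
import difflib

# Hypothetical database of known currency strings and their ISO 4217 codes.
KNOWN = {
    "sfr": ["CHF"],
    "kz": ["AOA"],
    "us$": ["USD"],
    "$": ["USD", "CAD", "AUD"],
}


def fuzzy_codes(raw, cutoff=0.8):
    """Return codes for the closest known string, or [] if nothing is close."""
    query = raw.strip().lower()
    # The cutoff rejects "far matches" instead of returning the least-bad one.
    matches = difflib.get_close_matches(query, list(KNOWN), n=1, cutoff=cutoff)
    return KNOWN[matches[0]] if matches else []


print(fuzzy_codes("SFr."))  # close to "sfr"
print(fuzzy_codes("xyz"))   # nothing close enough
```

Raising the cutoff trades recall for fewer false positives; the right value would have to be tuned against real data.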
We have three options here:
1. [...] python-Levenshtein would not be a dependency
2. [...] price-parser as it is. In some cases this feature could still be useful, but less than it could be, since price-parser would not find less typical currencies.
3. Keep the feature and make price-parser find less typical currencies. [...] currency with the current method, it tries to make a heuristic search (we could even use the fuzzy search for this and kill two birds with one stone)

Thank you @ivsanro1, an early comment on one point of your proposal:
> However, in my opinion, this is beyond the scope of price-parser, I'd go for integrating only 1 and 2, and the user would have its own way of selecting from the candidates list
To me, disambiguation also looks useful, as price-parser is probably often used in a web data extraction context, where these hints make sense. In terms of the API, it could be the same, but the list of currencies could be smaller.
Also regarding the API, if we add the currency_codes attribute to Price, it also makes sense to add a currency_code property which would be non-empty in case this list has one element, to simplify the usage.
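A rough sketch of how such a property could behave. `PriceSketch` is a hypothetical stand-in written for this example, not the library's actual Price class:

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class PriceSketch:
    """Hypothetical stand-in for Price, used only to illustrate the property."""

    amount: Optional[Decimal]
    currency: Optional[str]
    currency_codes: List[str] = field(default_factory=list)

    @property
    def currency_code(self) -> Optional[str]:
        # Unambiguous only when exactly one candidate remains.
        if len(self.currency_codes) == 1:
            return self.currency_codes[0]
        return None


print(PriceSketch(Decimal("1200"), "US$", ["USD"]).currency_code)
print(PriceSketch(Decimal("1200"), "$", ["USD", "CAD", "AUD"]).currency_code)
```

Returning None for the ambiguous case forces the caller to look at currency_codes explicitly rather than silently picking the first candidate.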
@ivsanro1 regarding your last question,
> Keep the feature and make price-parser find less typical currencies.
Looks best to me, but this can also be a different issue and a different PR. Even in the current state the fuzzy matching looks useful, as we can pass the currency_hint to Price.fromstring.
Hey! Could you please elaborate, why is fuzzy search needed here? I wonder if it'd be better to hardcode more currency variations. Or is it problematic for some reason?
Fuzzy search is only needed if we want to allow for non-exact matches. However, hardcoding the variations is also a perfectly valid approach and we would not have to worry about false positives (or at least as many as we could potentially have with fuzzy search).
In any case, it's slightly unrelated to the currency_code (sorry for that); I just mentioned it because it's related to the implementation I was describing.
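For comparison, the hardcoded-variations approach could look roughly like the sketch below: an exact lookup after light normalization, so it never guesses. The `ALIASES` table and both function names are hypothetical; the entries are taken from the examples earlier in this thread:

```python
import unicodedata

# Hypothetical alias table; entries come from the examples in this thread.
ALIASES = {
    "sfr": "CHF",
    "kz": "AOA",
    "btc": "BTC",  # not part of ISO 4217
}


def normalize(raw):
    """Apply NFKC, fold case, and drop whitespace and trailing dots."""
    text = unicodedata.normalize("NFKC", raw).casefold()
    return "".join(text.split()).rstrip(".")


def exact_code(raw):
    """Return the code for a known alias, or None for anything unknown."""
    return ALIASES.get(normalize(raw))


print(exact_code(" SFr. "))  # known alias after normalization
print(exact_code("SFx"))     # unknown: no guess is made
```

The cost is maintenance: every variation has to be listed, but a typo can never be silently mapped to the wrong currency.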
My use case is passing price-parser's amount and currency to Stripe, which requires a three-letter ISO currency code instead of a symbol such as $. https://docs.stripe.com/currencies?presentment-currency=MX
Right now I have to use this code:
def fix_currency(currency):
    # TODO: Use this gist https://gist.github.com/jylopez/ba16be2ae55282d5cff07de65128de83
    # Map the symbols I currently see to ISO codes; pass anything else through.
    if currency == "MX$":
        return "MXN"
    elif currency == "C$":
        return "CAD"
    elif currency == "$":
        return "USD"
    else:
        return currency
First of all: Nice library, thanks for creating it.
For converting between major currencies it would be nice to have the ISO 4217 code of the parsed price (EUR, USD, AUD, ...) as this is easier for handling exchange rates.
Is there any plan to support that?