selva221724 / pypostalwin

libpostal wrapper Python package for Windows
MIT License

Suggested improvements: more efficient and more Pythonic #7

Open quentinjs opened 1 year ago

quentinjs commented 1 year ago

These 2 functions could be significantly improved.

  1. Instead of returning a list containing one single-entry dict per component, just return a single dictionary with all the entries. I used the following to quickly convert the list into a proper dict:

     ```python
     # Flatten the list of single-entry dicts into one dict
     d = {}
     for r in parsedAddress:
         key = list(r.keys())[0]
         value = list(r.values())[0]
         d[key] = value
     ```

  2. Use a regex such as `text = re.sub('[àáâãăäāå]', '', text)` instead of calling `replace` once per character; it will be much faster.

And where are the other functions from the library?

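The two functions in question:

```python
def stringToJSON(string):
    if string != '{}':
        # Strip the braces and quotes, then split into "key: value" pieces
        string = string.replace('{ ', '')
        string = string.replace('}', '')
        string = string.replace('"', '')
        string = string.split(", ")
        stringList = [i.split(': ') for i in string]
        # One single-entry dict per parsed component
        outDictList = []
        for i in stringList:
            outDictList.append({i[0]: i[1].strip()})
        return outDictList
    else:
        # Note: returns a dict here, but a list in the branch above
        return {}
```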
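For comparison, a minimal sketch of a sturdier parser. It assumes, based only on the `replace` calls above, that the exe prints each result as `{ "label": "value", ... }`; `parse_components` is a hypothetical name. Pairing labels and values with one regex means a value containing `, ` is not split apart, and returning `(label, value)` tuples keeps the return type consistent (an empty list for `{}`) while preserving repeated labels.

```python
import re

# Matches one "label": "value" pair; assumes values contain no escaped quotes.
_PAIR = re.compile(r'"([^"]+)"\s*:\s*"([^"]*)"')


def parse_components(raw: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs in the order the exe printed them."""
    return [(label, value.strip()) for label, value in _PAIR.findall(raw)]


print(parse_components('{ "house_number": "10", "road": "main road, north" }'))
# [('house_number', '10'), ('road', 'main road, north')]
```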

```python
def removeSpeacialChars(address):
    # Map each unwanted symbol to '' and delete it with one str.replace per entry
    b = {'≈': '', '≠': '', '>': '', '<': '', '+': '', '≥': '', '≤': '', '±': '',
         '*': '', '÷': '', '√': '', '°': '', '⊥': '', '~': '', 'Δ': '', 'π': '',
         '≡': '', '≜': '', '∝': '', '∞': '', '≪': '', '≫': '', '⌈': '', '⌉': '',
         '⌋': '', '⌊': '', '∑': '', '∏': '', 'γ': '', 'φ': '', '⊃': '', '⋂': '',
         '⋃': '', 'μ': '', 'σ': '', 'ρ': '', 'λ': '', 'χ': '', '⊄': '', '⊆': '',
         '⊂': '', '⊇': '', '⊅': '', '⊖': '', '∈': '', '∉': '', '⊕': '', '⇒': '',
         '⇔': '', '↔': '', '∀': '', '∃': '', '∄': '', '∴': '', '∵': '', 'ε': '',
         '∫': '', '∮': '', '∯': '', '∰': '', 'δ': '', 'ψ': '', 'Θ': '', 'θ': '',
         'α': '', 'β': '', 'ζ': '', 'η': '', 'ι': '', 'κ': '', 'ξ': '', 'τ': '',
         'ω': '', '∇': ''}
    for x, y in b.items():
        address = address.replace(x, y)
    return address
```
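Suggestion 2 applied to `removeSpeacialChars` might look like the sketch below: the same 74 symbols collected into one string and compiled once into a regex character class, so each address is scanned once instead of once per symbol. `SPECIAL_CHARS` and `remove_special_chars` are hypothetical names.

```python
import re

# The same symbols removeSpeacialChars deletes, written as one string.
SPECIAL_CHARS = (
    "≈≠><+≥≤±*÷√°⊥~Δπ≡≜∝∞≪≫⌈⌉⌋⌊∑∏γφ⊃⋂⋃μσρλχ"
    "⊄⊆⊂⊇⊅⊖∈∉⊕⇒⇔↔∀∃∄∴∵ε∫∮∯∰δψΘθαβζηικξτω∇"
)

# Build the character class once; re.escape guards characters that are
# special in a regex so every symbol is matched literally.
_SPECIAL_RE = re.compile("[" + re.escape(SPECIAL_CHARS) + "]")


def remove_special_chars(address: str) -> str:
    """Delete every character in SPECIAL_CHARS in a single pass."""
    return _SPECIAL_RE.sub("", address)
```

`address.translate(str.maketrans("", "", SPECIAL_CHARS))` is another single-pass option that avoids regex entirely.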

quentinjs commented 1 year ago

Also, you should discard the preamble the exe prints. Somehow, after a few hundred calls, the buffer contains more than it should, and it starts returning the response for an address submitted a few entries earlier. Perhaps the exe is providing multiple answers and that is causing the problem, but I'm not sure.
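One way to keep requests and responses in lock step is to read exactly one complete response per address and skip any preamble before it. The sketch below assumes (this is not confirmed in the thread) that each response is a block opening with a line `{` and closing with a line `}`; `read_response` is a hypothetical helper, and `stream` would be the subprocess's stdout opened in text mode.

```python
import io
from typing import TextIO


def read_response(stream: TextIO) -> str:
    """Read one complete '{ ... }' block from stream, discarding any preamble.

    Assumes the exe prints each result between a line that is exactly '{'
    and a line that is exactly '}'.
    """
    lines = []
    inside = False
    for line in iter(stream.readline, ""):
        text = line.strip()
        if not inside:
            if text == "{":
                inside = True
                lines.append(text)
            # Anything before the opening brace is preamble: skip it.
            continue
        lines.append(text)
        if text == "}":
            return "\n".join(lines)
    raise EOFError("stream closed before a complete response was read")


demo = io.StringIO('Loading models...\nResult:\n{\n"road": "main road"\n}\n{\n"city": "leeds"\n}\n')
print(read_response(demo))  # first block only
print(read_response(demo))  # second block, nothing left over from the first
```

Reading up to a known terminator, rather than a fixed number of lines, is what stops one call from consuming part of the next response or leaving part of its own behind in the buffer.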

quentinjs commented 1 year ago

Also, when initializing, you should allow the caller to set the folder where the library is kept instead of using the hard-coded one.
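A sketch of how the folder could be made configurable: an explicit argument wins, then an environment variable, then a default. `resolve_libpostal_dir`, the `LIBPOSTAL_DIR` variable, and the `C:/libpostal` fallback are all hypothetical; the package's actual hard-coded path is not shown in this thread.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical fallback, used only when nothing else is supplied.
DEFAULT_LIBPOSTAL_DIR = Path("C:/libpostal")


def resolve_libpostal_dir(path: Optional[str] = None) -> Path:
    """Pick the libpostal folder: explicit argument, then env var, then default.

    LIBPOSTAL_DIR is a hypothetical environment variable name.
    """
    candidate = Path(path or os.environ.get("LIBPOSTAL_DIR") or DEFAULT_LIBPOSTAL_DIR)
    if not candidate.is_dir():
        raise FileNotFoundError(f"libpostal folder not found: {candidate}")
    return candidate
```

A constructor could then take an optional folder argument and pass it straight to this function, failing early with a clear message instead of at the first parse call.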

selva221724 commented 1 year ago

Dear Quentin,

I really appreciate your contribution, and apologies for the late response. I feel you should be a contributor to this project. I can add you to the repo; you can then create a branch, work on the changes you suggested, and open a PR, and we can review it and merge it into main. I feel your contribution should be showcased. Let me know what you think; if not, I will look after the changes you suggested.

Thanks & Regards, Tamil Selvan AV AI/ML Engineer Greater Manchester, United Kingdom LinkedIn: https://www.linkedin.com/in/selva221724/

On Tue, Mar 21, 2023 at 12:49 AM Quentin Sarafinchan < @.***> wrote:

Also, when initializing, you should allow the caller to set the folder where the library is kept instead of using the hard-coded one.

— Reply to this email directly, view it on GitHub https://github.com/selva221724/pypostalwin/issues/7#issuecomment-1477136960, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACD5KDOHYPOLHQO3KHM76ETW5D3KRANCNFSM6AAAAAAV73E3VM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

quentinjs commented 1 year ago

I have never done a pull request, but sure, I can upload my changes. Note that I deliberately broke your return value: it will return a dict instead of a list.

Cheers, Quentin J Sarafinchan, B.Sc.

On Fri, Mar 24, 2023, 08:14 Tamil Selvan @.***> wrote:

Dear Quentin,

I really appreciate your contribution, and apologies for the late response. I feel you should be a contributor to this project. I can add you to the repo; you can then create a branch, work on the changes you suggested, and open a PR, and we can review it and merge it into main. I feel your contribution should be showcased. Let me know what you think; if not, I will look after the changes you suggested.

Thanks & Regards, Tamil Selvan AV AI/ML Engineer Greater Manchester, United Kingdom LinkedIn: https://www.linkedin.com/in/selva221724/

selva221724 commented 1 year ago

Actually, there is a problem with that, my friend: libpostal can return the same key multiple times, e.g. road: "1st road" and road: "main road". A dict keeps only one of these (the later value overwrites the earlier one), which is why I kept the list as read from the exe and leave converting it into a dict to the users.
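The collision is easy to reproduce with the flattening loop from the first comment: when a label repeats, the later value overwrites the earlier one. The sketch below also shows one lossless alternative, grouping values into lists with `collections.defaultdict`; the sample components are illustrative, not real libpostal output.

```python
from collections import defaultdict

# Illustrative parser output in the existing list-of-single-entry-dicts shape.
parsed = [{"house_number": "10"}, {"road": "1st road"}, {"road": "main road"}]

# Flattening into a plain dict keeps only the last "road".
flat = {}
for r in parsed:
    key, value = next(iter(r.items()))
    flat[key] = value
print(flat)  # {'house_number': '10', 'road': 'main road'}

# Grouping into lists keeps every value, in order.
grouped = defaultdict(list)
for r in parsed:
    for key, value in r.items():
        grouped[key].append(value)
print(dict(grouped))  # {'house_number': ['10'], 'road': ['1st road', 'main road']}
```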

On Sat, Mar 25, 2023 at 2:02 AM Quentin Sarafinchan < @.***> wrote:

I have never done a pull request, but sure, I can upload my changes. Note that I deliberately broke your return value: it will return a dict instead of a list.

Cheers, Quentin J Sarafinchan, B.Sc.

quentinjs commented 1 year ago

Hmm... Do you have an example address or three that I can test that with?

Cheers, Quentin J Sarafinchan, B.Sc.

On Fri, Mar 24, 2023, 21:21 Tamil Selvan @.***> wrote:

Actually, there is a problem with that, my friend: libpostal can return the same key multiple times, e.g. road: "1st road" and road: "main road". A dict keeps only one of these (the later value overwrites the earlier one), which is why I kept the list as read from the exe and leave converting it into a dict to the users.

quentinjs commented 1 year ago

Given that most tags have a single value, using a dict as the base still makes sense. If a tag has only one value, address['city'] would be a str; if it has several, it would be a list, so address['city'][0] would give the first one, and so on.
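A minimal sketch of that hybrid shape, assuming the parser output has first been turned into `(label, value)` pairs; `to_address_dict` is a hypothetical helper, not part of the package.

```python
def to_address_dict(components):
    """Merge (label, value) pairs: a single value stays a str, repeats become a list.

    `components` is assumed to be an iterable of (label, value) tuples.
    """
    address = {}
    for label, value in components:
        if label not in address:
            address[label] = value                    # first value: plain str
        elif isinstance(address[label], list):
            address[label].append(value)              # third and later values
        else:
            address[label] = [address[label], value]  # second value: promote to list
    return address


address = to_address_dict([("city", "leeds"), ("road", "1st road"), ("road", "main road")])
print(address["city"])     # leeds
print(address["road"][0])  # 1st road
```

The trade-off is that callers must check the type before indexing: on a single value, `address['city'][0]` returns the first character of the string (`'l'` here), not the first value.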

Cheers, Quentin J Sarafinchan, B.Sc.