Open BobbyClouser opened 5 years ago
I have the same problem
I found that on Linux I don’t have the crash problem.
In reply to taoxinyi (Monday, January 7, 2019 3:01 AM), Re: [infoscout/weighted-levenshtein] Jupyter notebook crashes when using dam_lev with transpose_costs (#16)
weighted-levenshtein was originally built to be run in a Linux environment, so although it's disappointing that it doesn't work on Windows, it doesn't come as a huge surprise to me.
I'm not sure that we'll be able to look into the issue anytime soon, but if you are able to discover the issue, we would certainly take a look at a PR. 😄
The problem might be with the line endings. Linux uses \n, while Windows uses \r\n. @RevolutionTech Are line endings used in any way in the Damerau code logic?
This should solve it. The problem was caused by negative indexing, which triggered the memory error on Windows. I presume the same out-of-range access didn't cause a crash on Linux, and that's why it appeared to work there. I haven't tested the fix on Linux, but hopefully it works there too.
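For anyone wondering where a negative index can come from: the textbook Damerau-Levenshtein algorithm is usually written with a distance matrix whose rows and columns start at -1. Python silently wraps a negative index around to the end of a list, but compiled code that skips wraparound and bounds checks would read outside the buffer instead. Below is a minimal pure-Python sketch of the usual workaround, shifting every index by one so nothing is ever negative. It assumes unit costs, and the function name `dam_lev_unit` is made up for illustration; it is not the library's Cython code and not necessarily what the fix does.

```python
def dam_lev_unit(a: str, b: str) -> int:
    """Unit-cost Damerau-Levenshtein distance using a shifted matrix.

    Hypothetical illustration only; not part of weighted_levenshtein.
    """
    max_dist = len(a) + len(b)
    # last_row[ch] is the last 1-based row of `a` where ch occurred; 0 = never.
    last_row = {}
    # Logical index i in -1..len(a) is stored at physical index i + 1,
    # so no physical index is ever negative.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist  # logical d[i, -1]
        d[i + 1][1] = i         # logical d[i, 0]
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist  # logical d[-1, j]
        d[1][j + 1] = j         # logical d[0, j]

    for i in range(1, len(a) + 1):
        last_col = 0  # last 1-based column in this row where the characters matched
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,   # substitution or match
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                # transposition: logical d[k-1, l-1] lives at physical d[k][l]
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


print(dam_lev_unit("ABNANA", "BANANA"))
```

The key line is the transposition term: when a character has not been seen yet, `k` and `l` are 0, and the unshifted formula would look up logical cell `[-1, -1]`. With the one-cell offset that lookup lands on physical cell `[0][0]`, which holds the large sentinel value.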
I'm using dam_lev in a Jupyter notebook (5.4.0) with Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]. My OS is Windows 10. Running the code below, I get the error:
Process finished with exit code -1073741819 (0xC0000005).
I looked this up and it is an access violation (memory?). The code works fine on Linux, and if I don't use transpose_costs it also runs on Windows. I've checked that I have all of the required versions of numpy and cython.
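As a side note on how the two numbers in that message relate: the negative exit code is the same 32-bit value as the hex code, just read as a signed integer. 0xC0000005 is the Windows status code STATUS_ACCESS_VIOLATION. A small sketch of the conversion (the helper name `to_status_hex` is made up for illustration):

```python
def to_status_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_status_hex(-1073741819))  # 0xC0000005
```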
Would you suggest anything? Build it myself? Use it in cython?
Thanks, Bob
=============================================================
import numpy as np
from weighted_levenshtein import lev, osa, dam_lev

ins_costs = np.ones(128, dtype=np.float64)
del_costs = np.ones(128, dtype=np.float64)
sub_costs = np.ones((128, 128), dtype=np.float64)
tp_costs = np.ones((128, 128), dtype=np.float64)

# insert costs that should be nearly free
ins_costs[ord('-')] = 0.1
ins_costs[ord('%')] = 0.1
ins_costs[ord(' ')] = 0.1
ins_costs[ord('.')] = 0.1
ins_costs[ord('/')] = 0.1
ins_costs[ord('#')] = 0.1
ins_costs[ord('&')] = 0.1
ins_costs[ord('(')] = 0.1
ins_costs[ord(')')] = 0.1
ins_costs[ord('+')] = 0.1
ins_costs[ord('?')] = 0.1
ins_costs[ord(',')] = 0.1
ins_costs[ord("'")] = 0.1

# delete costs that should be nearly free
del_costs[ord('-')] = 0.1
del_costs[ord('%')] = 0.1
del_costs[ord(' ')] = 0.1
del_costs[ord('.')] = 0.1
del_costs[ord('/')] = 0.1
del_costs[ord('#')] = 0.1
del_costs[ord('&')] = 0.1
del_costs[ord('(')] = 0.1
del_costs[ord(')')] = 0.1
del_costs[ord('+')] = 0.1
del_costs[ord('?')] = 0.1
del_costs[ord(',')] = 0.1
del_costs[ord("'")] = 0.1

# substitutions that should cost less than 1
sub_costs[ord('C'), ord('S')] = 0.5
sub_costs[ord('S'), ord('C')] = 0.5
sub_costs[ord('O'), ord('0')] = 0.1
sub_costs[ord('0'), ord('O')] = 0.1

# transpositions that should cost less than 1
tp_costs[ord('I'), ord('E')] = 0.1
tp_costs[ord('E'), ord('I')] = 0.1
tp_costs[ord('A'), ord('E')] = 0.2
tp_costs[ord('E'), ord('A')] = 0.2

print(dam_lev('ABNANA', 'BANANA', transpose_costs=tp_costs, substitute_costs=sub_costs, insert_costs=ins_costs, delete_costs=del_costs))