infoscout / weighted-levenshtein

Weighted Levenshtein library
MIT License

Jupyter notebook crashes when using dam_lev with transpose_costs #16

Open: BobbyClouser opened this issue 5 years ago

BobbyClouser commented 5 years ago

I'm using dam_lev in a Jupyter notebook (5.4.0) with Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)] on Windows 10. Running the code below, I get the error:

Process finished with exit code -1073741819 (0xC0000005).

I looked this up, and it is an access violation (a memory error?). The code works fine on Linux, and on Windows it also runs as long as I don't pass transpose_costs. I've checked that I have the required versions of numpy and Cython.
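The negative exit code and the hex value in the error are the same number. 0xC0000005 is the Windows STATUS_ACCESS_VIOLATION status. The tool that printed the message appears to show it as a signed 32-bit integer. The small check below simply reinterprets that signed value as unsigned 32-bit to confirm they match:

```python
# Exit code as printed in the error message (signed 32-bit view).
exit_code = -1073741819

# Reinterpret the same 32 bits as an unsigned value.
status = exit_code & 0xFFFFFFFF

print(hex(status))  # 0xc0000005 (STATUS_ACCESS_VIOLATION)
```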

Do you have any suggestions? Should I build it myself, or use it from Cython?

Thanks, Bob

```python
import numpy as np
from weighted_levenshtein import lev, osa, dam_lev

# Cost tables are indexed by ASCII code point, hence 128 entries.
ins_costs = np.ones(128, dtype=np.float64)
del_costs = np.ones(128, dtype=np.float64)
sub_costs = np.ones((128, 128), dtype=np.float64)
tp_costs = np.ones((128, 128), dtype=np.float64)

cheap_chars = "-% ./#&()+?,'"

# insert costs that should be nearly free
for ch in cheap_chars:
    ins_costs[ord(ch)] = 0.1

# delete costs that should be nearly free
for ch in cheap_chars:
    del_costs[ord(ch)] = 0.1

# substitutions that should cost less than 1
sub_costs[ord('C'), ord('S')] = 0.5
sub_costs[ord('S'), ord('C')] = 0.5

sub_costs[ord('O'), ord('0')] = 0.1
sub_costs[ord('0'), ord('O')] = 0.1

# transpositions that should cost less than 1
tp_costs[ord('I'), ord('E')] = 0.1
tp_costs[ord('E'), ord('I')] = 0.1

tp_costs[ord('A'), ord('E')] = 0.2
tp_costs[ord('E'), ord('A')] = 0.2

print(dam_lev('ABNANA', 'BANANA', transpose_costs=tp_costs,
              substitute_costs=sub_costs, insert_costs=ins_costs,
              delete_costs=del_costs))
```

taoxinyi commented 5 years ago

I have the same problem

BobbyClouser commented 5 years ago

I found that on Linux I don’t have the crash problem.

(Sent by email in reply to taoxinyi's comment of Monday, January 7, 2019.)


RevolutionTech commented 5 years ago

weighted-levenshtein was originally built to run in a Linux environment, so although it's disappointing that it doesn't work on Windows, it doesn't come as a huge surprise to me.

I'm not sure that we'll be able to look into the issue anytime soon, but if you are able to discover the issue, we would certainly take a look at a PR. 😄

LEFTazs commented 4 years ago

The problem might be with the line endings: Linux uses \n, while Windows uses \r\n. @RevolutionTech, are line endings used in any way in the Damerau code logic?

LEFTazs commented 4 years ago

This should solve it. The problem was caused by negative indexing, which triggered the memory access error on Windows. I presume the negative indexing just didn't cause a crash on Linux, and that's why it appeared to work there. I haven't tested the fix on Linux, though, so hopefully it works there too.
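The negative-indexing diagnosis fits how the unrestricted Damerau-Levenshtein (Lowrance-Wagner) recurrence is usually written. Its transposition term reads d[k-1][l-1], where k or l is 0 when there is no earlier matching character. In an unshifted C array, that lookup would use index -1. The following is a minimal pure-Python sketch, not the library's Cython code. It assumes per-character cost callables, and the names `weighted_dam_lev`, `ins`, `dele`, `sub` and `tp` are hypothetical. It also assumes `tp(x, y)` is the cost of turning "xy" into "yx". The sketch guards that lookup explicitly instead of relying on a sentinel at index -1:

```python
from typing import Callable, Dict


def weighted_dam_lev(
    a: str,
    b: str,
    ins: Callable[[str], float] = lambda c: 1.0,
    dele: Callable[[str], float] = lambda c: 1.0,
    sub: Callable[[str, str], float] = lambda x, y: 1.0,
    tp: Callable[[str, str], float] = lambda x, y: 1.0,
) -> float:
    """Sketch of unrestricted Damerau-Levenshtein with per-character costs.

    Hypothetical API: tp(x, y) is assumed to be the cost of "xy" -> "yx".
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn a[:i] into b[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + dele(a[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins(b[j - 1])

    da: Dict[str, int] = {}  # char -> last row i (1-based) where it occurred in a
    for i in range(1, n + 1):
        db = 0  # last column j (1-based) in this row where a[i] == b[j]
        for j in range(1, m + 1):
            k = da.get(b[j - 1], 0)  # 0 means "b[j] not seen earlier in a"
            l = db                   # 0 means "a[i] not seen earlier in b"
            if a[i - 1] == b[j - 1]:
                cost = 0.0
                db = j
            else:
                cost = sub(a[i - 1], b[j - 1])
            best = min(
                d[i - 1][j - 1] + cost,        # match / substitute
                d[i][j - 1] + ins(b[j - 1]),   # insert b[j]
                d[i - 1][j] + dele(a[i - 1]),  # delete a[i]
            )
            # A transposition needs an earlier match on both sides. Without
            # this check, k == 0 or l == 0 would read d[-1][...] or d[...][-1].
            if k > 0 and l > 0:
                between_del = sum(dele(c) for c in a[k:i - 1])  # a[k+1..i-1]
                between_ins = sum(ins(c) for c in b[l:j - 1])   # b[l+1..j-1]
                best = min(best, d[k - 1][l - 1] + between_del
                           + tp(a[k - 1], a[i - 1]) + between_ins)
            d[i][j] = best
        da[a[i - 1]] = i
    return d[n][m]
```

With unit costs, `weighted_dam_lev("ABNANA", "BANANA")` gives 1.0, a single transposition. The NumPy tables from the original report could be plugged in through wrappers such as `ins=lambda c: ins_costs[ord(c)]`. This usage is likewise an illustration, not the library's interface.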