taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.
MIT License

Strange freezing behavior in script #11

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hi,

thanks for the package, it should be fantastic. I have a strange issue: when I run the code line by line in the Python interpreter it works fine, but when I run it as a script it freezes, with no error message and no progress past the find_near_matches line.

Thanks,

Theo

test.fasta:

>test
CACCGCCCATTTCCAGCACGGAAGATAGGTTCTGGTGTGTCACCGTCCATTTCCCGAACCGGTCTCCCTCACCAGCTCGACCCACACTAGCTGTCCATCCTGAGGCGC

my command is:

clip_at_primers.py test.fasta clip.fasta TCACCGCCCATTTCC TCCATCCTGAGGCGC 2

code:

import sys
import subprocess
from Bio import SeqIO
from fuzzysearch import find_near_matches

f = open(sys.argv[1],'r')
g = open(sys.argv[2],'w')
leftp = sys.argv[3].upper()
rightp = sys.argv[4].upper()
mm=sys.argv[5]
l=0
r=0
t=0
c=0

p0=subprocess.check_output('prg %s' %leftp, shell=True).split("\n")[:-1]
p1=subprocess.check_output('prg %s' %rightp, shell=True).split("\n")[:-1]

print "forward primer to clip:",p0
print "reverse primer to clip:",p1

nf=0
for x in SeqIO.parse(f,'fasta'):
    c=c+1
    t=t+1

    if c==10000:
        print t, "left clipped=",str(l),"right clipped=",str(r)

        c=0

    seq1=str(x.seq)

    for primer in p0:

        m=find_near_matches(primer,seq1,max_l_dist=mm)

        if m<>[]:
            q = m[-1].start #nb clip at last instance
            #print 'fwd',m,q
            l=l+1
            x.seq=x.seq[q:]

        else:
            q="no hit"
            #print primer,"not found in",str(x.id)

    for primer in p1:
        m=find_near_matches(primer,seq1,max_l_dist=mm)

        if m<>[]:
            q = m[0].end #nb clip at first instance
            r=r+1
            x.seq=x.seq[:q]
        else:
            q="no hit"

    SeqIO.write(x,g,'fasta')
print str(l),"left primers clipped",str(r),"right primers clipped"

taleinat commented 7 years ago

Hi, I'd be happy to help but I'll need more information to be able to reproduce the issue and debug:

  1. The test.fasta file
  2. The values of p0 and p1

In the meantime, as a possible workaround, you can try installing fuzzysearch without the compiled optimizations by using the following installation sequence:

  1. pip install fuzzysearch (you probably don't need this since you already have it installed)
  2. pip uninstall fuzzysearch
  3. pip install --install-option="--noexts" fuzzysearch

Later, if you want to enable the compiled optimizations again (perhaps after we find and resolve this issue), just uninstall and re-install fuzzysearch.

ghost commented 7 years ago

Hi,

test.fasta attached. The values of p0 and p1 are just the primers, because there are no ambiguous bases (prg just makes a list of all possible oligos from ambiguous sequences). I will try your workaround.

Thanks,

Theo

Dr Theo Allnutt Bioinformatics Research Fellow School of Medicine, Faculty of Health Waurn Ponds Campus 0352479571

ghost commented 7 years ago

Hi,

the workaround seemed to work. I also had to change my mismatch variable to integer type.

Thanks,

Theo

ghost commented 7 years ago

Hi,

you can close this issue now. I have realised it was because I was passing an undefined variable instead of an integer to the max_l_dist parameter. It works now.
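
The root cause described above comes down to types: in the script posted earlier, `mm` is read from `sys.argv[5]`, and command-line arguments always arrive as strings. A minimal sketch of one way to guard against this is to let `argparse` do the conversion. It is written in Python 3 syntax (the original script is Python 2), and the function name `parse_args` and the argument names are illustrative assumptions, not part of the original script or of fuzzysearch.

```python
import argparse


def parse_args(argv):
    """Parse the five positional arguments used by the script in this thread.

    Hypothetical helper: the names below are chosen for illustration only.
    """
    parser = argparse.ArgumentParser(description="Clip sequences at primers.")
    parser.add_argument("infile")
    parser.add_argument("outfile")
    # Upper-case the primers, as the original script does with .upper()
    parser.add_argument("left_primer", type=str.upper)
    parser.add_argument("right_primer", type=str.upper)
    # type=int converts the string and rejects non-integer input up front
    parser.add_argument("max_dist", type=int)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(
        ["test.fasta", "clip.fasta", "TCACCGCCCATTTCC", "TCCATCCTGAGGCGC", "2"]
    )
    print(type(args.max_dist).__name__, args.max_dist)
```

The parsed value could then be passed as `max_l_dist=args.max_dist`. A smaller change to the original script would be `mm = int(sys.argv[5])`.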

Thanks,

Theo

taleinat commented 7 years ago

Hi Theo,

I'm happy that the workaround helped and that you found the bug that was causing the freeze!

Regardless, fuzzysearch should have recognized that an invalid type or value was given for the max_l_dist argument and raised an informative exception. I'll get that fixed!
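
As a rough illustration of the kind of check described above, the sketch below validates a distance argument before any searching starts. It is an assumption about what such a check could look like, not the code that fuzzysearch actually ships; the helper name `check_max_l_dist` is hypothetical.

```python
def check_max_l_dist(max_l_dist):
    """Return max_l_dist unchanged if it is a non-negative integer.

    Hypothetical helper for illustration; not part of fuzzysearch's API.
    """
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(max_l_dist, bool) or not isinstance(max_l_dist, int):
        raise TypeError(
            "max_l_dist must be an integer, got %s: %r"
            % (type(max_l_dist).__name__, max_l_dist)
        )
    if max_l_dist < 0:
        raise ValueError("max_l_dist must be non-negative, got %d" % max_l_dist)
    return max_l_dist
```

With a check like this, passing the string `"2"` fails immediately with a `TypeError` that names the offending value, instead of the call hanging.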

taleinat commented 7 years ago

Fixed in v0.5.0.