ylab-hi / pxblat

PxBLAT: An Efficient and Ergonomic Python Binding Library for BLAT
https://pxblat.readthedocs.io/en/latest/

[Bug]: gfClient error exit #245

Open anderdnavarro opened 8 months ago

anderdnavarro commented 8 months ago

What happened?

Hi!

I have a list of 2000 sequences that I am trying to align using pxblat. If I run only 100-300 sequences it works well, but when I try to run all of them I get the following error:

python3: TCP non-blocking connect() to localhost IP  timed-out in select() after 10000 milliseconds - Cancelling!: Operation now in progress
python3: # error: Operation now in progress
Sorry, the BLAT/iPCR server seems to be down.  Please try again later: localhost 5000
: Operation now in progress
gfClient error exit

I tried to split them automatically into batches, even adding a 25 s sleep between rounds. I also used both Context and General modes (stopping the server after each round or not), but I always get the same error. I could run the script manually several times with different sequences (instead of generating the batches inside the script), but then I couldn't automate it.

This is the code I was using:

from pxblat import Client, Server

client = Client(
    host="localhost",
    port=5000,
    seq_dir="/databases/hg38",
    min_score=20,
    min_identity=90,
)

with Server("localhost", 5000, "/databases/hg38/hg38.2bit", can_stop=True, step_size=5, tile_size=10) as server:
    # prepare_blat_sequences(file) returns the list of sequences to align
    sequences: list = prepare_blat_sequences(file)
    server.wait_ready()
    results = client.query(sequences[0:2000])

I don't know if there would be an extra Server or Client option that I am missing that I could use.

Thank you very much! Ander
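
The batching described above could be sketched roughly as follows. This is only an illustration, not the reporter's actual script: `chunked` and `query_in_batches` are hypothetical helper names that are not part of pxblat, and the sketch assumes the query callable (for example `client.query`) takes a list of sequences and returns a list of results. Note that, as reported in this issue, batching alone did not make the error go away.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def query_in_batches(
    query: Callable[[List[str]], list],
    sequences: Sequence[str],
    batch_size: int = 200,
    pause_s: float = 0.0,
) -> list:
    """Call `query` once per batch and concatenate the per-batch results.

    `query` is assumed to behave like `client.query`: list in, list out.
    """
    results: list = []
    for index, batch in enumerate(chunked(sequences, batch_size)):
        # Optionally pause between rounds (the issue mentions a 25 s sleep).
        if index > 0 and pause_s > 0:
            time.sleep(pause_s)
        results.extend(query(batch))
    return results
```

Inside the `with Server(...)` block this would be called as `query_in_batches(client.query, sequences, batch_size=200, pause_s=25)`, with the batch size chosen from the 100-300 range that is reported to work.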

Version

python-3.10.12 pxblat-1.1.10 biopython-1.83

What platform are you working on?

No response

Relevant log output

No response


cauliyang commented 8 months ago

Good find! I will dig into the issue and try to resolve it soon.

anderdnavarro commented 8 months ago

I ran the same 2000 sequences with the previous version I had installed (pxblat 0.3.6) and the error said: *** buffer overflow detected ***: terminated

In case it is useful for you! Thanks!

cauliyang commented 8 months ago

Thanks for the information. I guess this is related to the earlier issue https://github.com/ylab-hi/pxblat/issues/66. Could you please use ulimit -n 2048 to raise the maximum number of open files/connections? Let's see whether that fixes the issue.
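
To confirm that the limit set with ulimit actually reaches the Python process, it can be read with the standard-library `resource` module (Unix only). This is a minimal sketch; `open_file_limits` and `raise_soft_limit` are hypothetical helper names, not pxblat functions, and whether a higher limit helps with this particular error is not established in this thread.

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target: int) -> int:
    """Raise the soft open-file limit to `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the current soft limit is unlimited or already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    print("open-file limits (soft, hard):", open_file_limits())
```

Calling `raise_soft_limit(2048)` before starting the server would be the in-process counterpart of running ulimit -n 2048 in the shell.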

anderdnavarro commented 8 months ago

I tried both ulimit -n 2048 and ulimit -n 4096, but neither worked. Same error.

cauliyang commented 8 months ago

Thanks for the update, I will dig into the issue. Could you please install the latest version and test again?

cauliyang commented 8 months ago

Hi @anderdnavarro, thanks for testing the tool! I have tested the latest version, 1.1.19, and the bug should be fixed. Before testing, please make sure the port is closed, using pxblat server stop localhost port or the API https://pxblat.readthedocs.io/en/latest/api/stop_server.html, since the port may not have been closed properly if earlier runs hit errors.
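
One way to check whether something is still listening on the port before starting a new server is a plain TCP connection attempt using the standard-library `socket` module. This is a minimal sketch; `port_in_use` is a hypothetical helper name and not part of pxblat, and it only reports whether a connection is accepted, not which process owns the port.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Port 5000 is the one used in the original report.
    print("port 5000 in use:", port_in_use("127.0.0.1", 5000))
```

If this returns True before the script runs, a previous server is probably still up and should be stopped first.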

anderdnavarro commented 8 months ago

Hi @cauliyang , perfect! As soon as you release the new version I will try it.

Thank you again for the quick solution!

anderdnavarro commented 8 months ago

Hi @cauliyang, I tried the new 1.1.20 version, but the problem is still there. I also used ulimit -n 2048 and restarted the port before testing with the command you provided.

This is the output using the VSCode terminal:

python3: TCP non-blocking connect() to localhost IP  timed-out in select() after 10000 milliseconds - Cancelling!: Operation now in progress
python3: # error: Operation now in progress
Sorry, the BLAT/iPCR server seems to be down.  Please try again later: localhost 6000
: Operation now in progress
gfClient error exit

And this is the output using a regular terminal. I am pasting it too because it is slightly different:

getaddrinfo() failed: Device or resource busy
python3: Host localhost not found --> System error
: Device or resource busy
python3: # error: Device or resource busy
Sorry, the BLAT/iPCR server seems to be down.  Please try again later: localhost 6000
: Device or resource busy
gfClient error exit

I tried both because with VSCode I get an error about 70% of the time, even for only 10 sequences, and I think it is related to its port forwarding feature. When I run pxblat server stop localhost port to restart the port, it is detected by the app (I am using a Linux server but working on an M1 Mac).

cauliyang commented 8 months ago

Hi @anderdnavarro, thanks for sharing the info. Could you please share the latest code you are using? I will try to reproduce the issue and resolve it soon.

anderdnavarro commented 8 months ago

Sure! This is the command I'm using:

python blat2.py -i sequences.txt -g /databases/hg38

This is the script:

import os
import click
from pxblat import Server, Client

@click.command(name="Blat")
@click.option("-i", "--input",
              type=click.Path(exists=True, file_okay=True),
              metavar="FILE",
              required = True,
              help="List of sequences (one per row)")
@click.option("-g", "--genomeDir", "genomeDir",
              type=click.Path(exists=True, file_okay=False),
              metavar="DIR",
              required = False,
              default = '/databases/hg38',
              help="Directory containing the genome files required for Blat (.2bit)")
def Blat(input, genomeDir):

    """
    pxBlat command to run many sequences at the same time
    """

    # File with sequences
    with open(input, 'r') as f:
        sequences:list = f.readlines()
    sequences = [line.rstrip('\n') for line in sequences]

    # 2bit file: take the first .2bit file found in the genome directory
    all_files: list = os.listdir(genomeDir)
    g2bit: str = [file for file in all_files if file.endswith(".2bit")][0]

    # Blat options
    ## Client
    client = Client(
        host="localhost",
        port=6000,
        seq_dir=genomeDir,
        min_score=20,
        min_identity=90
    )

    ## Server
    with Server("localhost", 6000, os.path.join(genomeDir, g2bit), can_stop=True, step_size=5, tile_size=10) as server: #BLAT WEB options
        server.wait_ready()  
        results = client.query(sequences[:10])

    print(results)

if __name__ == '__main__':
    Blat()

And the complete list of sequences can be found at the following link: https://drive.google.com/file/d/14oumMtx4NnMH95VXFqBHTUx5q3Qrhsai/view?usp=sharing

Let me know if you need anything else!

cauliyang commented 8 months ago

@anderdnavarro, sounds great! Thanks for sharing.