saketkc / pysradb

Package for fetching metadata and downloading data from SRA/ENA/GEO
https://saketkc.github.io/pysradb
BSD 3-Clause "New" or "Revised" License

Python traceback when sending output to a pipe #7

Closed s-andrews closed 4 years ago

s-andrews commented 5 years ago

Description

$ pysradb search '"ribosome profiling"' | head

Generated:

study_accession experiment_accession sample_accession run_accession
 DRP003075       DRX019536            DRS026974        DRR021383
 DRP003075       DRX019537            DRS026982        DRR021384
 DRP003075       DRX019538            DRS026979        DRR021385
 DRP003075       DRX019540            DRS026984        DRR021387
 DRP003075       DRX019541            DRS026978        DRR021388
 DRP003075       DRX019543            DRS026980        DRR021390
 DRP003075       DRX019544            DRS026981        DRR021391
 ERP013565       ERX1264364           ERS1016056       ERR1190989
 ERP013565       ERX1264365           ERS1016057       ERR1190990
Traceback (most recent call last):
  File "/bi/apps/python/3.5.1/bin/pysradb", line 10, in <module>
    sys.exit(parse_args())
  File "/bi/apps/python/3.5.1/lib/python3.5/site-packages/pysradb/cli.py", line 944, in parse_args
    args.saveto,
  File "/bi/apps/python/3.5.1/lib/python3.5/site-packages/pysradb/cli.py", line 150, in search
    _print_save_df(df, saveto)
  File "/bi/apps/python/3.5.1/lib/python3.5/site-packages/pysradb/cli.py", line 38, in _print_save_df
    print(df.to_string(index=False, justify="left", col_space=0))
BrokenPipeError: [Errno 32] Broken pipe

It doesn't happen with all searches:

$ pysradb search '"oocyte development"' | head
study_accession experiment_accession sample_accession run_accession
 SRP011546       SRX129998            SRS300732        SRR445719
 SRP011546       SRX129999            SRS300733        SRR445720
 SRP064741       SRX1617410           SRS1326799       SRR3208744
 SRP064741       SRX1617411           SRS1326798       SRR3208745
 SRP064741       SRX1617412           SRS1326797       SRR3208746
 SRP064741       SRX1617413           SRS1326796       SRR3208747
 SRP064741       SRX1617414           SRS1326795       SRR3208748
 SRP064741       SRX1617415           SRS1326794       SRR3208749
 SRP064741       SRX1617416           SRS1326793       SRR3208750

I suspect the program doesn't cope with having to wait to generate output if the pipe buffer is full, or closed?
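A minimal sketch of the mechanism that is probably at work here, assuming a POSIX system and the default CPython behaviour of ignoring SIGPIPE: once the reading end of a pipe has been closed (as happens when head exits after printing its ten lines), the next write to that pipe fails with errno 32. A search whose whole output is written before head exits would never hit this, which may be why only some searches fail. The function name below is hypothetical and is not part of pysradb.

```python
import errno
import os


def write_after_reader_closed(payload=b"DRP003075\tDRX019536\n"):
    """Write to a pipe whose read end is already closed; return the errno raised."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away, much like `head` exiting early
    try:
        os.write(write_fd, payload)
    except BrokenPipeError as exc:
        # CPython ignores SIGPIPE by default, so the failed write surfaces
        # as an exception instead of silently killing the process.
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(write_after_reader_closed() == errno.EPIPE)
```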

saketkc commented 5 years ago

Thanks Simon!

I can reproduce this at my end. I think I have a fix for this that I'll update soon.

jarrodscott commented 5 years ago

Is this still an issue? I ask because I received the same error when running:

pysradb search '"ribosome profiling"' | head

VangelisTheodorakis commented 4 years ago

I also get the broken pipe error.

saketkc commented 4 years ago

This is still an issue. Sorry, but I don't have a fix yet. As a temporary fix, you can save the output to a tsv by appending --saveto myfile.tsv and then do a head on the tsv.

nimitbhardwaj commented 4 years ago

I would like to work on this issue. I did some research into why it happens. In cli.py, the function _print_save_df (line 33) prints the output. According to Stack Overflow, a BrokenPipeError is raised when a process writes to a pipe whose reading end has already been closed. In our case, head exits after reading its first ten lines, which closes the pipe while pysradb is still writing the rest of the table, so the next write to stdout fails. Following the same question thread, the suggested approach is to catch and ignore the exception, since once the reader is gone no further output can be delivered anyway. This is easy to code with exception handling, but I want to confirm whether this approach is good for our case.
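The exception-handling idea from this comment could look roughly like the sketch below. It is not the actual pysradb code: print_ignoring_broken_pipe and ClosedPipe are hypothetical names used only for illustration. Redirecting stdout to os.devnull after the failure follows the pattern in the Python signal documentation's note on SIGPIPE, so that the interpreter's final flush at exit does not raise a second time.

```python
import io
import os
import sys


def print_ignoring_broken_pipe(text, stream=None):
    """Write text plus a newline; return False if the reader has gone away."""
    stream = sys.stdout if stream is None else stream
    try:
        stream.write(text + "\n")
        stream.flush()
        return True
    except BrokenPipeError:
        if stream is sys.stdout:
            # Point stdout at /dev/null so the flush performed at
            # interpreter shutdown does not raise BrokenPipeError again.
            devnull = os.open(os.devnull, os.O_WRONLY)
            os.dup2(devnull, sys.stdout.fileno())
        return False


class ClosedPipe:
    """Hypothetical stand-in for a stdout whose reader (e.g. head) has exited."""

    def write(self, data):
        raise BrokenPipeError(32, "Broken pipe")

    def flush(self):
        pass


if __name__ == "__main__":
    buffer = io.StringIO()
    print(print_ignoring_broken_pipe("study_accession", buffer))
    print(print_ignoring_broken_pipe("study_accession", ClosedPipe()))
```

Returning a boolean rather than exiting lets the caller decide whether a closed pipe should end the program quietly or be reported.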

saketkc commented 4 years ago

Hi @nimitbhardwaj, thanks for looking into this. If you think the fix could potentially work, please do submit a pull request. The test cases for cli.py are currently disabled, but if you submit a PR, it would be good to reinstate them along with this new test case.

nimitbhardwaj commented 4 years ago

@saketkc OK, cool. I'll make the PR for it today and will reinstate the test cases along with it.

DaasDaham commented 4 years ago

Is this issue still open? I was thinking along the same lines as nimit. Basically, this error is thrown because of the large size of the output being redirected through stdout to the pipe. The pipe typically has a limit of ~60KB, as mentioned in this answer's comment. So, to make it user-friendly, a hard-coded limit on the size of the to_print list (line 43 in cli.py) could be used to stop it from printing anything to the terminal once it crosses the threshold. When the output is larger than the limit, it can be saved to a unique .tsv file, and an error message such as "output is too large" can be displayed along with the name of the .tsv file. This approach does not exactly solve the broken pipe error; it is merely a workaround. @saketkc, if this issue is still open, is the approach mentioned above fine? Also, should I write a few test cases for it?
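The size-threshold workaround proposed here might be sketched as follows. Everything in it is an assumption for illustration: print_or_save and MAX_STDOUT_BYTES are hypothetical names, the 60,000-byte limit simply mirrors the "~60KB" figure quoted above, and the text is written to the file unchanged rather than converted to real tab-separated values. As the comment itself notes, this sidesteps the error rather than fixing it, and output below the threshold could still hit a closed pipe.

```python
import sys
import tempfile

# Hypothetical threshold, taken from the "~60KB" figure mentioned in the thread.
MAX_STDOUT_BYTES = 60_000


def print_or_save(text, limit=MAX_STDOUT_BYTES, out=None, err=None):
    """Print small output; save large output to a unique .tsv file instead.

    Returns None when the text was printed, or the path of the saved file.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if len(text.encode("utf-8")) <= limit:
        out.write(text + "\n")
        return None
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".tsv", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text + "\n")
        path = handle.name
    err.write(f"output is too large; saved to {path}\n")
    return path
```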