stajichlab / biosample_metadata

Extract metadata from biosamples in NCBI
MIT License

urllib.error.HTTPError: HTTP Error 400: Bad Request #1

Open Michelle-Pena opened 1 year ago

Michelle-Pena commented 1 year ago

Hi, I'm trying to run your code and I'm getting this error: urllib.error.HTTPError: HTTP Error 400: Bad Request. I would appreciate any help on how to fix it. Thanks.

hyphaltip commented 1 year ago

Can you give more detail on what reproduces the error, and which version of Biopython you have installed? If it is an old version that still uses http where NCBI now requires https, that could be the problem.

Either way, please share what you ran to trigger the error and the exact error output, so the failing line can be found.

Michelle-Pena commented 1 year ago

Hi, sorry for taking so long to reply. This is the version I have: Biopython/1.81-foss-2020b

This is the complete message I'm getting on the output file:

Traceback (most recent call last):
  File "./biosample2table.py", line 160, in <module>
    handle = Entrez.efetch(db="biosample", id=sampid)
  File "/apps/eb/Biopython/1.81-foss-2020b/lib/python3.8/site-packages/Bio/Entrez/__init__.py", line 196, in efetch
    return _open(request)
  File "/apps/eb/Biopython/1.81-foss-2020b/lib/python3.8/site-packages/Bio/Entrez/__init__.py", line 594, in _open
    handle = urlopen(request)
  File "/apps/eb/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/apps/eb/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/apps/eb/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/apps/eb/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/apps/eb/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/apps/eb/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
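The traceback shows only the status line, while a 400 response often carries a body saying what the server rejected. A minimal sketch for surfacing that body, assuming the exception is the standard urllib.error.HTTPError seen in the traceback; the helper name describe_http_error and the simulated error are hypothetical and not part of biosample2table.py:

```python
import io
import urllib.error
from email.message import Message


def describe_http_error(err: urllib.error.HTTPError, limit: int = 500) -> str:
    """Summarise an HTTPError, including the start of the response body."""
    try:
        body = err.read().decode("utf-8", errors="replace")
    except Exception:  # the body may be missing or already consumed
        body = ""
    return f"{err.code} {err.reason} for {err.url}\n{body[:limit]}".rstrip()


if __name__ == "__main__":
    # Simulated error (hypothetical URL and message) so the sketch runs offline.
    fake = urllib.error.HTTPError(
        url="https://example.invalid/efetch",
        code=400,
        msg="Bad Request",
        hdrs=Message(),
        fp=io.BytesIO(b"hypothetical server message"),
    )
    print(describe_http_error(fake))
```

In the script, this would go in an except urllib.error.HTTPError block around the Entrez.efetch call at line 160, printing the summary before re-raising.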

Neato-Nick commented 7 months ago

@hyphaltip I had this issue as well, with the same error stack. I mostly got it working by upgrading both Biopython and NumPy (1.81 -> 1.83 and 1.25 -> 1.26.4). I think the original issue predates the Apple silicon problems, but in case it matters, I'm using an M1 chip, which I know does not play well with Biopython.

For some reason, this fails intermittently. I could never get my full list of 84 biosamples to run in one go, so as a workaround I used a simple bash for loop to run them one at a time. 3 of the 84 failed with an error stack similar to the one above, so I reran those individually and they worked.

Below is my command and some demonstrative output.

❯ for biosam in $(cat biosamples_n84.list); do CMD="biosample2table.py -s $biosam --out biosamples_n84.csv -e nc.cauldron@gmail.com"; echo $CMD; eval $CMD; done
biosample2table.py -s SAMN19689572 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689611 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689620 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689549 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689560 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689622 --out biosamples_n84.csv -e nc.cauldron@gmail.com
Traceback (most recent call last):
  File "/Users/nicholascauldron/opt/bin/biosample2table.py", line 160, in <module>
    handle = Entrez.efetch(db="biosample", id=sampid)
  File "/Users/nicholascauldron/Library/Python/3.9/lib/python/site-packages/Bio/Entrez/__init__.py", line 197, in efetch
    return _open(request)
  File "/Users/nicholascauldron/Library/Python/3.9/lib/python/site-packages/Bio/Entrez/__init__.py", line 623, in _open
    handle = urlopen(request)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
❯ biosample2table.py -s SAMN19689622 --out biosamples_n84.csv -e nc.cauldron@gmail.com
❯
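The workaround above (run each sample separately, then rerun the ones that failed) can also be expressed inside one Python process. A minimal sketch, assuming a caller-supplied fetch_one function that raises urllib.error.HTTPError on failure; fetch_all and fetch_one are hypothetical names, not functions from biosample2table.py:

```python
import urllib.error
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_all(
    ids: Iterable[str],
    fetch_one: Callable[[str], str],
    passes: int = 2,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch every ID, re-running only the failures on later passes."""
    results: Dict[str, str] = {}
    pending = list(ids)
    for _ in range(passes):
        failed = []
        for sample_id in pending:
            try:
                results[sample_id] = fetch_one(sample_id)
            except urllib.error.HTTPError:
                # Keep going; this ID is retried on the next pass.
                failed.append(sample_id)
        pending = failed
        if not pending:
            break
    # `pending` now holds the IDs that failed on every pass.
    return results, pending
```

One failing sample no longer aborts the whole list, and the second return value gives the IDs that still need attention.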
hyphaltip commented 7 months ago

Thanks for the report. I will try to get back to this, but sadly it seems to be an issue with the dependencies.

Neato-Nick commented 7 months ago

I think so too, but the intermittent failures were odd. Maybe I was hitting some rate limit for Entrez requests?

Regardless, I'm alright with the workaround for now and it's not too painful.
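On the rate-limit guess: NCBI documents a limit of 3 E-utilities requests per second without an API key (10 with one), although exceeding it is typically reported as HTTP 429 rather than 400, so this may not be the cause here. If spacing out requests is worth trying, a minimal sketch of a throttle follows. The Throttle class is hypothetical, and it only paces calls within one process, so it would not coordinate separate invocations launched from a shell loop:

```python
import time
from typing import Callable, Optional


class Throttle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call, if any

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling throttle.wait() before each request with Throttle(0.34) stays just under three requests per second.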

ilanqing commented 5 months ago

I have also encountered the same HTTP Error 400. Currently, my solution is to retry efetch when it returns a 400 error. So far, this seems to be working well.
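The code for this retry is not shown in the comment. A minimal sketch of one way to do it, assuming the failure surfaces as urllib.error.HTTPError with code 400 as in the tracebacks above; retry_on_http_400, the attempt count, and the backoff delays are all hypothetical choices:

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_http_400(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` until it succeeds, retrying only on HTTP 400."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Other status codes, or the final attempt, propagate unchanged.
            if err.code != 400 or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Intended use in the script (requires Biopython, so left as a comment):
# handle = retry_on_http_400(lambda: Entrez.efetch(db="biosample", id=sampid))
```

Capping the attempts means a request that is genuinely malformed still fails, instead of being retried forever.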

Neato-Nick commented 5 months ago

I am now using this often enough that it was worth automating the retry process. Here's the bash code, @ilanqing. It uses simple recursion, where the exit condition is the exit code of the Python script. I don't know or remember the exit code produced by the HTTP error, but it doesn't matter much, because only a successful run produces exit code 0.

function biosam2tbl()
{
        # Fetch biosample info for one SRA accession, logging the command and its output
        local sra="$1"
        local efetch_out_name="$2"
        local log="${efetch_out_name}.biosam_info.log"
        local -a cmd=(biosample2table.py -s "$sra" --out "${efetch_out_name}.biosam_info.csv" --sra -e email@gmail.com)
        echo "${cmd[*]}" >> "$log"
        # if error, run again (no retry limit: a permanent failure recurses forever)
        if ! "${cmd[@]}" >> "$log" 2>&1; then
                biosam2tbl "$sra" "$efetch_out_name"
        fi
}
# efetch_out_name must already be set before this loop (it is not defined in this snippet)
#echo "Getting info from biosample corresponding to SRA accessions. Be patient, takes up to 3 sec. per sample"
for sra in $(cat "${efetch_out_name}.sra.list"); do
        biosam2tbl "$sra" "$efetch_out_name"
done
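The same exit-code-driven retry can be written as a bounded loop, so that an accession that never succeeds does not recurse indefinitely. A minimal Python sketch, assuming only that the script exits non-zero on failure, as described above; run_until_success and its defaults are hypothetical:

```python
import subprocess
import sys
import time
from typing import Sequence


def run_until_success(
    cmd: Sequence[str], max_attempts: int = 5, delay: float = 1.0
) -> int:
    """Run `cmd` until it exits 0; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay)  # brief pause before the next try
    raise RuntimeError(f"{list(cmd)!r} still failing after {max_attempts} attempts")


if __name__ == "__main__":
    # Stand-in command that always succeeds, so the sketch runs anywhere.
    print(run_until_success([sys.executable, "-c", "pass"]))
    # With the real script, the command list would look like:
    # ["biosample2table.py", "-s", sra, "--out", out_csv, "--sra", "-e", email]
```

Passing the command as a list avoids eval and shell quoting issues, and the RuntimeError marks samples that need manual attention.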
ilanqing commented 5 months ago

Yes, my thoughts are basically the same as yours, although the implementation differs slightly. Thank you, @Neato-Nick. Here are my two code samples for your reference: two python code.zip