Open Michelle-Pena opened 1 year ago
Can you give more detail on what reproduces the error, and which version of Biopython you have installed? If it is an old version and NCBI now requires https rather than http, that could be the problem.
Either way, please share what you did to trigger the error and the exact error output, so the failing line can be found.
Hi, sorry for taking so long to reply. This is the version I have: Biopython/1.81-foss-2020b
This is the complete message I'm getting on the output file:
Traceback (most recent call last):
File "./biosample2table.py", line 160, in
@hyphaltip I had this issue as well, with the same error stack. I mostly got it working by upgrading both Biopython and NumPy (1.81 -> 1.83 and 1.25 -> 1.26.4). I think the original issue predates the Apple silicon problems, but in case it matters, I'm using an M1 chip, which I know does not play well with Biopython.
For some reason, this still fails intermittently. I could never get my full list of 84 biosamples to run in one go, so as a workaround I used a simple bash for loop to run them one at a time. 3 of the 84 failed with an error stack similar to the one above, so I reran those individually and they worked.
Below is my command and some demonstrative output.
❯ for biosam in $(cat biosamples_n84.list); do CMD="biosample2table.py -s $biosam --out biosamples_n84.csv -e nc.cauldron@gmail.com"; echo $CMD; eval $CMD; done
biosample2table.py -s SAMN19689572 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689611 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689620 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689549 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689560 --out biosamples_n84.csv -e nc.cauldron@gmail.com
biosample2table.py -s SAMN19689622 --out biosamples_n84.csv -e nc.cauldron@gmail.com
Traceback (most recent call last):
File "/Users/nicholascauldron/opt/bin/biosample2table.py", line 160, in <module>
handle = Entrez.efetch(db="biosample", id=sampid)
File "/Users/nicholascauldron/Library/Python/3.9/lib/python/site-packages/Bio/Entrez/__init__.py", line 197, in efetch
return _open(request)
File "/Users/nicholascauldron/Library/Python/3.9/lib/python/site-packages/Bio/Entrez/__init__.py", line 623, in _open
handle = urlopen(request)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 561, in error
return self._call_chain(*args)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
❯ biosample2table.py -s SAMN19689622 --out biosamples_n84.csv -e nc.cauldron@gmail.com
❯
Thanks for the report. I will try to get back to this, but sadly it seems to be an issue with the dependencies.
I think so too, but the intermittent failures were odd. Maybe I was hitting some rate limit for Entrez requests?
Regardless, I'm alright with the workaround for now and it's not too painful.
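If a rate limit is the cause (unconfirmed in this thread), spacing out the requests is one thing to try. NCBI documents a limit of about 3 E-utilities requests per second without an API key. Below is a minimal sketch of a throttle, not part of biosample2table.py; the class name `Throttle` and the 0.34 s interval are assumptions for illustration. It only helps inside a single Python process, so it would not pace a bash loop that launches the script once per sample.

```python
import time


class Throttle:
    """Space successive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the behaviour can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call; None before the first

    def wait(self):
        """Sleep just long enough to respect min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
        return now


# Hypothetical usage (sample_ids is an assumed list of BioSample accessions):
# throttle = Throttle(min_interval=0.34)   # roughly 3 requests per second
# for sample_id in sample_ids:
#     throttle.wait()
#     handle = Entrez.efetch(db="biosample", id=sample_id)
```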
I have also encountered the same HTTP Error 400. Currently, my solution is to retry efetch whenever it returns a 400 error. So far, this seems to be working well.
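For anyone who wants to try the same idea, here is a rough sketch of retrying inside Python rather than from the shell. It is not the code from this comment or from biosample2table.py: `fetch_with_retry`, its parameters, and the choice of retrying on 400 and 429 are assumptions. It wraps any zero-argument callable, so the `Entrez.efetch` call from the traceback can be passed in as a lambda.

```python
import time
from urllib.error import HTTPError


def fetch_with_retry(fetch, max_tries=5, delay=1.0,
                     retry_codes=(400, 429), sleep=time.sleep):
    """Call fetch() and retry when it raises HTTPError with a listed status code.

    Other HTTP errors, and the error from the final attempt, are re-raised.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except HTTPError as err:
            if err.code not in retry_codes or attempt == max_tries:
                raise
            sleep(delay * attempt)  # linear back-off: 1x, 2x, 3x the delay


# Hypothetical usage with the call shown in the traceback:
# handle = fetch_with_retry(lambda: Entrez.efetch(db="biosample", id=sampid))
```

Capping the number of attempts means a request that is genuinely malformed (for example, a mistyped accession) still surfaces as an error instead of looping forever.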
I am now using this enough that it was worth automating the retry process. Here's the bash code, @ilanqing. It uses simple recursion whose exit condition is the exit code of the Python script. I don't know (or remember) the exit code produced by the HTTP error, but it doesn't matter much because only a successful run produces exit code 0.
function biosam2tbl()
{
    local sra="$1"
    local efetch_out_name="$2"
    local log="${efetch_out_name}.biosam_info.log"
    local CMD="biosample2table.py -s $sra --out ${efetch_out_name}.biosam_info.csv --sra -e email@gmail.com"
    echo "$CMD" >> "$log"
    # Run the command, appending stdout and stderr to the log
    eval "$CMD" >> "$log" 2>&1
    # if error, run again (note: there is no retry limit, so this repeats until exit code 0)
    if [ $? != 0 ]; then
        biosam2tbl "$sra" "$efetch_out_name"
    fi
}
#echo "Getting info from biosample corresponding to SRA accessions. Be patient, takes up to 3 sec. per sample"
# efetch_out_name is assumed to be set earlier in the calling script
for sra in $(cat "${efetch_out_name}.sra.list"); do
    biosam2tbl "$sra" "$efetch_out_name"
done
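One caveat with the recursion above is that it has no upper bound, so an accession that can never succeed would retry indefinitely. A possible Python equivalent with a cap on attempts is sketched below; `run_until_success` and its defaults are assumptions for illustration, and like the bash version it relies only on the script's exit code.

```python
import subprocess
import time


def run_until_success(cmd, max_tries=5, delay=1.0):
    """Run cmd (a list of arguments) until it exits with code 0.

    Returns the number of attempts used; raises RuntimeError if all attempts fail.
    """
    for attempt in range(1, max_tries + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < max_tries:
            time.sleep(delay)  # brief pause before the next attempt
    raise RuntimeError(f"{cmd[0]} still failing after {max_tries} attempts")


# Hypothetical usage, mirroring the bash function (the accession is a placeholder):
# run_until_success(["biosample2table.py", "-s", "SAMN00000000",
#                    "--out", "out.biosam_info.csv", "--sra",
#                    "-e", "email@gmail.com"])
```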
Yes, my thoughts are basically the same as yours, although the implementation differs slightly. Thank you, @Neato-Nick. Here are my two code samples for your reference: two python code.zip
Hi, I'm trying to run your code and I'm getting this error: urllib.error.HTTPError: HTTP Error 400: Bad Request. I would appreciate any help on how to fix it. Thanks.