merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

NCBI Blast search does not work #604

Closed: ozcan closed this issue 7 years ago

ozcan commented 7 years ago

@ShaiberAlon noticed and reported this problem today. At first I suspected the parameters, but after spending hours on them I could not find anything wrong. Then I started playing with the headers and found that Origin is the one causing the problem. When Origin is https://blast.ncbi.nlm.nih.gov the requests work, but when it is http://127.0.0.1:8080 we get 403 Forbidden. This looks to me like a security precaution against CSRF attacks. The bad news is that there is no way to modify this value from the browser. The good news is that, after spending a bit more time, I tried GET instead of POST and it worked, so it seems they only enforce this new rule when the method is POST.

As a result, the fire_up_ncbi_blast function in anvio/data/interactive/js/utils.js needs to be changed from

                form.method = 'POST';

to

                form.method = 'GET';
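For context, here is a minimal sketch of how a form like this could be built with the method as a parameter. It is not the actual body of fire_up_ncbi_blast: the helper name `buildBlastForm`, its arguments, and the hidden-input layout are assumptions made for illustration. Only the Blast.cgi URL and the GET/POST switch come from this issue.

```javascript
// Hypothetical helper: builds (but does not submit) a form aimed at NCBI BLAST.
// `doc` is any object with a DOM-like createElement(), e.g. window.document.
function buildBlastForm(doc, params, method = 'GET') {
    const form = doc.createElement('form');
    form.action = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi';
    // POST from a local origin was answered with 403 Forbidden; GET was accepted.
    form.method = method;

    // One hidden input per BLAST parameter (PROGRAM, DATABASE, QUERY, ...).
    for (const [name, value] of Object.entries(params)) {
        const input = doc.createElement('input');
        input.type = 'hidden';
        input.name = name;
        input.value = value;
        form.appendChild(input);
    }
    return form;
}
```

In a browser this would be used roughly as `const form = buildBlastForm(document, {PROGRAM: 'blastn', DATABASE: 'nr', QUERY: sequence}); document.body.appendChild(form); form.submit();`. One trade-off to keep in mind: with GET the form fields travel in the URL query string, so very long sequences may run into URL length limits.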

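I will make this change after a little more testing. I have also attached example requests below; the first sends Origin: http://127.0.0.1:8080 and the second sends Origin: https://blast.ncbi.nlm.nih.gov: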

curl 'https://blast.ncbi.nlm.nih.gov/Blast.cgi' -H 'Origin: http://127.0.0.1:8080' -H 'Pragma: no-cache' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: en-US,en;q=0.8,tr;q=0.6' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'Cache-Control: no-cache' -H 'Referer: http://127.0.0.1:8080/app/index.html?rand=8e3cbb09' -H 'Connection: keep-alive' -H 'DNT: 1' --data 'PROGRAM=blastn&DATABASE=nr&QUERY=%3E204_10M_MERGED.PERFECT.gz.keep_contig_878_split_00013%0D%0AGGCGCGATGAGCGTTCGCAGGGTCATGGGCACGGAAACGGAATACGCGGTCTCCGCGCCAGGACAAGGCCGCTACAATCCGGTGCAGCTCTCATTCGACGTGGTTGGCGCCGCCTCGGATGCGTCACGCAGCCATATCCGCTGGGACTACCGGCAGGAAGACCCCGTCAACGACGCGCGCGGCACCAGACTGGAACGTGCGGCGGCCCACCCGGACCTGCTCACCGACGCCCCACAGCTCAACATCACCAACGTGATCGCCCCTAACGGGGGTCGTATCTACGTCGACCACGCGCATCCCGAATACTCCGCGCCGGAAACAACCGACCCCTTCGAAGCGGTGCGATACGACCGTGCCGGCGATCTCATCATGCGCGCCGCCGCAGCCAAGGCAAGCGAGACGACGGGACGGAAAATCGTGCTGCACCGCAACAACGTCGACGGCAAAGGAGCCAGCTGGGGCACGCACGAAAACTACATGATGCTGCGGTCGGTACCGTTCGACCTGGTGACACGCCTCATGACCACGCACTTCGTCTCCCGACAGATCTTCATCGGCTCCGGCCGCGTCGGCATCGGCGAACATAGCGAAAACGCCGGCTACCAGCTAAGCCAACGTGCCGACTACTTCCATATGAAAGTCGGTTTGCAGACCACATTCGACCGGCCGATCATCAACACCAGAGACGAATCGCACAGCACCGACGAATACCGCAGACTGCACGTGATCGTCGGCGACGCCAACCGCATGGACGTGCCACAGGCCTTGAAACTCGGCACCACCAGCATGCTGCTGTGGCTGCTCGAACACGCCGACATGGCCGAAATCGACTTGAACGAGGCGCTGGAACCGCTTAAGCTCGCCGACCCGGTGGAAGCCATGCACACGGTATCGCACGACCTGACCCTGGCCGCCCCACTACCATTGGAAACAGGTGGAACCACCACCGCATGGCAGATGCAGGTCACTTTGCGTGGACTGGTATATGCGGCCGCGGCAACGGTATACGGCACCGACACATCCGGAGAGCCGGCATGGCCCGACCGCTCCACCCGCAACATCATGGCGATGTGGGGGCAGGCTCTCGCCGACGTCGCCACAGTACGCCATGCCGACGATGACGGACGACTGACGATGCGGGAACAGGCGTCACGTCTCGAATGGCTGCTCAAATGGCAGTTGCTGGAGAAGCTACGCCGCAAGACCGGTTCGGATTGGACCGATCGGCGCCTGGCCGCGGTCGATCT
GAAATGGGCCGCGCTCGATCCGGCCGATTCGATTTTCACCCGGCTTGCCGGACAGACCGAACGACTGGTGACGGATAAGCAGCTTGCCGAGGCGGTCGGCCAAGCGCCGGCCGACACACGCGCATGGCTGCGCGCGGAGATCGTACGACGTTTCCCCGAACAGGTCGTTGCGGCGTCGTGGTCCCACCTTACGGTGCGCGGCGAGTCGTCCGGTGATGAAAACGTGGAGAATTCGATGGTCTCGTTGGACATGTCCAATCCGTTGAAATTCACGGAATCCTTGTGTTCCGAAGCGTTCGAGCGTGTCCACACTGCGAGCGGAATCGTGGAATCCCTGCGTTGAATCTC&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&SHOW_OVERVIEW=on&LINK_LOC=blasthome&MAX_NUM_SEQ=100&FORMAT_NUM_ORG=1&CONFIG_DESCR=2%2C3%2C4%2C5%2C6%2C7%2C8&CLIENT=web&SERVICE=plain&CMD=request&PAGE=MegaBlast&MEGABLAST=on&WWW_BLAST_TYPE=newblast&DEFAULT_PROG=megaBlast&SELECTED_PROG_TYPE=megaBlast&SAVED_SEARCH=true&NUM_DIFFS=0&NUM_OPTS_DIFFS=0&USER_DEFAULT_PROG_TYPE=megaBlast' --compressed | grep automatically
curl 'https://blast.ncbi.nlm.nih.gov/Blast.cgi' -H 'Origin: https://blast.ncbi.nlm.nih.gov' -H 'Pragma: no-cache' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: en-US,en;q=0.8,tr;q=0.6' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'Cache-Control: no-cache' -H 'Referer: http://127.0.0.1:8080/app/index.html?rand=8e3cbb09' -H 'Connection: keep-alive' -H 'DNT: 1' --data 'PROGRAM=blastn&DATABASE=nr&QUERY=%3E204_10M_MERGED.PERFECT.gz.keep_contig_878_split_00013%0D%0AGGCGCGATGAGCGTTCGCAGGGTCATGGGCACGGAAACGGAATACGCGGTCTCCGCGCCAGGACAAGGCCGCTACAATCCGGTGCAGCTCTCATTCGACGTGGTTGGCGCCGCCTCGGATGCGTCACGCAGCCATATCCGCTGGGACTACCGGCAGGAAGACCCCGTCAACGACGCGCGCGGCACCAGACTGGAACGTGCGGCGGCCCACCCGGACCTGCTCACCGACGCCCCACAGCTCAACATCACCAACGTGATCGCCCCTAACGGGGGTCGTATCTACGTCGACCACGCGCATCCCGAATACTCCGCGCCGGAAACAACCGACCCCTTCGAAGCGGTGCGATACGACCGTGCCGGCGATCTCATCATGCGCGCCGCCGCAGCCAAGGCAAGCGAGACGACGGGACGGAAAATCGTGCTGCACCGCAACAACGTCGACGGCAAAGGAGCCAGCTGGGGCACGCACGAAAACTACATGATGCTGCGGTCGGTACCGTTCGACCTGGTGACACGCCTCATGACCACGCACTTCGTCTCCCGACAGATCTTCATCGGCTCCGGCCGCGTCGGCATCGGCGAACATAGCGAAAACGCCGGCTACCAGCTAAGCCAACGTGCCGACTACTTCCATATGAAAGTCGGTTTGCAGACCACATTCGACCGGCCGATCATCAACACCAGAGACGAATCGCACAGCACCGACGAATACCGCAGACTGCACGTGATCGTCGGCGACGCCAACCGCATGGACGTGCCACAGGCCTTGAAACTCGGCACCACCAGCATGCTGCTGTGGCTGCTCGAACACGCCGACATGGCCGAAATCGACTTGAACGAGGCGCTGGAACCGCTTAAGCTCGCCGACCCGGTGGAAGCCATGCACACGGTATCGCACGACCTGACCCTGGCCGCCCCACTACCATTGGAAACAGGTGGAACCACCACCGCATGGCAGATGCAGGTCACTTTGCGTGGACTGGTATATGCGGCCGCGGCAACGGTATACGGCACCGACACATCCGGAGAGCCGGCATGGCCCGACCGCTCCACCCGCAACATCATGGCGATGTGGGGGCAGGCTCTCGCCGACGTCGCCACAGTACGCCATGCCGACGATGACGGACGACTGACGATGCGGGAACAGGCGTCACGTCTCGAATGGCTGCTCAAATGGCAGTTGCTGGAGAAGCTACGCCGCAAGACCGGTTCGGATTGGACCGATCGGCGCCTGGCCGC
GGTCGATCTGAAATGGGCCGCGCTCGATCCGGCCGATTCGATTTTCACCCGGCTTGCCGGACAGACCGAACGACTGGTGACGGATAAGCAGCTTGCCGAGGCGGTCGGCCAAGCGCCGGCCGACACACGCGCATGGCTGCGCGCGGAGATCGTACGACGTTTCCCCGAACAGGTCGTTGCGGCGTCGTGGTCCCACCTTACGGTGCGCGGCGAGTCGTCCGGTGATGAAAACGTGGAGAATTCGATGGTCTCGTTGGACATGTCCAATCCGTTGAAATTCACGGAATCCTTGTGTTCCGAAGCGTTCGAGCGTGTCCACACTGCGAGCGGAATCGTGGAATCCCTGCGTTGAATCTC&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&SHOW_OVERVIEW=on&LINK_LOC=blasthome&MAX_NUM_SEQ=100&FORMAT_NUM_ORG=1&CONFIG_DESCR=2%2C3%2C4%2C5%2C6%2C7%2C8&CLIENT=web&SERVICE=plain&CMD=request&PAGE=MegaBlast&MEGABLAST=on&WWW_BLAST_TYPE=newblast&DEFAULT_PROG=megaBlast&SELECTED_PROG_TYPE=megaBlast&SAVED_SEARCH=true&NUM_DIFFS=0&NUM_OPTS_DIFFS=0&USER_DEFAULT_PROG_TYPE=megaBlast' --compressed | grep automatically

If a command prints <p>This page will be automatically updated in <b>1</b> seconds until search is done</p>, the request worked. Note the | grep automatically at the end of each curl command.
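The same check could be expressed in JavaScript, for example in a small test script. This is only a sketch: the function name `blastRequestAccepted` is hypothetical, and the marker word is taken from the response quoted above, which NCBI could reword at any time.

```javascript
// Hypothetical helper mirroring the `| grep automatically` check above.
// The marker comes from the response quoted in this issue; treat it as a
// heuristic, not as a documented part of the BLAST interface.
const ACCEPTED_MARKER = 'automatically';

function blastRequestAccepted(statusCode, body) {
    if (statusCode === 403) {
        // The rejection observed for POST requests with a non-NCBI Origin.
        return false;
    }
    return body.includes(ACCEPTED_MARKER);
}
```

It takes the HTTP status code and the response body as plain values, so it can be tried against saved responses without contacting NCBI.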

meren commented 7 years ago

Thank you very much for solving this, @ozcan. It is my fault for implementing such a hack to get the information to the NCBI servers and get it running, but I wasn't sure what would be a better way to do it :/

I think your solution works.