monarch-initiative / biolink-api

API for linked biological knowledge
https://api.monarchinitiative.org/api/
BSD 3-Clause "New" or "Revised" License

Consider more proactive sanitizing of inputs and output in API #371

Open kltm opened 3 years ago

kltm commented 3 years ago

From https://github.com/geneontology/biolink-api/issues/11, it looks like a bot was gently probing the API with SQL injection attempts to try to find a way in. Naturally, it didn't get far, as there is no SQL backend (at this point, anyways), but it does highlight something that could become problematic. It also triggered a firewall bot to (correctly, IMO) block the source, which unfortunately broke an application.

I would suggest that it might be good to consider, at least as a thought moving forward, a more proactive stance to keep things like this from propagating through the API. Basic input sanity checks (using one of the libraries that do this) might help prevent future issues from cropping up.
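As a rough illustration of what a basic input sanity check could look like, here is a minimal sketch using only the Python standard library. The function name `validate_param` and the allowlist pattern are assumptions made for this example; they are not part of biolink-api, and a real deployment would likely lean on an existing validation library instead.

```python
import re

# Hypothetical allowlist for identifier-like values such as "goslim_agr"
# or "GO:0008150". Spaces, quotes, and "=" are deliberately excluded.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.:\-]{1,128}")


def validate_param(name: str, value: str) -> str:
    """Return value unchanged if it matches the allowlist, else raise ValueError."""
    if not _SAFE_VALUE.fullmatch(value):
        raise ValueError(f"invalid value for parameter {name!r}")
    return value
```

With this pattern, a value like `goslim_agr` passes through, while `goslim_agr and 1=1` is rejected before it reaches any backend, because it contains spaces and `=`.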

From LBL security commentary at https://github.com/geneontology/biolink-api/issues/11:

http://131.243.192.30/solr/select?q=*:*&qf=&fq=document_category:\"ontology_class\"&fl=annotation_class,annotation_class_label,description,source&wt=json&indent=on&fq=subset:goslim_agr and 1=1&rows=1000

1=1 is likely what triggered the SQL injection attack!

More abstractly, from a Solr perspective and depending on setup, problems with the underlying server could still theoretically be exposed if a pass-through exists; e.g. see http://mail-archives.apache.org/mod_mbox/www-announce/201902.mbox/%3CCAECwjAVjBN%3DwO5rYs6ktAX-5%3D-f5JDFwbbTSM2TTjEbGO5jKKA%40mail.gmail.com%3E and https://issues.apache.org/jira/browse/SOLR-12770
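One way to narrow a pass-through, sketched here under assumptions, is to forward only an explicit allowlist of query parameters to Solr and drop everything else (for example a parameter like `shards`, which asks Solr to contact other hosts). The names `ALLOWED_SOLR_PARAMS` and `filter_solr_params` are hypothetical, and the set of allowed parameters would need to match what the API actually uses.

```python
from typing import Dict, List, Mapping, Sequence

# Hypothetical allowlist of Solr parameters a client request may set.
ALLOWED_SOLR_PARAMS = frozenset({"q", "fq", "fl", "rows", "start", "wt"})


def filter_solr_params(params: Mapping[str, Sequence[str]]) -> Dict[str, List[str]]:
    """Keep only allowlisted parameters; silently drop anything else."""
    return {
        key: list(values)
        for key, values in params.items()
        if key in ALLOWED_SOLR_PARAMS
    }
```

Dropping unknown parameters does not validate the values of the allowed ones, so it complements per-value checks instead of replacing them.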

Something to bear in mind anyways, in case any other deployment site has similar issues.