Closed beckyjackson closed 2 years ago
This is working well for CMI-PB endpoint. Two things:
InsecureRequestWarning: Unverified HTTPS request ...
?I also tried this with the IEDB API sprocket https://query-api.iedb.org
. It's slow, but the API itself is slow. Can you please try that and make sure Sprocket is doing what you expect?
Are you seeing a InsecureRequestWarning: Unverified HTTPS request ...?
Yes. I couldn't hit the https
endpoint without including verify=False
on my request. I get the following:
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.cmi-pb.org', port=443): Max retries exceeded with url: /api/v2/specimen?select=specimen_id,subject_id,specimen_type,visit (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
I did try a couple of different fixes for this issue and none of them resolved it... This may only be an issue on my end, in which case we should remove that verify=False
.
Hmm. I'm not sure this is respecting my specified 'limit' for Swagger.
It's not exactly respecting the limit, it just displays the limited results, but I need to know how many total results exist. For small datasets, that's OK, but for large ones it could become an issue (maybe that's why IEDB is being slow, but I wasn't seeing super slow responses). In order for the "next" button to work correctly at the end of a set of data, we need to know how many total results there are. One solution is to do one query when we first load the table (like with the columns) that counts the total results, and stores this in a file like we do with the columns:
SELECT COUNT(*) FROM tablename
Then we already have the total and we don't need to get all results every time.
I'll try implementing this today and then we can see if it speeds up the IEDB results.
Whoops, I just realized we can't do a query like that. I guess we can just get all results once and then cache the total? But then that defeats the purpose of not storing the results themselves. What if we have a .swagger/
directory that stores the JSON?
Postgrest puts range, count, and pagination information in the HTTP response headers: https://postgrest.org/en/v9.0/api.html?highlight=limit#limits-and-pagination
This is exactly what I needed, thanks!
This is weird - it looks like in order to use the offset
query parameter for the IEDB API, you must also include an order
. This is not the case for the CMI-PB API. Without order
, you get an error (only from IEDB):
Query string appears to include an offset parameter without an order parameter. Please resubmit the query with an order parameter to ensure consistent paging. The query was not sent to the API.
It's a little annoying that this isn't consistent between databases. Should we have a default ordering on the first column for all databases, or do you have any other ideas of a workaround for this?
PostgreSQL does not order tables by default. If you provide an offset without an order, you can get unpredictable results. Some IEDB API users complained about this, and so the IEDB API is checking for this case and returning this error message before PostgREST is called.
I'm not sure what we should do about that. Probably nothing, for now: just display the error. We can explicitly specify an order.
I think this is ready for review again @jamesaoverton
The first time you load a table will take a bit longer, but since the totals & columns are cached after that, I think loading subsequent requests will be much faster.
The table is very messy for lots of columns (e.g., antigen_search). I'm not sure what to do about that.
Resolves #11 - this time no database is created, we just hit the endpoint for each request.
Alternatively, you can provide the URL to a PostGRES Swagger endpoint, such as https://www.cmi-pb.org/api/v2. Each request will be sent to the endpoint and the JSON results will be displayed as the same HTML table as providing a database.