moldach commented 3 years ago

Hoping you can help me troubleshoot a problem I'm running into.

When I make a standard curl request to the API from the command line, I can see that there are many pages of results for TNF-alpha:

```sh
curl -X GET "https://api.targetsafety.info/api/target/alerts/param?uniprotid=P01375&page=1&token=[MyPrivateKey]" > tnf_alpha.json
```

The resulting .json file shows that there are 17 pages, and that this request returns only page 1 of 17:

```json
{
  "page": 1,
  "numberPages": 17,
  "targets": [
    {
      "target_id": 348,
      "target_name": "TNF-alpha",
      "actions": [
        {
          "action_id": 15,
          "action_name": "Inhibitors",
          "alerts": [
            {
              "affected_system_id": 10005329,
              "affected_system": "blood and lymphatic system disorders",
              "adverse_event_id": 10043554,
              "adverse_event": "thrombocytopenia",
              "ref_id": 95580,
...
```

I'm making hundreds of thousands of API calls with AsyncQueue(), and each of them will have a different number of pages.

How can I crawl all of these pages using AsyncQueue()?

Currently only the first of x pages is returned. (Note: dropping &page=1 from the URL results in a broken API call; the exact page number must be specified.)

However, I see the following note about Paginator in the documentation:

> Paginator is a class for automating pagination. It requires an instance of HttpClient as its first parameter. It does not handle asynchronous requests at this time, but may in the future. Paginator may be the right class to use when you don’t know the total number of results. Beware, however, that if there are A LOT of results (and a lot depends on your internet speed and the server response time) the requests may take a long time to finish - just plan wisely to fit your needs.

If this isn't supported yet, it would be greatly welcomed in the near future 😁
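For comparison, here is a minimal sketch of the sequential approach the Paginator note describes, written in Python purely for illustration (it is not crul's Paginator API). It assumes each response carries the `numberPages` field shown in the TNF-alpha output above; `fetch_page` is a hypothetical callable that takes a page number and returns the parsed JSON body. Only page 1 has to arrive before the page count is known, but every later request still waits for the previous one, which is why this gets slow when there are many pages.

```python
from typing import Callable, Dict, List


def crawl_sequential(fetch_page: Callable[[int], Dict]) -> List[Dict]:
    """Fetch page 1, read numberPages, then request the remaining pages one by one."""
    first = fetch_page(1)
    pages = [first]
    # The total page count is only known once the first response arrives.
    total = int(first["numberPages"])
    for page in range(2, total + 1):
        # Each call blocks until the previous one has finished.
        pages.append(fetch_page(page))
    return pages


if __name__ == "__main__":
    # Hypothetical stand-in for the real API, reporting 17 pages like the TNF-alpha query.
    def fake_fetch(page: int) -> Dict:
        return {"page": page, "numberPages": 17, "targets": []}

    result = crawl_sequential(fake_fetch)
    print(len(result), result[-1]["page"])  # 17 17
```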

sckott commented 3 years ago

Thanks for the bump; I'll try to get this done.

sckott commented 3 years ago

I think this would only work for async when we can construct the URLs ahead of time, because the whole point of async requests is to send off a bunch of requests at the same time. It can't work if you have to do request A to get the information needed for request B. It should work if, e.g., you know you want 1000 results and you know the names of the pagination query parameters.
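A minimal Python sketch of that idea (illustration only, not crul code): when the total number of results and the pagination parameter names are known in advance, every URL can be built up front and the requests sent concurrently, because no request depends on another's response. The base URL, the `limit`/`offset` parameter names, and the `fetch` callable are hypothetical placeholders; `max_workers` caps how many requests are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List
from urllib.parse import urlencode


def build_page_urls(base: str, total: int, per_page: int,
                    limit_param: str = "limit",
                    offset_param: str = "offset") -> List[str]:
    """Build one URL per page; no URL depends on any response."""
    urls = []
    for offset in range(0, total, per_page):
        query = urlencode({limit_param: per_page, offset_param: offset})
        urls.append(f"{base}?{query}")
    return urls


def fetch_all(urls: List[str], fetch: Callable[[str], Any],
              max_workers: int = 5) -> List[Any]:
    """Send all requests concurrently, at most max_workers at a time; results keep URL order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # 1000 results at 100 per page -> 10 URLs, all known before any request is made.
    urls = build_page_urls("https://example.org/items", total=1000, per_page=100)
    print(len(urls))  # 10
    print(urls[-1])   # https://example.org/items?limit=100&offset=900
```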

moldach commented 3 years ago

Yeah, it looks like it would need to be a two-step process then, since we don't know the range of values to give for page=<n> when n is unknown:

Alerts Information Uniprot - Get target alerts by uniprotid

`GET https://api.targetsafety.info/api/target/alerts/param?`

Success 200 response:

```json
{
    "page": 1,
    "numberPages": 11,
    "targets": [
        {
            "target_id": 158,
            "target_name": "SGLT2",
            "actions": [
                {
                    "action_id": 2,
                    "action_name": "Activators",
                    "alerts": [
                        {
                            "affected_system_id": 10015919,
                            "affected_system": "eye disorders",
                            "adverse_event_id": 10015916,
                            "adverse_event": "eye disorder",
                            "ref_id": 62872,
                            "ref_source_type": "Journal",
                            "ref_title": "Leveraging Human Genetics to Identify Safety Signals Prior to Drug Marketing Approval and Clinical Use",
                            "ref_citation": "Drug Saf 2020 Feb 28",
                            "ref_pubmed_id": "32112228",
                            "ref_link": null,
                            "ref_date": "2020-02-28",
                            "alert_detail_id": 662867,
                            "alert_title": "Phenome-wide association study identifying human gene mutations that could be used for in silico prediction of potential adverse drug effects. Results revealed 8 positive associations correlating gene mutation phenotypes with known safety signals from drugs targeting the protein. These associations were PCSK9 (spina bifida), TNF-alpha (cellulitis and leg abscess), PPARgamma (obesity), estrogen receptor-alpha (hemorrhages), ACE (congenital urinary anomalies), phospholipase A2 (primary hypercoagulable state), GluN2B (symbolic dysfunction) and GluN2A (paroxysmal tachycardia, pulmonary heart disease and sleep disorders). Other safety issues are listed.",
                            "alert_date": "2020-03-11",
                            "alert_genetic_study_variant": "gain-of-function mutation",
                            "alert_type": "Class Alert",
                            "alert_phase": "Target Discovery",
                            "alert_onoff_target": "On-Target",
                            "alert_level_evidence": "Suspected",
                            "alert_severity": "no",
                            "alert_species": "human",
                            "drugs": []
                        }
                    ]
                }
            ]
        }
    ]
}
```

We only get n (from numberPages) after a successful API call (Success 200). So, with a bit more effort, we could extract numberPages from each successful first-page call and then construct the remaining URLs ahead of time.
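The two-step process could look roughly like the following Python sketch (again for illustration, not crul code). It assumes the endpoint and the `uniprotid`, `page`, and `token` query parameters from the curl example at the top of this thread, and a JSON body with a `numberPages` field. Step 1 requests page 1 for every ID concurrently; step 2 builds all remaining page URLs from `numberPages` and fetches them concurrently. `fetch_json` is a hypothetical callable (URL to parsed JSON) standing in for the real HTTP client, `max_workers=10` is an arbitrary choice, and the ID `ID_B` in the demo is made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://api.targetsafety.info/api/target/alerts/param"


def page_url(uniprot_id: str, page: int, token: str) -> str:
    """Build one request URL; the API needs an explicit page number."""
    query = urlencode({"uniprotid": uniprot_id, "page": page, "token": token})
    return f"{BASE_URL}?{query}"


def crawl_two_step(ids: List[str], token: str,
                   fetch_json: Callable[[str], Dict],
                   max_workers: int = 10) -> Dict[str, List[Dict]]:
    """Return every page body for every ID, in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Step 1: first pages do not depend on each other, so fetch them concurrently.
        firsts = list(pool.map(fetch_json, [page_url(i, 1, token) for i in ids]))

        # numberPages is now known for every ID, so all remaining URLs can be built up front.
        rest: List[Tuple[str, str]] = [
            (i, page_url(i, p, token))
            for i, first in zip(ids, firsts)
            for p in range(2, int(first["numberPages"]) + 1)
        ]

        # Step 2: fetch pages 2..n for all IDs concurrently (map keeps input order).
        rest_bodies = list(pool.map(fetch_json, [url for _, url in rest]))

    results: Dict[str, List[Dict]] = {i: [first] for i, first in zip(ids, firsts)}
    for (i, _), body in zip(rest, rest_bodies):
        results[i].append(body)
    return results


if __name__ == "__main__":
    # Hypothetical fake API: P01375 reports 17 pages (as in this thread), ID_B reports 2.
    counts = {"P01375": 17, "ID_B": 2}

    def fake_fetch(url: str) -> Dict:
        q = parse_qs(urlparse(url).query)
        return {"page": int(q["page"][0]), "numberPages": counts[q["uniprotid"][0]]}

    out = crawl_two_step(list(counts), "MY_TOKEN", fake_fetch)
    print({k: len(v) for k, v in out.items()})  # {'P01375': 17, 'ID_B': 2}
```

The sketch does no error handling: a first page that fails (anything other than Success 200) would have no `numberPages`, so real code would need to skip or retry those IDs before step 2.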

Closing this issue, since I've asked them to provide us with a bulk data download instead... 💁🏼

ropensci/crul issue #160: Add support for pagination for Async calls
