r-lib / httr

httr: a friendly http package for R
https://httr.r-lib.org
Other
986 stars 1.99k forks source link

POST request containing dollar sign blocked when content_type_json() is applied #725

Closed AmyMikhail closed 10 months ago

AmyMikhail commented 1 year ago

I'm trying to send a json-formatted query containing the $in operator as a POST request to mongodb databases held on a server.

I think this problem has two elements:

  1. the web application firewall for the servers that some of my mongodb databases are on doesn't like the $in operator
  2. httr::content_type_json() is doing something strange to the formatting of the request when it contains a $in operator, which is what the firewall is complaining about?

I'm already in touch with my server network support to try and resolve the first issue, but I am wondering if the second issue is a bug? I'm struggling to find out what content_type_json() actually does, because it seems the function details are not exported.

Below is an example JSON query and the parameters I was using to send it with httr::POST(). I can't make a fully replicable example as my mongodb database requires user credentials and is confidential, but I think it might be possible to replicate the problem by sending a request with a query containing the $in operator using the same httr::POST() parameters as below to any server-held database that accepts JSON queries (unless it is realy just the firewall that is causing the problem, but I think it is a combination of the firewall and something that content_type_json() does to the query that makes it incompatible).

# Load libraries:
library(httr)
library(jsonlite)
library(dplyr)
library(purrr)

# Set user credentials for mongodb database:
url <- url
username <- username
password <- password
outbreak_id <- outbreak_id

# Example JSON query (in pretty format for easier inspection):
> query_json_l
{
  "filter": {
    "where": {
      "dateSampleTaken": {
        "$in": ["2022-07-28T00:00:00.000Z", "2022-07-29T00:00:00.000Z", "2022-07-30T00:00:00.000Z"]
      },
      "useDbColumns": "true",
      "dontTranslateValues": "true",
      "jsonReplaceUndefinedWithNull": "true"
    },
    "fields": ["id", "updatedAt", "person.visualId", "labName", "sampleIdentifier", "dateSampleTaken", "sequence.hasSequence"]
  }
} 

# Send query request to mongodb database and get exportLogId (elid):
elid <- httr::POST(url =
                    paste0(url,
                           "api/outbreaks/",
                           outbreak_id,
                           "/lab-results/export?access_token=",
                           get_access_token(url = url,
                                            username = username,
                                            password = password)),

                  httr::content_type_json(),
                  body = query_json_l,
                  encode = "raw") %>%

  # Fetch content:
  httr::content() %>%

  # Extract export log ID from content:
  purrr::pluck("exportLogId")

# Use exportLogId to fetch records matching filter query from mongodb database:
results <- httr::GET(url =
                         paste0(url,
                                "api/export-logs/",
                                elid,
                                "/download?access_token=",
                                get_access_token(url = url,
                                                 username = username,
                                                 password = password))) %>%

  # Fetch content of downloaded file:
  httr::content("text", encoding = "UTF-8") %>%

  # Convert json to flat data.frame:
  jsonlite::fromJSON(flatten = TRUE)

When the first query request is blocked, elid = NULL.
When the first query returns a valid elid, it is a character string that can be passed to the second query. When the second query works as expected, results should only contain records that match the query conditions.

hadley commented 10 months ago

httr has been superseded in favour of httr2, so is no longer under active development. But it sounds like your problem is best diagnosed by using with_verbose() so that you can see exactly what is being sent to the server, and how it's different to a sample request that works.