timriffe / TR1

Read Human Mortality Database and Human Fertility Database Data from the Web

Vectorized data requests #12

Open timriffe opened 5 years ago

timriffe commented 5 years ago

Users often request the ability to specify a vector of countries, a vector of items, or a range of ages or years. It would certainly be faster to get multiple files at once, because only one login would be required. This would need to be modular so as not to further muddy the web functions.

ottlngr commented 5 years ago

Without touching the existing functions, a wrapper around readHMDweb(), for example, would do the trick, returning a list of data.frames:

hmd_get_country_item_combinations <- function(countries, items, username, password) {

  collection <- list()

  # Loop over every country/item pair; each pair is one readHMDweb() call.
  for (cntry in countries) {

    message(cntry)

    for (item in items) {

      message(item)

      # Name each element "<country>_<item>", e.g. "AUS_Mx_1x1".
      name <- paste(cntry, item, sep = "_")
      collection[[name]] <- readHMDweb(CNTRY = cntry, item = item, username = username, password = password)

    }

  }

  return(collection)

}

collection <- hmd_get_country_item_combinations(countries = c("AUS", "DNK"), items = c("Deaths_1x1", "Mx_1x1"), username = username, password = password)

Is that an option?

timriffe commented 5 years ago

Then I’m not sure the login would be recycled. I’ve assumed it would go through the login redundantly in each rep of the innermost loop, which we’d want to avoid. But I haven’t verified that; it just seems intuitive. Best if kept in the same function, I think, on the inside, and optional, such that we have the same default behavior. Make sense?

This mail has been sent through the MPI for Demographic Research. Should you receive a mail that is apparently from a MPI user without this text displayed, then the address has most likely been faked. If you are uncertain about the validity of this message, please check the mail header or ask your system administrator for assistance.

ottlngr commented 5 years ago

Not to me, to be honest.

Though the login seems to be recycled when using a browser to download files from HMD, it is actually not. The browser just stores your credentials and sends them (base64 encoded) as a request header - can be verified using the developer tools of the browser. So yes, the innermost loop does a login each time, but that's exactly what your browser does when downloading several files during one session.
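The point about credentials being resent with every request can be illustrated in base R. This is a minimal sketch, assuming HTTP Basic authentication as described above; `base64_encode()` and `basic_auth_header()` are hypothetical helpers written for illustration, not functions of the package, and the credentials are placeholders.

```r
# Hypothetical helper: Base64-encode a string using base R only.
base64_encode <- function(x) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  bytes <- as.integer(charToRaw(x))
  pad <- (3 - length(bytes) %% 3) %% 3      # bytes missing from the last triple
  bytes <- c(bytes, rep(0L, pad))
  m <- matrix(bytes, nrow = 3)              # one column per 3-byte group
  n <- m[1, ] * 65536L + m[2, ] * 256L + m[3, ]
  # Split each 24-bit group into four 6-bit indices.
  idx <- rbind(n %/% 262144L, (n %/% 4096L) %% 64L, (n %/% 64L) %% 64L, n %% 64L)
  chars <- alphabet[idx + 1L]
  if (pad > 0) chars[(length(chars) - pad + 1):length(chars)] <- "="
  paste(chars, collapse = "")
}

# Hypothetical helper: the header value depends only on the credentials,
# so it is identical for every request and carries no session state.
basic_auth_header <- function(username, password) {
  paste("Basic", base64_encode(paste(username, password, sep = ":")))
}

header_first  <- basic_auth_header("user", "pass")
header_second <- basic_auth_header("user", "pass")
identical(header_first, header_second)
```

Because the header is recomputed from the stored credentials each time, there is no separate login step to save by batching requests under this assumption.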

And: That's probably a matter of taste, but I prefer to have several small functions, each with a distinct domain, rather than one big function that handles every step (looking for available data, downloading, tidying data, ...). Maintenance usually becomes much easier this way.

But of course a similar effect can be achieved inside readHMDweb() by checking whether CNTRY and item are vectors with more than one element and, if so, iterating over all combinations. That would just put the logic of the wrapper function inside the existing function.
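Iterating over all combinations could look roughly like the following. It is only a sketch: `fetch_one()` is a hypothetical stand-in for a single-file download and performs no network access, and `fetch_all()` is likewise an illustrative name, not an existing function.

```r
# Hypothetical stand-in for a single-file download such as readHMDweb();
# it only echoes its arguments so the example runs offline.
fetch_one <- function(CNTRY, item) {
  data.frame(CNTRY = CNTRY, item = item, stringsAsFactors = FALSE)
}

# Accept scalar or vector CNTRY/item and fetch every combination.
fetch_all <- function(CNTRY, item, fetch = fetch_one) {
  # item is listed first so it varies fastest, matching a nested loop
  # with countries on the outside and items on the inside.
  combos <- expand.grid(item = item, CNTRY = CNTRY, stringsAsFactors = FALSE)
  out <- Map(fetch, combos$CNTRY, combos$item)
  names(out) <- paste(combos$CNTRY, combos$item, sep = "_")
  out
}

res <- fetch_all(c("AUS", "DNK"), c("Deaths_1x1", "Mx_1x1"))
names(res)
```

Scalar inputs go through the same path and yield a list of length one, so no separate branch is needed for the single-file case.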

timriffe commented 5 years ago

Thanks for the authentication explanation. I agree re small and modular, and the cheap solution I imagine is to make a lighter version of readHMDweb(), readHMDweb_lite(), that assumes all checks have been done and the various concatenations made, and to use this in the innermost loop. Certainly the present one could be more modular than it is, too. The wrapper could carry the same name and have the same default behavior, but be tidier in this way.
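A rough sketch of that split, under stated assumptions: the body of `readHMDweb_lite()` here is a placeholder that returns a dummy data.frame instead of contacting the HMD, and `read_hmd_vectorized()` is a hypothetical name for the wrapper (in the proposal it would carry the existing name).

```r
# Hypothetical lightweight downloader: assumes inputs are already validated.
# Placeholder body: returns a dummy data.frame, no network access.
readHMDweb_lite <- function(CNTRY, item, username, password) {
  data.frame(CNTRY = CNTRY, item = item, stringsAsFactors = FALSE)
}

# Hypothetical wrapper: validates once, then loops over the lite function.
read_hmd_vectorized <- function(CNTRY, item, username, password) {
  stopifnot(is.character(CNTRY), is.character(item),
            length(CNTRY) >= 1, length(item) >= 1)
  out <- list()
  for (cntry in CNTRY) {
    for (it in item) {
      out[[paste(cntry, it, sep = "_")]] <-
        readHMDweb_lite(cntry, it, username, password)
    }
  }
  # Scalar inputs keep the default behavior: a single data.frame.
  if (length(out) == 1) out[[1]] else out
}

single <- read_hmd_vectorized("AUS", "Mx_1x1", "user", "pass")
many <- read_hmd_vectorized(c("AUS", "DNK"), c("Deaths_1x1", "Mx_1x1"), "user", "pass")
```

With this shape, existing calls with one country and one item would still get a data.frame back, and only vector inputs would return a named list.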
