microsoft / AzureSMR

AzureSMR is no longer being actively developed. For ongoing support of Azure in R, see: https://github.com/Azure/AzureR

Read .gz file from Data Lake #115

Open MartheUT opened 6 years ago

MartheUT commented 6 years ago

There is a need to read .gz files from the data lake. Adding gunzip to the azureDataLakeRead function will not work, because gunzip can only decompress a file, not an HTTP response.

MartheUT commented 6 years ago

Probably not the most elegant solution, but it works:


azureDataLakeReadCSVGZ <- function(azureActiveContext, azureDataLakeAccount, relativePath,
                                   offset, length, bufferSize, verbose = FALSE,
                                   seperator = ",")
{
  # seperator is the field delimiter handed to read.table (default: comma)
  resHttp <- azureDataLakeReadCore(azureActiveContext, azureDataLakeAccount,
                                   relativePath, seperator, offset, length, bufferSize, verbose)
  stopWithAzureError(resHttp)
  # Take the response body as raw bytes; it is still gzip-compressed here
  resRaw <- httr::content(resHttp, as = "raw")

  # Write a temporary file in binary mode from where you can unzip the data
  tempName <- tempfile(fileext = ".csv.gz")
  on.exit(unlink(tempName))  # remove the temporary file when the function exits
  con <- file(tempName, "wb")
  writeBin(resRaw, con)
  close(con)
  # read.table decompresses gzip files transparently when reading
  data <- read.table(tempName, sep = seperator)
  return(data)
}
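
As a possible alternative to the temporary file, base R can also decompress a raw vector in memory by wrapping it in gzcon(rawConnection(...)). The sketch below is not part of AzureSMR: readGzRaw is a hypothetical helper name, and it assumes the raw bytes (for example resRaw above) are a gzip stream containing delimited text.

```r
# Hypothetical helper (not part of AzureSMR): parse gzip-compressed
# delimited text that is held in a raw vector, without a temporary file.
readGzRaw <- function(rawBytes, sep = ",", ...) {
  # Wrap the raw vector in a connection and layer gzip decompression on top
  con <- gzcon(rawConnection(rawBytes, open = "rb"))
  on.exit(close(con))  # closing the gzcon also closes the raw connection
  lines <- readLines(con)
  read.table(text = lines, sep = sep, ...)
}

# Usage example with locally generated gzip bytes standing in for resRaw
tmp <- tempfile(fileext = ".csv.gz")
gz <- gzfile(tmp, "w")
writeLines(c("a,b", "1,2", "3,4"), gz)
close(gz)
rawBytes <- readBin(tmp, what = "raw", n = file.info(tmp)$size)
unlink(tmp)

df <- readGzRaw(rawBytes, sep = ",", header = TRUE)
```

Whether this is preferable depends on file size: readLines holds the whole decompressed text in memory, so the temporary-file approach may still be the safer choice for large files.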