lucasmation / microdadosBrasil

Reads the most common Brazilian public microdata (CENSO, PNAD, etc.) easily and quickly

create function read_fwf2 #122

Closed: lucasmation closed this issue 7 years ago

lucasmation commented 7 years ago

We are running into problems because of two aspects of IBGE-style fixed-width (fwf) files that readr::read_fwf() does not support:

  1. can't handle overlapping columns (readr#534 and readr#585)
  2. can't handle implied decimal places (readr#399 and #121)
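
A small illustration of both problems in base R. The record and positions are made up for illustration, not taken from an actual IBGE dictionary. Positions 1-7 hold a municipality code whose first two digits are also declared as a separate state variable, so the two column specs overlap. Positions 8-12 hold an income value stored with 2 implied decimal places, i.e. written without a decimal point.

```r
# Hypothetical fixed-width record: a 7-digit municipality code followed by an
# income field with 2 implied decimal places (no decimal point in the file).
record <- "355030812345"

# Overlapping variables: both specs read positions 1-2.
uf      <- substr(record, 1, 2)   # state code
cod_mun <- substr(record, 1, 7)   # municipality code (contains uf)

# Implied decimals: the raw digits must be divided by 10^decimal_places.
renda_raw <- substr(record, 8, 12)
renda     <- as.numeric(renda_raw) / 10^2
```

substr() has no trouble with overlaps. Per the readr issues cited above, the limitation lies in how read_fwf() accepts column positions, and read_fwf2 is meant to work around it.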

So I propose we create a new function read_fwf2 that can handle those.

My suggested pseudo code:

read_fwf2( ... , decimal_places=NULL ){
   #Solving 1
      - separate the import dictionary into two: a main dictionary and another with one of the overlapping variables (in theory there could be more than 2 overlapping variables, so the overlap cleanup should be recursive)
      # import with each dictionary and cbind the results:
      bind_cols(
                     read_fwf( with 1st import dictionary ),
                     read_fwf( with 2nd import dictionary )
                     )

  #Solving 2
    - subset the import dictionary to only the variables that are numeric and have decimal places
    - divide each of those columns by 10^number_of_decimal_places

}
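
One way the pseudocode could become runnable R. This is a sketch, not the package's implementation. It assumes a hypothetical dictionary data frame with columns var_name, int_pos, fin_pos and decimal_places. Those column names are assumptions, not necessarily the microdadosBrasil dictionary format. Every column is read as character, and the implied-decimal columns are rescaled afterwards. The pseudocode calls for a recursive cleanup; this sketch instead uses a hypothetical helper, split_overlapping(), which is iterative and greedy. It visits variables by start position and puts each one into the first group whose last variable ends before it starts. Each resulting group is overlap-free, so it can go straight to readr::read_fwf(), and the pieces are combined with dplyr::bind_cols().

```r
library(readr)
library(dplyr)

# Hypothetical dictionary format (assumption for this sketch): one row per
# variable with var_name, int_pos (start), fin_pos (end), decimal_places.

# Split the dictionary into groups with no overlapping positions. Variables are
# visited by start position; each joins the first group whose last variable
# ends before it starts, otherwise a new group is opened.
split_overlapping <- function(dic) {
  dic <- dic[order(dic$int_pos), ]
  group <- integer(nrow(dic))
  group_end <- numeric(0)  # end position of the last variable in each group
  for (i in seq_len(nrow(dic))) {
    g <- which(group_end < dic$int_pos[i])[1]
    if (is.na(g)) {
      group_end <- c(group_end, dic$fin_pos[i])
      g <- length(group_end)
    } else {
      group_end[g] <- dic$fin_pos[i]
    }
    group[i] <- g
  }
  split(dic, group)
}

# Hypothetical read_fwf2(): read each overlap-free group separately, bind the
# columns, then rescale variables that have implied decimal places.
read_fwf2 <- function(file, dic) {
  parts <- lapply(split_overlapping(dic), function(d) {
    read_fwf(file,
             col_positions = fwf_positions(d$int_pos, d$fin_pos, d$var_name),
             col_types = cols(.default = col_character()))
  })
  out <- do.call(bind_cols, unname(parts))
  out <- out[dic$var_name]  # restore the dictionary's column order

  # Solving 2: divide by 10^decimal_places (NA or 0 means no rescaling)
  has_dec <- !is.na(dic$decimal_places) & dic$decimal_places > 0
  for (v in dic$var_name[has_dec]) {
    out[[v]] <- as.numeric(out[[v]]) / 10^dic$decimal_places[dic$var_name == v]
  }
  out
}

# Usage on a toy file: uf (1-2) overlaps cod (1-4), and cod overlaps
# renda (3-7); renda has 2 implied decimal places.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1234567", "9876543"), tmp)
dic <- data.frame(var_name = c("uf", "cod", "renda"),
                  int_pos = c(1, 1, 3), fin_pos = c(2, 4, 7),
                  decimal_places = c(0, 0, 2),
                  stringsAsFactors = FALSE)
out <- read_fwf2(tmp, dic)
```

Reading everything as character keeps leading zeros in code variables. Converting the remaining numeric columns is left out of this sketch.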

@iasminilima, @joaofm91, @daniellima123: please try implementing this. @gutorc92, @nicolassoarespinto: please help them with any questions they have.

lucasmation commented 7 years ago

Great. But let's remember to switch back to the readr default if they accept our pull request (readr/issue/637); readr would then either incorporate read_fwf2 itself or add its functionality to read_fwf.

lucasmation commented 7 years ago

@gutorc92, while we wait for our pull request to readr (readr/issue/637) to be answered, please add the new read_fwf to the microdadosBrasil package, so that we can close this issue and issue #121, and so that users get the correct behaviour from our package.

Please add it to a temp_read_fwf.R file (inside the R folder) so we can remove it if the pull request is accepted.