lucasmation / microdadosBrasil

Reads most common Brazilian public microdata (CENSO, PNAD, etc) easy and fast

Fix no_dic_overlap() function #154

Open nicolassoarespinto opened 6 years ago

nicolassoarespinto commented 6 years ago

The function should not split the dictionary at positions that are discontinuous but do not overlap:


OLD:

10 - 11        10 - 11        15 - 16        15 - 18
11 - 12        11 - 12   +              +
15 - 16    =>
15 - 18

NEW:

10 - 11        10 - 11        15 - 18
11 - 12        11 - 12   +    
15 - 16    =>  15 - 16
15 - 18
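
The difference between the two behaviours can be sketched in a few lines. This is only an illustration in Python, not the package's R code: the function names `split_on_gaps` and `split_on_overlaps` are hypothetical, the fields are assumed to be sorted by start position, and the end position is treated as exclusive so that 10 - 11 and 11 - 12 do not collide, as the example above implies.

```python
def split_on_gaps(fields):
    """OLD behaviour as described above: open a new group at every discontinuity."""
    groups = []
    for start, end in fields:
        # Stay in the current group only if this field begins where the last one ended.
        if groups and groups[-1][-1][1] == start:
            groups[-1].append((start, end))
        else:
            groups.append([(start, end)])
    return groups


def split_on_overlaps(fields):
    """NEW behaviour: open a new group only when a field overlaps every existing group."""
    groups = []
    for start, end in fields:
        for group in groups:
            # A gap is fine; the field fits as long as it starts at or after the group's end.
            if group[-1][1] <= start:
                group.append((start, end))
                break
        else:
            groups.append([(start, end)])
    return groups


fields = [(10, 11), (11, 12), (15, 16), (15, 18)]
old_groups = split_on_gaps(fields)
new_groups = split_on_overlaps(fields)
print(len(old_groups), old_groups)
print(len(new_groups), new_groups)
```

With the four fields from the example, the first function yields three groups and the second yields two, matching the OLD and NEW diagrams.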

lucasmation commented 6 years ago

Why does OLD versus NEW make any difference?

Please check whether, after importing, the variables follow the order in which they appear in the dictionary. I think that if there is an overlap, the overlapping variables get moved to a second import round and are then merged onto the end of the file, right?

If it is not too complicated, we should reorder the variables to follow the original dictionary.

Best, Lucas

nicolassoarespinto commented 6 years ago

@lucasmation The end result is the same. For one particular file, the dictionary is split into hundreds of sub-dictionaries because of one big discontinuity, and calling read_fwf hundreds of times is time-consuming. The fix is actually very simple: I have already made it and will push it soon. I only opened this issue because I would like to document the change in case anything goes wrong.
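
To illustrate why the result is the same while the cost differs, here is a rough sketch in plain Python rather than the package's R code. The helpers `read_fixed_width` and `read_in_passes`, the field names, and the sample lines are all hypothetical; each group stands for one sub-dictionary, and each pass stands in for one read_fwf call.

```python
def read_fixed_width(lines, group):
    """Read one sub-dictionary: slice every line once per field in the group."""
    return {name: [line[start:end] for line in lines] for name, start, end in group}


def read_in_passes(lines, groups):
    """Make one pass per sub-dictionary and merge the columns.

    Returns the merged columns and the number of passes that were needed.
    """
    columns = {}
    for group in groups:
        columns.update(read_fixed_width(lines, group))
    return columns, len(groups)


lines = ["AB--CDEF", "GH--IJKL"]

# Hypothetical fields as (name, start, end), 0-based with an exclusive end.
a, b, c, d = ("a", 0, 1), ("b", 1, 2), ("c", 4, 6), ("d", 4, 8)
old_groups = [[a, b], [c], [d]]  # split at the gap and at the overlap
new_groups = [[a, b, c], [d]]    # split at the overlap only

old_columns, old_passes = read_in_passes(lines, old_groups)
new_columns, new_passes = read_in_passes(lines, new_groups)
print(old_columns == new_columns, old_passes, new_passes)
```

Both groupings produce identical columns; only the number of passes over the file changes.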

nicolassoarespinto commented 6 years ago

In the current implementation the variables are not reordered; I will work on that.
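
One possible shape for that reordering step, sketched in Python purely for illustration (the package itself is written in R, and `reorder_columns` and the sample data are hypothetical): after the separate import rounds are merged, the columns are rearranged to follow the order of the original dictionary.

```python
def reorder_columns(columns, dictionary_order):
    """Return a new dict whose keys follow the order of the original dictionary."""
    missing = [name for name in dictionary_order if name not in columns]
    if missing:
        raise KeyError(f"columns missing after import: {missing}")
    # Dicts preserve insertion order, so rebuilding in dictionary order is enough.
    return {name: columns[name] for name in dictionary_order}


# Assumed situation: the overlapping variable "c" was read in a second round
# and therefore ended up after "d" in the merged result.
imported = {"a": ["1"], "b": ["2"], "d": ["4"], "c": ["3"]}
dictionary_order = ["a", "b", "c", "d"]

reordered = reorder_columns(imported, dictionary_order)
print(list(reordered))
```

Raising on missing names is a design choice in this sketch, so that a variable dropped during the split import is noticed instead of being silently omitted.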