roo-rb / roo

Roo provides an interface to spreadsheets of several sorts.
MIT License
2.78k stars 502 forks source link

Parsing on nested columns #495

Open guilhermemaranhao opened 5 years ago

guilhermemaranhao commented 5 years ago

Hi everyone,

I'm very new to Roo and I'm facing an issue related to parsing on nested columns. I'd like to use the parse()method, but I have columns with duplicated names. So, I need to iterate over the head columns, but I don't know how to do it, as when I set my header_search option, some of the below columns aren't being returned.

Here is an example of my sheet: my sheet

I've tried the following approaches: For a matter of space, I didn't put all the columns. document.sheet(0).parse(clean: true, ibge: /IBGE|ibge/, municipio: /Município|Municipio/, uf: /UF|uf/, liraa: /LIRAa|liraa/, lia: /LIA|lia/, armadilhas: /ARMADILHA|armadilha/, tipo_armadilhas: /Tipo de Armadilhas|TIPO DE ARMADILHAS/, iip: /IIP|iip/, ib: 'IB', estratos_1: /estratos/, estratos_2: /estratos/ )

The points here are:

  1. I have many 'Nº estratos' columns. How can I reach them separately?

  2. Also, I've got a headerRowNotFoundError because of Tipo de Armadilhas column. Why?

So, I've tried another approach: rows = document.sheet(0).parse(clean: true, header_search: [/Dados Gerais/])

Only some of the below columns are returned: rows[0] returned

nil => nil
Dados Gerais => nil
Tipo de Armadilhas => nil
Índices => Aedes albopictus
Classificação dos índices segundo IIP => IIP <= 0,9
Criadouros => A1

Why weren't the other columns returned?

Does anybody know how can I get all my columns usingparse(). Is it the best solution for that?

Thank you very much,

Guilherme

chopraanmol1 commented 5 years ago

@guilhermemaranhao currently roo doesn't support header recognition for the nested column.

Although I've not put much thought, I think it will be good to support something like following (or some other variant)

document.sheet(0).parse(clean: true, ibge: /ibge/i, tipo_armadilhas: [/Tipo de Armadilhas/i, //], estratos_1: [/<= 0,9/, /estratos/], estratos_2: [ /1 <=.*<= 3,9/, /estratos/] )

The above variant should be highly useful with expand_merged_ranges option.

Feel free to open PR for this.

guilhermemaranhao commented 5 years ago

Thank you, @chopraanmol1 !

I decide to use document.sheet(0).row(0) etc, until the last row and get the columns one by one. I suppose it does not have a .rows()method, doesn't it? Something that gives me all the sheet rows.

guilhermemaranhao commented 5 years ago

Unfortunately I won't have time to solve it with a PR.