petl-developers / petl

Python Extract Transform and Load Tables of Data
MIT License

Why does iterrowmapmany convert each row to a Record instance? #623

Open jossefaz opened 2 years ago

jossefaz commented 2 years ago

In this method:

https://github.com/petl-developers/petl/blob/0be2735ff21491c0f4f88bf8dc5336a9d71cb884/petl/transform/maps.py#L309

Each row is converted to a Record instance.

https://github.com/petl-developers/petl/blob/0be2735ff21491c0f4f88bf8dc5336a9d71cb884/petl/transform/maps.py#L314

In my use case, my "rowgenerator" helper function needs a named tuple rather than a plain row as input. It is much more convenient to access a named attribute than to use unclear index notation such as row[5]. For that purpose I tried to use rowmapmany this way:

etl.rowmapmany(etl.namedtuples(my_table), rowgenerator=mapper, header=headers)

I thought that using namedtuples would solve my issue: my rows have more than 100 columns, so indexes such as row[57] are hard to use, whereas a named tuple would simply give me the convenience of row.my_target_attribute.

But because of this conversion to a Record instance, each namedtuple is converted to a plain list of values, which is a bit frustrating, since it forces us to use index notation in the mapper function (very hard to read).

When I remove this line https://github.com/petl-developers/petl/blob/0be2735ff21491c0f4f88bf8dc5336a9d71cb884/petl/transform/maps.py#L314

It works like a charm... Why is this Record conversion important? If it is not, could we remove it from the iterrowmapmany method?

Please help 🙏

jossefaz commented 2 years ago

Another reason not to convert to a Record: using a namedtuple as input for the row mapper frees us from any dependence on field order... properties are accessed in the mapper by name and not by position.

So no matter what the order of the fields in the input source is, the mapper will work as expected, even if the field order changes between two inputs that share the same output target.
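To illustrate the point outside petl, here is a minimal sketch using plain namedtuples from the standard library. The field names and the two source layouts (`SourceA`, `SourceB`) are made up for the example; only the idea of name-based access comes from the comment above.

```python
from collections import namedtuple

# Hypothetical layouts: the same three fields, in a different order.
SourceA = namedtuple("SourceA", ["subject_id", "age", "sex"])
SourceB = namedtuple("SourceB", ["sex", "subject_id", "age"])


def mapper(row):
    # Fields are read by name, so their position in the row is irrelevant.
    yield [row.subject_id, "age_months", row.age * 12]
    yield [row.subject_id, "gender", row.sex]


rows_a = list(mapper(SourceA(subject_id=1, age=16, sex="M")))
rows_b = list(mapper(SourceB(sex="M", subject_id=1, age=16)))
```

Both calls produce the same output rows, whereas a mapper written with row[0], row[1], row[2] would have to be adapted for each layout.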

bmaggard commented 2 years ago

https://petl.readthedocs.io/en/latest/util.html#petl.util.base.records

"a record is a hybrid object supporting all possible ways of accessing values."

The examples for rowmapmany demonstrate this:

https://petl.readthedocs.io/en/latest/transform.html#petl.transform.maps.rowmapmany

```
>>> def rowgenerator(row):
...     transmf = {'male': 'M', 'female': 'F'}
...     yield [row[0], 'gender',
...            transmf[row['sex']] if row['sex'] in transmf else None]
...     yield [row[0], 'age_months', row.age * 12]
...     yield [row[0], 'bmi', row.height / row.weight ** 2]
...
>>> table2 = etl.rowmapmany(table1, rowgenerator,
...                         header=['subject_id', 'variable', 'value'])
```
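To make the quoted description of a "hybrid object" concrete, here is a small standard-library sketch of a row type that accepts index, key, and attribute access. `HybridRecord` is a hypothetical stand-in written for illustration and is not petl's actual Record implementation; the field names are taken from the documentation example.

```python
class HybridRecord(tuple):
    """Illustrative stand-in only; not petl's Record class."""

    def __new__(cls, values, fields):
        obj = super().__new__(cls, values)
        obj._fields = tuple(fields)
        return obj

    def __getitem__(self, key):
        # A string key is translated to its position; ints and slices pass through.
        if isinstance(key, str):
            try:
                key = self._fields.index(key)
            except ValueError:
                raise KeyError(key) from None
        return super().__getitem__(key)

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


row = HybridRecord([1, "male", 16], ["id", "sex", "age"])
by_index, by_key, by_attr = row[0], row["sex"], row.age
```

If petl's Record behaves as the documentation example suggests, a mapper that receives the table directly (without wrapping it in etl.namedtuples first) should be able to use row.my_target_attribute in the same way.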