hgrecco / pint-pandas

Pandas support for pint

Parsing CSV with units in the header #166

Open micrenda opened 1 year ago

micrenda commented 1 year ago

This is not really a bug report, but rather a suggestion for anyone who runs into the same issue I did (though I would not mind if a similar solution were integrated into pint-pandas).

Suppose you have a CSV file with a header like this:

molecule,reduced_field [1e-21 * V * m^2],magboltz_drift_velocity [m / s],magboltz_drift_velocity_precision,bolsig_reduced_mobility [1 / V / m / s],bolsig_drift_velocity [m / s],bolsig_delta,bolsig_validation,betaboltz_drift_velocity [m / s],betaboltz_drift_velocity_stdev [m / s],betaboltz_delta,betaboltz_validation

If you want to read this data with pandas and have the bracketed units applied to the columns, you can use this code:

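import re

import pandas as pd
import pint_pandas  # importing registers the 'pint[...]' dtype with pandas

def fix_pint_pandas_units(df):
    """Convert columns named 'name [unit]' into pint columns named 'name' (in place)."""
    # Group 1 is the column name, group 2 is the unit inside the square brackets.
    p = re.compile(r'^(.*)\s*\[(.*)\]\s*$')
    for column in df.columns:
        m = p.match(column)
        if m:
            name = m.group(1).strip()
            unit = m.group(2).strip()
            # Drop the '[unit]' suffix from the column name ...
            df.rename(columns={column: name}, inplace=True)
            # ... and attach the unit to the values through the pint dtype.
            df[name] = pd.Series(df[name], dtype=f'pint[{unit}]')

if __name__ == '__main__':
    df = pd.read_csv('results.csv')
    # Columns without a '[unit]' suffix (e.g. 'molecule') are left untouched.
    fix_pint_pandas_units(df)
    print(df.dtypes)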
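
The header-splitting step can also be tried on its own, without pandas or pint installed. The following is only a sketch of that step: the names `HEADER_RE` and `split_header` are hypothetical and not part of pint-pandas, and the pattern is a slightly stricter variant of the one above (it assumes the unit itself contains no square brackets).

```python
import re

# Hypothetical helper, not part of pint-pandas: splits 'name [unit]' headers.
# Assumes the unit string contains no '[' or ']' characters.
HEADER_RE = re.compile(r'^(?P<name>.*?)\s*\[(?P<unit>[^\[\]]*)\]\s*$')


def split_header(column):
    """Return (name, unit) for 'name [unit]', or (name, None) if no unit is given."""
    m = HEADER_RE.match(column)
    if m is None:
        return column.strip(), None
    return m.group('name').strip(), m.group('unit').strip()


if __name__ == '__main__':
    # A few column names taken from the CSV header shown above.
    for col in ['molecule',
                'reduced_field [1e-21 * V * m^2]',
                'bolsig_reduced_mobility [1 / V / m / s]']:
        print(split_header(col))
```

Returning `None` for columns without a unit makes it easy to skip them, which matches the behaviour of the loop above, where only columns that match the pattern are renamed and converted.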

Maybe this could be integrated directly into pint-pandas, possibly enabled by a flag.