hgrecco / pint-pandas

Pandas support for pint

Parsing CSV with units in the header #166

Open micrenda opened 1 year ago

micrenda commented 1 year ago

This is not really a bug report. It is a suggestion for anyone who runs into the same issue as me (though I would not mind if a similar solution were integrated into pint-pandas).

Suppose you have a CSV file whose headers carry the units in square brackets, like this:

molecule,reduced_field [1e-21 * V * m^2],magboltz_drift_velocity [m / s],magboltz_drift_velocity_precision,bolsig_reduced_mobility [1 / V / m / s],bolsig_drift_velocity [m / s],bolsig_delta,bolsig_validation,betaboltz_drift_velocity [m / s],betaboltz_drift_velocity_stdev [m / s],betaboltz_delta,betaboltz_validation
Ar,0.1,1702.9999999999998,0.45999999999999996,1.647e+25,1647.0,-3.2883147386964144,OK,1806.69,623.853,6.0886670581327165,OK
Ar,0.12589254117941673,1811.9999999999998,0.42,1.394e+25,1755.046,-3.143156732891819,OK,1891.57,504.651,4.3912803532008935,OK
Ar,0.15848931924611134,1932.9999999999998,0.62,1.179e+25,1868.715,-3.32565959648215,OK,2209.5899999999997,619.583,14.30884635281946,NOK
Ar,0.19952623149688797,2047.9999999999998,0.43,9.957e+24,1986.4215000000002,-3.006762695312487,OK,2263.21,573.505,10.50830078125002,NO

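If you want to read this data with pandas and keep the units, you can use the following code. It strips the bracketed unit from each header and converts that column to the matching pint dtype: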

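import re

import pandas as pd
import pint_pandas  # noqa: F401  (importing it registers the "pint[...]" dtype with pandas)

# Matches headers of the form "name [unit]" and captures the name and the unit.
UNIT_HEADER = re.compile(r'^(.*?)\s*\[(.*)\]\s*$')


def fix_pint_pandas_units(df):
    """Strip "[unit]" from column names and convert those columns to pint dtypes (in place)."""
    # Iterate over a copy of the labels, because the loop renames columns as it goes.
    for column in list(df.columns):
        m = UNIT_HEADER.match(column)
        if m:
            name = m.group(1).strip()
            unit = m.group(2).strip()
            df.rename(columns={column: name}, inplace=True)
            df[name] = df[name].astype(f'pint[{unit}]')
    return df


if __name__ == '__main__':
    df = pd.read_csv('results.csv')
    fix_pint_pandas_units(df)
    print(df.dtypes)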
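One caveat worth checking: the reduced_field header carries a numeric prefactor (1e-21 * V * m^2). As far as I know, pint refuses unit expressions that contain a scaling factor, so a dtype like pint[1e-21 * V * m^2] may raise a ValueError. A possible workaround, which is an assumption and not something pint-pandas does for you, is to peel off the leading number, multiply the column by it, and use the remaining unit. The sketch below covers only the splitting step with a hypothetical helper, split_scale; it treats a number as a factor only when it is followed by "*", so an expression like 1 / V / m / s is left intact.

```python
import re

# Hypothetical helper (not part of pint or pint-pandas): a leading number counts
# as a scaling factor only when it is followed by "*".
_SCALE = re.compile(r'^\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*\*\s*(.+?)\s*$')


def split_scale(unit_expr):
    """Return (factor, unit) for an expression such as "1e-21 * V * m^2".

    Expressions without a leading "<number> *" are returned with factor 1.0.
    """
    m = _SCALE.match(unit_expr)
    if m is None:
        return 1.0, unit_expr.strip()
    return float(m.group(1)), m.group(2)


if __name__ == '__main__':
    for expr in ['1e-21 * V * m^2', 'm / s', '1 / V / m / s']:
        print(expr, '->', split_scale(expr))
```

Inside the conversion loop, this could be used as scale, unit = split_scale(unit) followed by df[name] = (df[name] * scale).astype(f'pint[{unit}]'). The stored magnitudes are then expressed in V * m^2 rather than in units of 1e-21 V * m^2.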

Maybe this could be integrated directly into pint-pandas, perhaps enabled by a flag.
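As a rough idea of what such a flag might do internally, here is a minimal sketch. The helper name unit_header_mappings and any flag built on it (say, a units_in_header=True option) are hypothetical and do not exist in pint-pandas today. The sketch turns the header into two mappings up front, one to rename the columns and one to pass to DataFrame.astype, so the frame can be converted in a single step instead of being mutated inside a loop.

```python
import re

# Hypothetical helper, not part of pint-pandas: turns "name [unit]" headers into
# a rename map and an astype map of "pint[...]" dtype strings.
_UNIT_HEADER = re.compile(r'^(.*?)\s*\[(.*)\]\s*$')


def unit_header_mappings(columns):
    """Return (rename, dtypes) for the columns whose header carries "[unit]".

    rename maps each original header to its bare name; dtypes maps that bare
    name to a "pint[...]" dtype string. Columns without brackets are skipped.
    """
    rename, dtypes = {}, {}
    for column in columns:
        m = _UNIT_HEADER.match(column)
        if m is None:
            continue  # unitless column such as "molecule": leave it alone
        name, unit = m.group(1).strip(), m.group(2).strip()
        rename[column] = name
        dtypes[name] = f'pint[{unit}]'
    return rename, dtypes
```

With pandas and pint_pandas imported, a reader could then do rename, dtypes = unit_header_mappings(df.columns) followed by df = df.rename(columns=rename).astype(dtypes). Keeping the header parsing separate from pandas also makes it easy to test on its own.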