yukiregista / DSC_STUDY


[MS] Features for Ooyama-san's dataiku #42

Open yukiregista opened 9 months ago

yukiregista commented 9 months ago

If you have passed your features to Ooyama-san, please comment below and include the location of your code.

wjh136134188 commented 9 months ago

import pandas as pd

df_test = pd.read_csv("test.csv")
# Note: dropna() returns a new DataFrame, so as written this call leaves df_test unchanged.
df_test.dropna()

# Split the date strings on '-' into day, month and year parts.
df_test[['DisbursementDate_Day', 'DisbursementDate_Month', 'DisbursementDate_Year']] = df_test['DisbursementDate'].str.split('-', expand=True)
df_test[['ApprovalDate_Day', 'ApprovalDate_Month', 'ApprovalDate_Year']] = df_test['ApprovalDate'].str.split('-', expand=True)

# Drop rows without a disbursement year, then expand two-digit years:
# values up to 14 become 20xx, everything else becomes 19xx.
df_test.dropna(subset=['DisbursementDate_Year'], inplace=True)
df_test['ApprovalDate_Year'] = df_test['ApprovalDate_Year'].astype(int)
df_test['ApprovalDate_Year'] = df_test['ApprovalDate_Year'].apply(lambda x: x + 2000 if x <= 14 else x + 1900)

df_test['DisbursementDate_Year'] = df_test['DisbursementDate_Year'].astype(int)
df_test['DisbursementDate_Year'] = df_test['DisbursementDate_Year'].apply(lambda x: x + 2000 if x <= 14 else x + 1900)
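The year handling above can be sketched in plain Python to make the pivot rule easier to check by hand. This is only an illustration: the helper names `expand_two_digit_year` and `split_date` are hypothetical, and the `DD-Mon-YY` sample strings are an assumption about how the dates look in `test.csv`.

```python
def expand_two_digit_year(yy: int, pivot: int = 14) -> int:
    """Map a two-digit year to four digits: 0..pivot -> 20xx, otherwise 19xx."""
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a two-digit year, got {yy}")
    return yy + 2000 if yy <= pivot else yy + 1900


def split_date(text: str) -> tuple[str, str, int]:
    """Split a 'DD-Mon-YY' string (assumed format) into day, month, four-digit year."""
    day, month, yy = text.split("-")
    return day, month, expand_two_digit_year(int(yy))


print(split_date("28-Feb-97"))  # ('28', 'Feb', 1997)
print(split_date("7-Feb-06"))   # ('7', 'Feb', 2006)
```

With a pivot of 14, a two-digit year of 15 maps to 1915, so the rule only makes sense if the data contains no dates after 2014.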

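These are the new features I created, "Difference_Year" and "Difference_SBA_Appv_GrAppv". "Difference_Year" is the approval year minus the disbursement year. Despite its name, "Difference_SBA_Appv_GrAppv" is the ratio SBA_Appv / GrAppv expressed as a percentage.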

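df_test['Difference_Year'] = df_test['ApprovalDate_Year'] - df_test['DisbursementDate_Year']

# Strip '$' and ',' from the currency columns before converting to float.
# A raw string avoids the invalid-escape warning that '[\$,]' triggers in recent Python versions.
df_test['DisbursementGross'] = df_test['DisbursementGross'].str.replace(r'[\$,]', '', regex=True).astype(float)
df_test['GrAppv'] = df_test['GrAppv'].str.replace(r'[\$,]', '', regex=True).astype(float)
df_test['SBA_Appv'] = df_test['SBA_Appv'].str.replace(r'[\$,]', '', regex=True).astype(float)

# Share of the approved amount that SBA guarantees, in percent.
df_test['Difference_SBA_Appv_GrAppv'] = (df_test['SBA_Appv'] / df_test['GrAppv']) * 100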
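As a worked example of the currency cleanup and the percentage feature, here is a scalar sketch using only the standard library. The function names `parse_money` and `sba_share_pct` are hypothetical, and the `$100,000.00` style is an assumption about how the amounts are stored in the CSV.

```python
import re


def parse_money(text: str) -> float:
    """Convert a string such as '$1,234.50' to a float by dropping '$' and ','."""
    return float(re.sub(r"[$,]", "", text))


def sba_share_pct(sba_appv: str, gr_appv: str) -> float:
    """Return SBA_Appv as a percentage of GrAppv."""
    return parse_money(sba_appv) / parse_money(gr_appv) * 100


print(parse_money("$100,000.00"))                  # 100000.0
print(sba_share_pct("$75,000.00", "$100,000.00"))  # 75.0
```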

yukiregista commented 9 months ago

The Python recipe format in Dataiku:


# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
train_prepared_stacked = dataiku.Dataset("train_prepared_stacked")
train_prepared_stacked_df = train_prepared_stacked.get_dataframe()

# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.

done_df = train_prepared_stacked_df # For this sample code, simply copy input to output

# Write recipe outputs
done = dataiku.Dataset("done")
done.write_with_schema(done_df)
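One possible way to fit the features from this thread into that template is to keep the computation in a plain function and leave only the dataset I/O to Dataiku. The sketch below is a guess at how that could look, not the recipe actually used: `add_features` and `run_recipe` are hypothetical names, and it assumes `train_prepared_stacked` still holds the raw string columns (`ApprovalDate`, `DisbursementDate`, `GrAppv`, `SBA_Appv`) in the same form as `test.csv`.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Difference_Year and Difference_SBA_Appv_GrAppv added."""
    out = df.copy()
    for col in ("ApprovalDate", "DisbursementDate"):
        # Last '-' separated part is the two-digit year; missing dates become NaN.
        yy = pd.to_numeric(out[col].str.split("-").str[-1], errors="coerce")
        # Same pivot as in the thread: 0..14 -> 20xx, otherwise 19xx.
        out[col + "_Year"] = yy.where(yy > 14, yy + 100) + 1900
    for col in ("GrAppv", "SBA_Appv"):
        out[col] = out[col].str.replace(r"[$,]", "", regex=True).astype(float)
    out["Difference_Year"] = out["ApprovalDate_Year"] - out["DisbursementDate_Year"]
    out["Difference_SBA_Appv_GrAppv"] = out["SBA_Appv"] / out["GrAppv"] * 100
    return out


def run_recipe() -> None:
    """Dataset I/O as in the template; only callable inside Dataiku DSS."""
    import dataiku  # imported here so add_features can be tested outside DSS

    df = dataiku.Dataset("train_prepared_stacked").get_dataframe()
    dataiku.Dataset("done").write_with_schema(add_features(df))
```

Unlike the snippet posted earlier in the thread, this version keeps rows with a missing disbursement date and leaves their `Difference_Year` as NaN, so the output has the same number of rows as the input. Whether that is preferable depends on how the downstream model handles missing values.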