Closed bluefir closed 11 years ago
You don't need to mask the rhs of the expression, it will align directly
In [30]: df = DataFrame(dict(country = ['A','B',np.nan,'D'],issuer=['Ai','Bi','Ci','D']))
In [31]: df
Out[31]:
country issuer
0 A Ai
1 B Bi
2 NaN Ci
3 D D
In [32]: df.loc[df['country'].isnull(),'country'] = df['issuer']
In [33]: df
Out[33]:
country issuer
0 A Ai
1 B Bi
2 Ci Ci
3 D D
Thanks!
Look at what I get
In [12]:
column_country = TopContributors.FIELD_NAME_COUNTRY
column_issuer_country = TopContributors.FIELD_NAME_ISSUER_COUNTRY
column_msci_country = TopContributors.FIELD_NAME_MSCI_COUNTRY
In [13]:
country_columns = [column_issuer_country, column_msci_country, column_country]
In [14]:
portfolio_data.loc[portfolio_data['issuer_country'].isnull(), country_columns]
Out[14]:
issuer_country msci_country country
25386810 NaN US NaN
31374720 NaN US NaN
42217K10 NaN US NaN
49446R10 NaN US NaN
53117210 NaN US NaN
620088 NaN AU NaN
620884 NaN ZA NaN
624730 NaN AU NaN
72925110 NaN US NaN
74340W10 NaN US NaN
74460D10 NaN US NaN
75884910 NaN US NaN
90265310 NaN US NaN
92904210 NaN US NaN
B0RF60 NaN AU NaN
B1G8KX NaN BR NaN
B27WYK NaN BR NaN
B4TXDZ NaN HK NaN
B70DWB NaN AU NaN
In [15]:
portfolio_data.loc[portfolio_data[column_country].isnull(), column_country] = portfolio_data[column_msci_country]
In [16]:
portfolio_data.loc[portfolio_data[column_issuer_country].isnull(), country_columns]
Out[16]:
issuer_country msci_country country
25386810 NaN US US
31374720 NaN US US
42217K10 NaN US US
49446R10 NaN US US
53117210 NaN US US
620088 NaN AU US
620884 NaN ZA US
624730 NaN AU US
72925110 NaN US US
74340W10 NaN US US
74460D10 NaN US US
75884910 NaN US US
90265310 NaN US US
92904210 NaN US US
B0RF60 NaN AU US
B1G8KX NaN BR US
B27WYK NaN BR US
B4TXDZ NaN HK US
B70DWB NaN AU US
This works
In [15]:
#portfolio_data.loc[portfolio_data[column_country].isnull(), column_country] = portfolio_data[column_msci_country]
countries = portfolio_data[column_country]
flag_no_country = countries.isnull()
countries[flag_no_country] = portfolio_data.loc[flag_no_country, column_msci_country]
portfolio_data[column_country] = countries
In [16]:
portfolio_data.loc[portfolio_data[column_issuer_country].isnull(), country_columns]
Out[16]:
issuer_country msci_country country
25386810 NaN US US
31374720 NaN US US
42217K10 NaN US US
49446R10 NaN US US
53117210 NaN US US
620088 NaN AU AU
620884 NaN ZA ZA
624730 NaN AU AU
72925110 NaN US US
74340W10 NaN US US
74460D10 NaN US US
75884910 NaN US US
90265310 NaN US US
92904210 NaN US US
B0RF60 NaN AU AU
B1G8KX NaN BR BR
B27WYK NaN BR BR
B4TXDZ NaN HK HK
B70DWB NaN AU AU
I have HDF5 file with the data if you are interested. Can I attach it to this thread?
put it on a public site (like dropbox), and put the shared link here
so you are trying to fill in 'msci_country' with country if its NaN, right?
Does this not do what you want?
In [13]: df
Out[13]:
<class 'pandas.core.frame.DataFrame'>
Index: 15504 entries, 000312 to Y8565N10
Data columns (total 11 columns):
MarketCap 15503 non-null values
alpha 15482 non-null values
gics_code 15503 non-null values
investable 15504 non-null values
issuer_country 15485 non-null values
msci_country 11019 non-null values
universe 15504 non-null values
weight_benchmark 15504 non-null values
weight_portfolio 15504 non-null values
weight_active 15504 non-null values
country 15485 non-null values
dtypes: bool(2), float64(6), object(3)
In [14]: df.loc[df['msci_country'].isnull(),'msci_country'] = df['country']
In [15]: df
Out[15]:
<class 'pandas.core.frame.DataFrame'>
Index: 15504 entries, 000312 to Y8565N10
Data columns (total 11 columns):
MarketCap 15503 non-null values
alpha 15482 non-null values
gics_code 15503 non-null values
investable 15504 non-null values
issuer_country 15485 non-null values
msci_country 15504 non-null values
universe 15504 non-null values
weight_benchmark 15504 non-null values
weight_portfolio 15504 non-null values
weight_active 15504 non-null values
country 15485 non-null values
dtypes: bool(2), float64(6), object(3)
In [16]: df.loc['Y8565N10']
Out[16]:
MarketCap 234.05
alpha -3.799255
gics_code 1.010204e+07
investable True
issuer_country US
msci_country US
universe True
weight_benchmark 0
weight_portfolio 0
weight_active 0
country US
Name: Y8565N10, dtype: object
I am trying to do the opposite:
df.loc[df['country'].isnull(),'country'] = df['msci_country']
seems ok to me are you on 0.11? (there was a bug related to this), it prob assigns to all the same value
try using 0.12
In [6]: df.loc[df['country'].isnull(),'country'] = df['msci_country']
In [7]: df
Out[7]:
<class 'pandas.core.frame.DataFrame'>
Index: 15504 entries, 000312 to Y8565N10
Data columns (total 11 columns):
MarketCap 15503 non-null values
alpha 15482 non-null values
gics_code 15503 non-null values
investable 15504 non-null values
issuer_country 15485 non-null values
msci_country 11019 non-null values
universe 15504 non-null values
weight_benchmark 15504 non-null values
weight_portfolio 15504 non-null values
weight_active 15504 non-null values
country 15504 non-null values
dtypes: bool(2), float64(6), object(3)
Yes, I was on 0.11. It is indeed fixed with 0.12. I switched but went back to 0.11 because of a regression in 0.12 for DataFrame().hist() display. Switching to 0.12 now! Thanks!
fyi....I believe the .hist is just fixed in latest master IIRC, (and windows builds should be on dev site soon)
yep, it was fixed. waiting for the new build...
Is there a more concise way of accomplishing the following?