Hey! Need some help with Optimize script

cstampar commented 4 years ago

Hey there! First, thanks so much for putting this together; I have been looking everywhere for a package that does this type of analysis.

I have been trying to implement your "Portfolio Optimize-Smaller.ipynb" but a few things aren't working.

For one, yf.download does work, but it creates a multi-level column index with each asset above its OHLC data, which confuses some of the lines later on.
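One way around the multi-level columns is to pull a single field out of every ticker with pandas' `DataFrame.xs`. The sketch below builds a small synthetic frame shaped like the one described above (ticker on the outer column level, OHLC field on the inner level, as with `group_by='ticker'`). The tickers and prices are made up, and the exact column layout `yf.download` returns may differ between yfinance versions, so it is worth checking `data.columns` first.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for yf.download(..., group_by='ticker'):
# outer column level = ticker, inner level = price field (assumed layout).
dates = pd.date_range("2021-01-04", periods=3, freq="B")
columns = pd.MultiIndex.from_product([["AAPL", "GS"], ["Open", "High", "Low", "Close"]])
data = pd.DataFrame(np.arange(24, dtype=float).reshape(3, 8), index=dates, columns=columns)

# Select the 'Close' field for every ticker; the inner level is dropped,
# leaving one plain column per ticker.
closes = data.xs("Close", axis=1, level=1)
print(closes.columns.tolist())  # ['AAPL', 'GS']
```

With this, `closes["AAPL"]` is an ordinary column, so code written for single-level columns can be used unchanged.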

I got around this by downloading the closing data for the 5 assets I'm interested in individually, then combining them in a DataFrame and calculating daily returns. That works and got me toward the end of your script, until I hit the section that prints the portfolio allocations, which is where it broke.
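Downloading each ticker separately works. The separately fetched close-price Series can also be combined in one step with `pd.concat`, which aligns them on their dates, and `pct_change()` computes the same `Close / Close.shift(1) - 1` daily return for every column at once. The prices below are made-up numbers standing in for the individual downloads.

```python
import pandas as pd

dates = pd.date_range("2021-01-04", periods=4, freq="B")
# Made-up close prices standing in for yf.Ticker(...).history(...)['Close'].
aapl = pd.Series([100.0, 102.0, 101.0, 103.0], index=dates)
gs = pd.Series([300.0, 297.0, 300.0, 306.0], index=dates)

# One column per ticker (dict keys become column names), aligned on the date index.
prices = pd.concat({"AAPL": aapl, "GS": gs}, axis=1)

# Same as prices / prices.shift(1) - 1 for all columns; the first row is NaN.
returns = prices.pct_change()
print(returns.round(4))
```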

My DataFrame stores each asset's closing data in a column named after that asset, such as "AAPL", "GS", etc. But the print_allocation function has no way of looping through these named columns to get the latest closing price, and it breaks here because it calls "data", which I didn't create the same way for the reasons above. I tried to write my own for loop that goes through each ticker and calls print_allocation, but I can't get that to work either. I believe the key issue is that you are passing "data['close'].iloc[-1]", which I can't do because it won't find "close" in the multi-index columns.
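Because each column of the close-price frame is named after its ticker, `closes.iloc[-1]` returns the latest close as a Series indexed by ticker, so an allocation loop can look each price up by name. Below is a minimal sketch of that idea with a hypothetical `print_allocation_by_ticker` helper and made-up prices and weights. It is not the notebook's own function; `total_portfolio_value` is passed in explicitly as an assumption, and the weights are assumed to be in the same order as the price columns.

```python
import numpy as np
import pandas as pd

def print_allocation_by_ticker(allocations, latest_prices, total_portfolio_value):
    """Hypothetical helper: print weight, dollar amount and share count per ticker.

    allocations: 1-D array of weights, in the same order as latest_prices.index.
    latest_prices: Series of the most recent close, indexed by ticker.
    """
    rows = []
    for ticker_id in np.argsort(-allocations):  # largest weight first
        ticker = latest_prices.index[ticker_id]
        dollars = total_portfolio_value * allocations[ticker_id]
        shares = dollars / latest_prices[ticker]  # divide by this ticker's own price
        rows.append((ticker, allocations[ticker_id] * 100, dollars, shares))
        print("%s - %.4f, $%.2f USD, %.2f shares" % rows[-1])
    return rows

closes = pd.DataFrame({"AAPL": [120.0, 125.0], "GS": [350.0, 400.0]})  # made-up closes
latest = closes.iloc[-1]  # Series: AAPL -> 125.0, GS -> 400.0
rows = print_allocation_by_ticker(np.array([0.25, 0.75]), latest, 10000.0)
```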

Do you have any guidance on how I can get this to work? It's really stumping me. Thanks so much!

cstampar commented 4 years ago

I tried to create a for loop that looks like this:

    for ticker in tickers:
        print_allocation(train, solution, data[ticker, 'Close'].iloc[-1])

My idea was to access the closing price for each ticker in my list from the multi-index DataFrame, but I get this error: "invalid index to scalar variable", which has come up repeatedly for the print line in "def print_allocation".
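That message comes from NumPy: it is raised when code tries to index a NumPy scalar, such as the single `numpy.float64` that `.iloc[-1]` returns from one column. If a function receives one price but still indexes it with `prices[...]`, this is the kind of failure to expect. A minimal reproduction with a made-up price series:

```python
import numpy as np
import pandas as pd

close = pd.Series([120.0, 125.0], name="AAPL")  # made-up closes for one ticker
price = close.iloc[-1]  # a single numpy.float64, not a Series

try:
    price["AAPL"]  # indexing a scalar, like prices[...] on a single price
except IndexError as exc:
    print(exc)  # invalid index to scalar variable.
```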

cstampar commented 4 years ago

Sorry for the flood of info, but I thought this might be helpful in case anyone else runs into similar problems. I figured out some of my problems above, though I'll leave that text up in case someone else needs a workaround for the data variable problem.

I was able to fix the problem in "print_allocation" by calling "print_allocation(train, solution, data[ticker, 'Close'].iloc[-1])", which now prints all the share info as expected. This also requires changing the last division term in "def print_allocation" from "prices[data.columns[ticker_id]]" to just "prices". I'm not sure whether this is contributing to the issue below, but it doesn't seem to work unless I change it.
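One caveat with replacing `prices[...]` by a single scalar `prices`: every ticker's dollar amount is then divided by the same price (whichever ticker's close was passed in), so only that ticker's share count is correct. A small sketch with made-up weights and prices comparing the two approaches; `total_portfolio_value` is an assumed value for the example.

```python
import pandas as pd

total_portfolio_value = 10000.0  # assumed portfolio size for the example
allocations = pd.Series({"AAPL": 0.5, "GS": 0.5})  # made-up weights
latest = pd.Series({"AAPL": 125.0, "GS": 400.0})   # made-up latest closes

dollars = total_portfolio_value * allocations
shares_per_ticker = dollars / latest               # each ticker divided by its own price
shares_one_price = dollars / latest["AAPL"]        # every ticker divided by AAPL's price

print(pd.DataFrame({"per_ticker": shares_per_ticker, "single_price": shares_one_price}))
```

Here GS comes out as 12.5 shares with its own price but 40 shares when AAPL's price is reused.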

But here is the next problem. These are the results once I got it working:

    AAPL - 100.0000, $42928.00 USD, 1963.77 shares
    BRK-B - 0.0000, $0.00 USD, 0.00 shares
    GS - 0.0000, $0.00 USD, 0.00 shares
    JPM - 0.0000, $0.00 USD, 0.00 shares
    RVLV - 0.0000, $0.00 USD, 0.00 shares

Why would it allocate everything to one asset? I also noticed that when the solution was generated, my graph shows only one identified solution, instead of multiple orange solution markers across the frontier as in your image.

Is this a problem with my asset choices, or with how the algorithm is assessing them?

Any insights would be much appreciated! :)
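One way to check whether the optimizer's output has collapsed onto a single portfolio (which would be consistent with seeing one marker instead of many) is to count the distinct weight vectors in `solutions`. The sketch assumes `solutions` is a 2-D array with one row of weights per portfolio, as the `solutions[np.argmax(...)]` indexing in the script suggests; `count_distinct_portfolios` is a hypothetical helper and both arrays are made up.

```python
import numpy as np

def count_distinct_portfolios(solutions, decimals=4):
    """Hypothetical helper: number of distinct weight rows after rounding."""
    return len(np.unique(np.round(solutions, decimals), axis=0))

collapsed = np.tile([1.0, 0.0, 0.0, 0.0, 0.0], (5000, 1))  # every row identical
spread = np.random.default_rng(0).dirichlet(np.ones(5), size=5000)  # varied weights

print(count_distinct_portfolios(collapsed))  # 1
print(count_distinct_portfolios(spread))
```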

cstampar commented 4 years ago

Here is the full code that produces the result in the comment above. I believe I have implemented everything as closely as possible to the script, with a few modifications (I took out the plotting for brevity):

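    from datetime import datetime, timedelta

    import numpy as np
    import pandas as pd
    import yfinance as yf

    PERIOD = '10y'
    GAtickers = tickers
    data = yf.download(GAtickers, period=PERIOD, interval="1d", auto_adjust=True, prepost=False, group_by='ticker')
    data = pd.DataFrame(data)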

    apple = yf.Ticker("AAPL").history(period=PERIOD)['Close']
    appleDF = pd.DataFrame(apple)
    appleDF['dailyreturn'] = (appleDF['Close'] / appleDF['Close'].shift(1) - 1)

    brkb = yf.Ticker("BRK-B").history(period=PERIOD)['Close']
    brkbDF = pd.DataFrame(brkb)
    brkbDF['dailyreturn'] = (brkbDF['Close'] / brkbDF['Close'].shift(1) - 1)

    gs = yf.Ticker("GS").history(period=PERIOD)['Close']
    gsDF = pd.DataFrame(gs)
    gsDF['dailyreturn'] = (gsDF['Close'] / gsDF['Close'].shift(1) - 1)

    jpm = yf.Ticker("JPM").history(period=PERIOD)['Close']
    jpmDF = pd.DataFrame(jpm)
    jpmDF['dailyreturn'] = (jpmDF['Close'] / jpmDF['Close'].shift(1) - 1)

    rvlv = yf.Ticker("RVLV").history(period=PERIOD)['Close']
    rvlvDF = pd.DataFrame(rvlv)
    rvlvDF['dailyreturn'] = (rvlvDF['Close'] / rvlvDF['Close'].shift(1) - 1)

    dataDF = pd.DataFrame(index=apple.index, columns=["AAPL", "BRK-B", "GS", "JPM", "RVLV"])

    dataDF["AAPL"] = appleDF['dailyreturn']
    dataDF["BRK-B"] = brkbDF['dailyreturn']
    dataDF["GS"] = gsDF['dailyreturn']
    dataDF["JPM"] = jpmDF['dailyreturn']
    dataDF["RVLV"] = rvlvDF['dailyreturn']

    dataDFprices = pd.DataFrame(index=apple.index, columns=["AAPL", "BRK-B", "GS", "JPM", "RVLV"])
    dataDFprices["AAPL"] = appleDF['Close']
    dataDFprices["BRK-B"] = brkbDF['Close']
    dataDFprices["GS"] = gsDF['Close']
    dataDFprices["JPM"] = jpmDF['Close']
    dataDFprices["RVLV"] = rvlvDF['Close']

    YEAR_BARS = 252
    TEST_YEARS = 0
    TRAIN_END_DATE = dataDF.index.max() - timedelta(days=TEST_YEARS*365)

    train = dataDF[(dataDF.index < TRAIN_END_DATE)].fillna(0)
    test = dataDF[(dataDF.index >= TRAIN_END_DATE)].fillna(0)

    def random_population(n_assets, population_size):
        weights = np.random.uniform(0, 1, size=(population_size, n_assets))
        return weights / weights.sum(axis=-1).reshape((-1, 1))

    def annualized_portfolio_return(returns, weights):
        weighted_returns = np.matmul(weights, np.mean(returns.values, 0))
        return (weighted_returns + 1) ** YEAR_BARS - 1

    def annualized_portfolio_volatility(returns, weights):
        variance = np.sum(weights * np.matmul(weights, np.cov(returns.T.values)), -1)
        return np.sqrt(variance) * np.sqrt(YEAR_BARS)

    def annualized_portfolio_performance(returns, weights):
        return np.stack([
            annualized_portfolio_return(returns, weights),
            annualized_portfolio_volatility(returns, weights)
        ], -1)

    random_weights = random_population(train.shape[1], 100000)
    random_solutions = annualized_portfolio_performance(train, random_weights)

    GAoptimizerTime = datetime.now()
    optimizer = Optimizer(mutation_sigma=1, verbose=False, max_iter=250, population_size=5000)
    solutions, stats = optimizer.run(train.values)
    print(f"This took {datetime.now()-GAoptimizerTime}")

    ov = annualized_portfolio_performance(train, solutions)
    sharpe = ov[:,0] / ov[:,1]
    solution = solutions[np.argmax(ov[:,0] / ov[:,1])]

    def print_allocation(data, allocations, prices):
        for ticker_id in np.argsort(-allocations):
            print('%s - %.4f, $%.2f USD, %.2f shares' % (
                dataDF.columns[ticker_id],
                allocations[ticker_id] * 100,
                total_portfolio_value * allocations[ticker_id],
                (total_portfolio_value * allocations[ticker_id]) / prices))

    print_allocation(train, solution, data[ticker, 'Close'].iloc[-1])

OUTPUT:

    AAPL - 100.0000, $42928.00 USD, 1963.77 shares
    BRK-B - 0.0000, $0.00 USD, 0.00 shares
    GS - 0.0000, $0.00 USD, 0.00 shares
    JPM - 0.0000, $0.00 USD, 0.00 shares
    RVLV - 0.0000, $0.00 USD, 0.00 shares
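A related thing worth checking in the code above: `train` is built with `.fillna(0)`, so any date on which a ticker has no return (the first row after `shift(1)`, or dates before that ticker has price history in the 10-year window) enters the optimizer as a 0% return. A quick way to see how much of each column that affects, sketched on made-up data where the hypothetical ticker "NEW" has no returns until the fifth row:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2021-01-04", periods=6, freq="B")
# Made-up daily returns; "NEW" has no data for the first four rows.
returns = pd.DataFrame(
    {"OLD": [np.nan, 0.01, -0.02, 0.015, 0.0, 0.01],
     "NEW": [np.nan, np.nan, np.nan, np.nan, 0.03, -0.01]},
    index=dates,
)

first_valid = returns.apply(lambda col: col.first_valid_index())  # first date with data
missing_share = returns.isna().mean()  # fraction of rows that fillna(0) would set to 0.0

print(first_valid)
print(missing_share)

# One alternative (an assumption, not what the notebook does):
# keep only the dates on which every ticker has a return.
common = returns.dropna()
```

Trimming with `dropna()` avoids the artificial zero returns, at the cost of shortening the training window to the youngest asset's history.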

cstampar commented 4 years ago

I reran it with a population of 10,000. It took considerably longer and still produced the same result: all in on AAPL. So it doesn't seem to be a problem with the size of the simulation causing it to get stuck in a local optimum. It also still produces only a single solution on the plot. It also doesn't seem to be a problem with the calculation in print_allocation using my own price DataFrame, because running

    for ticker in np.argsort(-solution):
        print(solution[ticker])

outputs 1.0, 0.0, 0.0, 0.0, 0.0

So that is definitely what the model is computing, for some reason.
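A slightly more readable version of that check labels each weight with its ticker. This assumes `solution` is ordered like the columns of `train`, which is how `print_allocation` maps `ticker_id` to `dataDF.columns`; the weights below are placeholders mirroring the result reported above.

```python
import numpy as np
import pandas as pd

tickers = ["AAPL", "BRK-B", "GS", "JPM", "RVLV"]
solution = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # placeholder matching the reported result

# Weights labelled by ticker, largest first.
weights = pd.Series(solution, index=tickers).sort_values(ascending=False)
print(weights)
print("weights sum to", weights.sum())
```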

I removed Apple to see if that changed anything, and the model just seems to give a 100% allocation to whichever asset it sees first.
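As a cross-check that does not depend on the genetic optimizer, the random portfolios the script already generates with `random_population` and `annualized_portfolio_performance` can be ranked by the same return/volatility ratio used to pick `solution`. If the best random portfolio is clearly diversified while the optimizer returns 100% in one asset, that would suggest looking at the optimizer run or its inputs rather than the asset choice. The sketch below uses synthetic returns for three made-up independent assets; the two functions restate the script's formulas in equivalent form, with an explicit random generator argument added as an assumption so the run is reproducible.

```python
import numpy as np
import pandas as pd

YEAR_BARS = 252

def random_population(n_assets, population_size, rng):
    # Uniform draws normalised so each row of weights sums to 1 (as in the script).
    weights = rng.uniform(0, 1, size=(population_size, n_assets))
    return weights / weights.sum(axis=-1).reshape((-1, 1))

def annualized_portfolio_performance(returns, weights):
    # Column 0: mean daily portfolio return compounded over YEAR_BARS days.
    mean_daily = weights @ returns.mean(axis=0).values
    annual_return = (mean_daily + 1) ** YEAR_BARS - 1
    # Column 1: sqrt(w' C w) * sqrt(YEAR_BARS), with C the daily covariance matrix.
    cov = np.cov(returns.T.values)
    variance = np.sum(weights * (weights @ cov), axis=-1)
    annual_vol = np.sqrt(variance) * np.sqrt(YEAR_BARS)
    return np.stack([annual_return, annual_vol], axis=-1)

rng = np.random.default_rng(42)
# Synthetic daily returns for three made-up, independent assets.
returns = pd.DataFrame(
    rng.normal(loc=[0.001, 0.0008, 0.0006], scale=[0.01, 0.005, 0.0075], size=(2500, 3)),
    columns=["A", "B", "C"],
)

weights = random_population(returns.shape[1], 20000, rng)
perf = annualized_portfolio_performance(returns, weights)
ratio = perf[:, 0] / perf[:, 1]  # return / volatility, as used to choose `solution`
best = weights[np.argmax(ratio)]
print(pd.Series(best, index=returns.columns).round(3))
```

On real data, the same ranking can be applied to the script's existing `random_weights` and `random_solutions` and compared with `solution`.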

marekgalovic / optfolio
