jankrepl / deepdow

Portfolio optimization with deep learning.
https://deepdow.readthedocs.io
Apache License 2.0
874 stars 136 forks

Error with Custom Network #110

Closed avacaondata closed 3 years ago

avacaondata commented 3 years ago

Hi, I'm trying to implement a Network similar to ThorpNet but using Resample as the allocation layer:

import torch

from deepdow.benchmarks import Benchmark
from deepdow.layers import NCO, AverageCollapse, CovarianceMatrix, Resample


class ResampleNetwork(torch.nn.Module, Benchmark):
    def __init__(
        self,
        n_assets,
        max_weight=0.2,
        force_symmetric=True,
        n_clusters=5,
        n_init=10,
        init="random",
        random_state=None,
    ):
        super().__init__()
        self.force_symmetric = force_symmetric
        # Trainable square root of the covariance matrix and trainable expected returns
        self.matrix = torch.nn.Parameter(torch.eye(n_assets), requires_grad=True)
        self.exp_returns = torch.nn.Parameter(torch.zeros(n_assets), requires_grad=True)
        self.covariance_layer = CovarianceMatrix(sqrt=False, shrinkage_strategy="diagonal")
        self.collapse_layer = AverageCollapse(collapse_dim=3)
        self.portfolio_opt_layer = Resample(
            allocator=NCO(n_clusters=n_clusters, n_init=n_init, init=init, random_state=random_state),
            n_draws=100,
            n_portfolios=50,
        )

    def forward(self, x):
        n = len(x)

        # M @ M^T guarantees a symmetric positive semi-definite covariance matrix
        covariance = torch.mm(self.matrix, torch.t(self.matrix)) if self.force_symmetric else self.matrix
        exp_returns_all = torch.repeat_interleave(self.exp_returns[None, ...], repeats=n, dim=0)

        covariance_all = torch.repeat_interleave(covariance[None, ...], repeats=n, dim=0)
        weights = self.portfolio_opt_layer(covariance_all, exp_returns_all)
        return weights

However, I'm always getting the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-125-c9e2da82531f> in <module>
     11           device = torch.device("cuda:0"))#, optimizer=torch.optim.Adam(mynet.parameters(), lr=0.01))
     12 
---> 13 history = run.launch(n_epochs=4)

~/miniconda/envs/series_temporales/lib/python3.7/site-packages/deepdow/experiments.py in launch(self, n_epochs)
    268                     loss = loss_per_sample.mean()
    269                     self.optimizer.zero_grad()
--> 270                     loss.backward()
    271                     self.optimizer.step()
    272 

~/miniconda/envs/series_temporales/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    219                 retain_graph=retain_graph,
    220                 create_graph=create_graph)
--> 221         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    222 
    223     def register_hook(self, hook):

~/miniconda/envs/series_temporales/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    130     Variable._execution_engine.run_backward(
    131         tensors, grad_tensors_, retain_graph, create_graph,
--> 132         allow_unreachable=True)  # allow_unreachable flag
    133 
    134 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

which I don't understand, since I'm not accessing element 0 of any tensor...

Could you please show a code example of how to use the Resample or NCO allocators inside a network? @jankrepl @turmeric-blend

jankrepl commented 3 years ago

Thank you for creating an issue!

The solution should be really simple. The error above is the result of a known bug that was fixed in #106. I haven't yet released a new version (and uploaded it to PyPI) that contains the commit with the bug fix.

I suggest you just install deepdow directly from master of this GitHub repository:

git clone https://github.com/jankrepl/deepdow.git
cd deepdow
pip install .
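
(Optionally, you can verify that the development install took effect by printing the version; this assumes deepdow exposes a __version__ attribute.)

python -c "import deepdow; print(deepdow.__version__)"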

and I will try to release a new version ASAP so that people running pip install deepdow (from PyPI) do not run into this error anymore.

Let me know whether it worked:) @alexvaca0

avacaondata commented 3 years ago

GREAT!!! :D That worked! Thank you so much @jankrepl. One more thing: could I ask how I could modify the current ThorpNet so that the portfolio weights get slightly adjusted based on external variables? (To expand the current setup, in which we always get the same weights.)

jankrepl commented 3 years ago

> GREAT!!! :D That worked! Thank you so much @jankrepl.

No problem:) Feel free to close this issue:)

> One more thing: could I ask how I could modify the current ThorpNet so that the portfolio weights get slightly adjusted based on external variables? (To expand the current setup, in which we always get the same weights.)

Good question! I would refer you to the docs to get some inspiration. However, I would definitely encourage you to be creative and write a custom forward pass (that is the philosophy of deepdow anyway).

ThorpNet was implemented mainly to demonstrate the possibility of having no "external" variables. An implication of a network like this is that once trained, all weights are fixed, and therefore the predictions are identical for every input!
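
For instance, here is a minimal sketch of that idea: keep a trainable per-asset score (as ThorpNet does) and add a small input-dependent adjustment, then turn the scores into weights with the SoftmaxAllocator layer. The architecture itself is purely illustrative:

import torch

from deepdow.benchmarks import Benchmark
from deepdow.layers import SoftmaxAllocator


class ConditionedNet(torch.nn.Module, Benchmark):
    """Trainable per-asset scores plus an input-dependent adjustment."""

    def __init__(self, n_channels, lookback, n_assets):
        super().__init__()
        # Fixed part: learned scores, identical for every sample (as in ThorpNet)
        self.base_scores = torch.nn.Parameter(torch.zeros(n_assets), requires_grad=True)
        # Dynamic part: a linear map from the flattened input features
        self.adjustment = torch.nn.Linear(n_channels * lookback * n_assets, n_assets)
        self.allocator = SoftmaxAllocator(temperature=1)

    def forward(self, x):
        # x has shape (n_samples, n_channels, lookback, n_assets)
        n_samples = len(x)
        base = self.base_scores[None, :].repeat(n_samples, 1)
        delta = self.adjustment(x.reshape(n_samples, -1))
        # The weights now vary with the input instead of being constant
        return self.allocator(base + delta)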

jankrepl commented 3 years ago

Just released a new version v0.2.2 and it is available on PyPI. So you should be able to get it via

pip install --upgrade deepdow

avacaondata commented 3 years ago

The thing is that all layers assume that X and y have the same number of assets; that's why I was asking about using external features. Consider the following case: I want to use some indexes, interest rates, and indicators as external variables for my model, but I don't want to use the assets themselves, because I will not have them at test time (in a competition, for example). What would be the best way to create a train_dataloader that uses the features mentioned above as X, for learning, and the prices of the assets as y, for the Sharpe ratio computation? My objective is to exploit the powerful capacity of ThorpNet so that price data is not necessary at test time, while taking advantage of other information that I may find available, such as economic indicators. @jankrepl

jankrepl commented 3 years ago

I hope I did not misunderstand, but IMO if you know that you won't have access to some of your features at inference time, then including them at training time does not really make sense. That being said, you can have one part of your network depend on the input features and another part depend only on trainable parameters (as in ThorpNet) that do not depend on the input features at all:)

Note that some deepdow layers support a dynamic number of assets, so you can create architectures where each batch has a different n_assets. In particular, there is the FlexibleDataLoader that yields batches with varying n_assets.
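
For instance, a minimal sketch with random data (the shapes and the n_assets_range values are arbitrary):

import numpy as np

from deepdow.data import FlexibleDataLoader, InRAMDataset

# Hypothetical data: X is (n_samples, n_channels, lookback, n_assets),
# y is (n_samples, n_channels, horizon, n_assets)
X = np.random.normal(size=(200, 1, 24, 10)).astype("float32")
y = np.random.normal(size=(200, 1, 12, 10)).astype("float32")

dataset = InRAMDataset(X, y)

# Each batch samples a random number of assets drawn from n_assets_range,
# so n_assets can differ from batch to batch
dataloader = FlexibleDataLoader(dataset, n_assets_range=(4, 8), batch_size=32)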

avacaondata commented 3 years ago

Actually, what I mean is that what I won't have at test time are the asset prices, but I can have external features like interest rates. For that reason, I'd want to use only the y (prices from t+1 to t+horizon) of the assets at training time, to compute the Sharpe ratio loss, while as features I want to use external variables that I do have at training time (interest rates, economic variables, etc.). Therefore the tensors X and y won't have the same number of assets: the former holds external variables and the latter holds asset prices. All batches will have the same "number of assets", although X and y don't.

Let me provide an example. Say I have 100 assets, with 4 features each, over 12000 timesteps. With a 24-timestep lookback, a horizon of 24, and gap = 1, the shape of X is (12000, 4, 24, 100) and the shape of y is (12000, 4, 24, 100). On the other hand, I have 4 external indicators that I want to use to adjust the portfolio at each timestep (so I have them at the same timesteps as the assets). If I used these indicators to create the tensors, they would be X: (12000, 1, 24, 4) and y: (12000, 1, 24, 4). What I'd want is to use the second X (12000, 1, 24, 4) with the first y (12000, 4, 24, 100).

If I understood well, I could have a network in which some layers act only on x and other layers act on some trainable parameters, am I right? Given the way the library is designed, what I don't know is whether it would make sense, for example, to use an LSTM or another layer to extract features from the external variables, combine the output features with the trainable parameter expected_returns to obtain a final expected_returns that depends on the external features, and then use this final expected_returns to get the weights with a portfolio optimization layer. What do you think?

Another thing that I think may cause problems is that not all external variables necessarily have the same number of channels (in some cases I may have bid, ask, and price; in others OHLCV...). Would it be possible to create tensors in which the channels don't coincide? (I'm guessing not, but just to make sure; in the example I assumed I could only use the price because of that mismatch between the external variables' channels.)

By the way, thank you very much for all the advice and help, I really appreciate it. If any of these experiments turns out to work, at the end of the competition I'll make a PR to include them in the library so that everyone can use them. @jankrepl
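
To make it concrete, here is a minimal sketch of the dataset I have in mind (shapes hypothetical; the placeholder timestamps and asset names are only there because, if I read experiments.py correctly, deepdow's training loop consumes (X, y, timestamps, asset_names) batches):

import torch
from torch.utils.data import Dataset


class ExternalFeaturesDataset(Dataset):
    """X holds external indicators, y holds asset prices; asset counts differ.

    X_ext : (n_samples, 1, lookback, n_indicators), e.g. (12000, 1, 24, 4)
    y     : (n_samples, n_channels, horizon, n_assets), e.g. (12000, 4, 24, 100)
    """

    def __init__(self, X_ext, y, asset_names=None):
        assert len(X_ext) == len(y)
        self.X_ext = torch.as_tensor(X_ext, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.float32)
        self.asset_names = asset_names or [str(i) for i in range(self.y.shape[-1])]

    def __len__(self):
        return len(self.X_ext)

    def __getitem__(self, ix):
        # Placeholders for timestamps and asset_names to mimic InRAMDataset
        return self.X_ext[ix], self.y[ix], ix, self.asset_names

The network would then need a trainable head of size n_assets (matching y), since X no longer carries the asset dimension.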

jankrepl commented 3 years ago

Let me just list a few ideas, and I hope some of them could be useful.

jankrepl commented 3 years ago

Closing due to inactivity. If you have any further comments I can always reopen.