lfads / lfads-run-manager

Matlab interface for Latent Factor Analysis via Dynamical Systems (LFADS)
https://lfads.github.io/lfads-run-manager
Apache License 2.0

Error during training. #16

Closed sharsnik2 closed 5 years ago

sharsnik2 commented 5 years ago

I'm getting an error when training a slightly modified setup of the basic Lorenz example.

Here's the error message:

Task lfads_param_bqMkwl_run001_single_dataset001: Decreasing learning rate to 0.007690.
Task lfads_param_bqMkwl_run001_single_dataset001: Decreasing learning rate to 0.007536.
Task lfads_param_bqMkwl_run001_single_dataset001: Decreasing learning rate to 0.007386.
Task lfads_param_bqMkwl_run001_single_dataset001: TERMINATED UNEXPECTEDLY. Final output:
  File "/data/connor/anaconda3/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/data/connor/anaconda3/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/data/connor/anaconda3/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had Inf values
         [[node LFADS/VerifyFinite/CheckNumerics (defined at /data/connor/LFADS/models/research/lfads/lfads.py:946) ]]

Please let me know if I need to upload other log files.

Also, I modified the experiment parameters as follows:

    p.addParameter('nDatasets', 1, @isscalar);
    p.addParameter('minChannels', 25, @isscalar);
    p.addParameter('maxChannels', 35, @isscalar);
    p.addParameter('nConditions', 200, @isscalar);
    p.addParameter('minTrialsC', 1, @isscalar); % per condition
    p.addParameter('maxTrialsC', 1, @isscalar);
    p.addParameter('nTime', 1000, @isscalar);
    p.addParameter('meanFr', 5, @isscalar);
    p.addParameter('T_burnIn', 50000, @isscalar);

I also had to change this line to get the above code to run:

    par.c_batch_size = 1; % must be < 1/5 of the min trial count for trainToTestRatio == 4

cpandar commented 5 years ago

Hi Sharsnik - without knowing the details of why you're getting NaN gradients, I'd just say I've never run LFADS with such a small batch size (1 trial). Could you explain why you're constrained to such a small batch?

djoshea commented 5 years ago

Yeah, I'm not sure a batch_size that low has ever been tested; that might be related. It's also not clear to me why you'd have so few trials. You should set the batch_size to be just below 1/5th of your trial count, so unless you only have 5 trials, you shouldn't need a batch size of 1 (based on the Lorenz code and your defaults, I'd guess you have 200).
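
To make that arithmetic concrete, here's a quick sketch (assuming your 200-conditions-by-1-trial setup from above; the variable names are just for illustration):

    % Sketch of the batch-size arithmetic, assuming the modified
    % parameters above (200 conditions x 1 trial) and trainToTestRatio == 4
    nConditions = 200;
    trialsPerCondition = 1;
    nTrials = nConditions * trialsPerCondition;  % 200 trials total

    trainToTestRatio = 4;                        % 4:1 train/validation split
    nValid = nTrials / (trainToTestRatio + 1);   % 40 validation trials

    % c_batch_size must be below 1/5 of the trial count, i.e. below nValid
    maxBatchSize = nValid - 1;                   % 39, so you aren't forced to 1
    fprintf('largest valid c_batch_size: %d\n', maxBatchSize);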

The other issue might be that the alignment matrices are badly formed. You've modified the example to use only 1 trial per condition, which means the trial averaging normally used to build them could be extremely noisy. On the other hand, you're only using a single dataset, so the alignment matrices shouldn't be generated at all; I'm guessing that's not the issue.

Are you able to get things working with the Lorenz attractor with the default parameters?

sharsnik2 commented 5 years ago

The Lorenz test did work with the default parameters.

Perhaps you guys can give some insight as to how/if LFADS can be modified to solve my specific problem.

I assume that I have a single, long trace of observations from a single dynamical system. In the Lorenz setup, this amounts to simulating the system for a long time. In practice, this would be something like a neural network being stimulated by inputs (or, in the case of Lorenz, driven by constant noise). I also assume that this system is recurrent, so that it traces out similar trajectories in state space at different times. My goal is to approximate the dynamics that give rise to the observations.

What I was trying to do with the Lorenz test was to take many shorter snippets of the observations. My hope is that the decoder RNN of LFADS would learn a model of the underlying dynamics. This is why I can only have one trial per condition.

Perhaps there is another way to set up this problem for LFADS?

lyprince commented 5 years ago

@sharsnik2 I haven't tried this, but it was on my to-do list: You could split up the long trace into segments, then infer the initial conditions only for the first segment, and set the initial conditions for each subsequent segment to be the last state of the previous segment, like in truncated backprop through time.

@cpandar @djoshea do you know if that works for long recordings?

djoshea commented 5 years ago

Sorry for the delayed reply. Yes, that's what I'd recommend: splitting the data into even-length segments of a manageable size. We've done this ourselves with reaching data, ignoring the trial structure of the task and just chopping it into short segments.
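
The preprocessing itself can be as simple as a reshape. A hypothetical sketch (the data and names here are made up, not run-manager code; you'd then build your LFADS.Dataset from the resulting segments):

    % Hypothetical sketch: chop one long recording into fixed-length "trials".
    % spikes stands in for an [nChannels x nTimeTotal] binned spike-count matrix.
    spikes = poissrnd(0.1, 30, 50000);     % fake data; use your real recording

    segLen = 100;                          % bins per segment; pick a manageable size
    nSeg = floor(size(spikes, 2) / segLen);

    % [nChannels x segLen x nSeg]: consecutive columns become one segment each
    segments = reshape(spikes(:, 1:nSeg*segLen), size(spikes, 1), segLen, nSeg);

    % each slice segments(:, :, k) becomes one "trial"; the condition labels
    % can all be identical since they aren't used for a single dataset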

sharsnik2 commented 5 years ago

Splitting the long time series into chunks makes sense to me, but how would I set that up in LFADS? I would still only have 1 trial per condition, right? And how would I manually set the initial conditions of the new trials, as @lyprince suggested?

djoshea commented 5 years ago

@sharsnik2, I'm not sure what you mean by "how would I set that up in LFADS". The condition labels aren't used by LFADS proper (the TensorFlow code in Python) at all. They're only used in my code when stitching across multiple datasets, since they help establish a correspondence of trial types across datasets; specifically, they're used to generate the alignment matrices that serve as the initial values of the readout matrices (and the readin matrices, via a transpose). So if the code is working correctly, the condition labels shouldn't matter when you're only using one LFADS.Dataset for a given model. Is that correct? Are you trying to do stitching at the same time? Or are you seeing the alignment matrix code being executed even with only one dataset?

As for @lyprince's suggestion, it's definitely an interesting idea, and I've thought about continuity across trial boundaries before, especially when you think there's a slow-timescale signal that changes over multiple trials. We haven't baked this into LFADS, which has the advantage of letting you plot the ICs and see whether it discovered this structure across trials on its own (e.g. using t-SNE). But we haven't engineered in a way to do what you're getting at, which is essentially asking the RNN to generate the full dynamical trajectory over all trials from a single initial condition. One tractable way to optimize for this would be, as you suggested, to encourage continuity of initial conditions across "trial" segments. There's a wrinkle there in that the state of the RNN typically has a larger dimension than the IC, which is typically mapped onto the GRU states via a linear expansion, but this should be easy to work around by working in the space of RNN states.
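
Purely as an illustration of that last point (this is not LFADS code; the real version would have to live in the TensorFlow graph in lfads.py, and all the names here are made up):

    % Illustrative sketch of a continuity penalty in RNN-state space.
    % finalStates: generator state at the end of each segment [nStates x nSeg]
    % initStates:  expanded initial state of each segment, i.e. the IC
    %              mapped through the linear expansion [nStates x nSeg]
    nStates = 64; nSeg = 20;
    finalStates = randn(nStates, nSeg);    % stand-ins for the real states
    initStates  = randn(nStates, nSeg);

    % penalize mismatch between the end of segment k and the start of segment k+1
    mismatch = initStates(:, 2:end) - finalStates(:, 1:end-1);
    continuityLoss = mean(sum(mismatch.^2, 1));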

Anyway, I think it's a great idea, but it's not built into LFADS or coded up anywhere I know of, so you'd have to dig into the TensorFlow code to implement it. On the other hand, you may not need to worry about it at all: if LFADS learns the dynamical structure of your data, it may establish this kind of approximate continuity across trial boundaries even without being asked to.