snel-repo / neural-data-transformers

How to prepare NLB data for NDT? #3

Closed HilbertHuangHitomi closed 3 years ago

HilbertHuangHitomi commented 3 years ago

I have successfully followed nlb_tools to read the NLB datasets, but I noticed that NDT needs h5 files that are not the same as the h5 files I save. How should I prepare the datasets I downloaded from DANDI for running NDT?

joel99 commented 3 years ago

cc: @felixp8 -- what's the difference between the files that you handed off and the ones that the latest nlb_tools produces?

felixp8 commented 3 years ago

I don't believe I've changed the file format since then. It looks like this function expects the keys to be 'train_data_heldin', etc., though nlb_tools uses 'train_spikes_heldin' and so on. Did you change those manually at some point @joel99?
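
For illustration, one way to bridge the naming mismatch would be a small rename pass over the h5 keys when converting the file; a rough sketch only (the helper name and flat key layout are assumptions, not from the repo):

    import h5py

    # Rough sketch: copy an nlb_tools-style h5 file, renaming '*_spikes_*' keys
    # to the '*_data_*' names that the loader expects.
    def rename_nlb_keys(path_in, path_out):
        with h5py.File(path_in, 'r') as fin, h5py.File(path_out, 'w') as fout:
            for key in fin.keys():
                fout.create_dataset(key.replace('spikes', 'data'), data=fin[key][()])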

HilbertHuangHitomi commented 3 years ago

Here's my working procedure.

  1. modify the key names from XXX_data_XXXX to XXX_spikes_XXXX in src/dataset.py:

    if 'eval_spikes_heldin' in h5dict: # NLB data
        get_key = lambda key: h5dict[key].astype(np.float32)
        train_data = get_key('train_spikes_heldin')
        train_data_fp = get_key('train_spikes_heldin_forward')
        train_data_heldout_fp = get_key('train_spikes_heldout_forward')
        train_data_all_fp = np.concatenate([train_data_fp, train_data_heldout_fp], -1)
        valid_data = get_key('eval_spikes_heldin')
        train_data_heldout = get_key('train_spikes_heldout')
        if 'eval_spikes_heldout' in h5dict:
            valid_data_heldout = get_key('eval_spikes_heldout')
        else:
            valid_data_heldout = np.zeros((valid_data.shape[0], valid_data.shape[1], train_data_heldout.shape[2]), dtype=np.float32)
        if 'eval_spikes_heldin_forward' in h5dict:
            valid_data_fp = get_key('eval_spikes_heldin_forward')
            valid_data_heldout_fp = get_key('eval_spikes_heldout_forward')
            valid_data_all_fp = np.concatenate([valid_data_fp, valid_data_heldout_fp], -1)
        else:
            valid_data_all_fp = np.zeros(
                (valid_data.shape[0], train_data_fp.shape[1], valid_data.shape[2] + valid_data_heldout.shape[2]), dtype=np.float32
            )

        # NLB data does not have ground truth rates
        if mode == DATASET_MODES.train:
            return train_data, None, train_data_heldout, train_data_all_fp
        elif mode == DATASET_MODES.val:
            return valid_data, None, valid_data_heldout, valid_data_all_fp
  2. use nlb_tools to read the NWB data and save it as h5 with something like the following (the `dataset` object is an nlb_tools NWBDataset; a loading sketch is given at the end of this comment):
    train_dict = make_train_input_tensors(
        dataset,
        dataset_name='mc_maze_small',
        trial_split='train',
        include_behavior=True,
        include_forward_pred=True,
    )
    eval_dict = make_eval_input_tensors(
        dataset,
        dataset_name='mc_maze_small',
        trial_split='val',
    )
  3. merge them with:
    data_dict = {
        'eval_spikes_heldin'           : eval_dict['eval_spikes_heldin'],
        'eval_spikes_heldout'          : eval_dict['eval_spikes_heldout'],
        'train_spikes_heldin'          : train_dict['train_spikes_heldin'],
        'train_spikes_heldout'         : train_dict['train_spikes_heldout'],
        'train_behavior'               : train_dict['train_behavior'],
        'train_spikes_heldin_forward'  : train_dict['train_spikes_heldin_forward'],
        'train_spikes_heldout_forward' : train_dict['train_spikes_heldout_forward'],
    }
    save_to_h5(data_dict, os.path.join('./data', 'mc_maze_small.h5'))
  4. specify the data path in ./configs/mc_maze_small.yaml:
    DATA:
      DATAPATH: "./data"
      TRAIN_FILENAME: "mc_maze_small.h5"
      VAL_FILENAME: "mc_maze_small.h5"

    However, I got the following issue:

    removing ./Results/logs/mc_maze_small
    2021-10-14 09:18:01,907 Using 1 GPUs
    2021-10-14 09:18:01,946 Using cuda:1
    2021-10-14 09:18:01,946 Loading mc_maze_small.h5 in train
    2021-10-14 09:18:02,155 Clipping all spikes to 7.
    2021-10-14 09:18:02,155 Training on 75 samples.
    2021-10-14 09:18:02,156 Loading mc_maze_small.h5 in val
    2021-10-14 09:18:10,835 number of trainable parameters: 682538
    0%|          | 0/50501 [00:00<?, ?it/s]
    /opt/conda/conda-bld/pytorch_1587428091666/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of add_ is deprecated:
        add_(Number alpha, Tensor other)
    Consider using one of the following signatures instead:
        add_(Tensor other, *, Number alpha)
    0%|          | 0/50501 [00:01<?, ?it/s]
    Traceback (most recent call last):
      File "src/run.py", line 144, in <module>
        main()
      File "src/run.py", line 58, in main
        run_exp(**vars(args))
      File "src/run.py", line 137, in run_exp
        runner.train()
      File "/home/username/Projects/neural-data-transformers/src/runner.py", line 341, in train
        metrics = self.train_epoch()
      File "/home/username/Projects/neural-data-transformers/src/runner.py", line 482, in train_epoch
        eval_r2 = self.neuron_r2(rates, pred_rates)
      File "/home/username/Projects/neural-data-transformers/src/runner.py", line 749, in neuron_r2
        gt, pred = self._clean_rates(gt, pred, **kwargs)
      File "/home/username/Projects/neural-data-transformers/src/runner.py", line 737, in _clean_rates
        raise Exception(f"Incompatible r2 sizes, GT: {gt.size()}, Pred: {pred.size()}")
    Exception: Incompatible r2 sizes, GT: torch.Size([25, 35, 107]), Pred: torch.Size([25, 45, 142])
  5. since NLB datasets have no ground-truth rates, I commented out the following in src/runner.py:
    #eval_r2 = self.neuron_r2(rates, pred_rates)
    #metrics_dict['eval_r2'] = eval_r2

    Now it seems to run smoothly.
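
For completeness, here is a rough sketch of how the `dataset` object used in step 2 can be prepared with nlb_tools (the local DANDI path below is a placeholder for mc_maze_small; adjust it to your own download):

    from nlb_tools.nwb_interface import NWBDataset

    # Rough sketch (path is a placeholder): load the DANDI download for
    # mc_maze_small and rebin spikes to 5 ms, the bin width used by the NLB benchmark.
    datapath = './data/000140/'  # hypothetical local path to the mc_maze_small dandiset
    dataset = NWBDataset(datapath)
    dataset.resample(5)

The make_train_input_tensors and make_eval_input_tensors calls in step 2 then operate on this dataset object.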