qiaochen / VeloAE

Low-dimensional Projection of Single Cell Velocity
MIT License

error at def expBaseAE(adata, exp_metrics): #4

Open Roger-GOAT opened 2 years ago

Roger-GOAT commented 2 years ago

Hi dear team, thanks for the package. I get an error; would you mind giving some tips?

def expBaseAE(adata, exp_metrics):
    n_cells, n_genes = adata.X.shape
    in_dim = n_genes
    z_dim = args.z_dim
    h_dim = args.h_dim

    model = get_baseline_AE(in_dim, z_dim, h_dim).to(device)
    model = main_AE(args, model, save_name=f"baseAE_{args.model_name}")
    model.eval()
    with torch.no_grad():
        x = model.encoder(tensor_x)
        s = model.encoder(tensor_s)
        u = model.encoder(tensor_u)

        v = estimate_ld_velocity(s, u, device=device).cpu().numpy()
        x = x.cpu().numpy()
        s = s.cpu().numpy()
        u = u.cpu().numpy()

    adata = new_adata(adata, x, s, u, v, g_basis=args.nb_g_src)
    scv.tl.velocity_graph(adata, vkey='new_velocity')

    scv.pl.velocity_embedding_stream(adata, vkey="new_velocity", basis='X_umap', color='leiden',
                                     title="Baseline AutoEncoder")
    scv.tl.velocity_confidence(adata, vkey='new_velocity')
    exp_metrics['Baseline AutoEncoder'] = evaluate(adata, cluster_edges, 'leiden', "new_velocity")

expBaseAE(adata, exp_metrics)

Train Epoch: 100/20000  Loss: 58.533680
Train Epoch: 200/20000  Loss: 58.349663
Train Epoch: 300/20000  Loss: 58.188026
Train Epoch: 400/20000  Loss: 58.048077
Train Epoch: 500/20000  Loss: 57.929665
.......
Train Epoch: 11400/20000    Loss: 20.946295
Train Epoch: 11500/20000    Loss: 20.501766
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [10], in <module>
     26     scv.tl.velocity_confidence(adata, vkey='new_velocity')
     27     exp_metrics['Baseline AutoEncoder'] = evaluate(adata, cluster_edges, 'leiden', "new_velocity")
---> 29 expBaseAE(adata, exp_metrics)

Input In [10], in expBaseAE(adata, exp_metrics)
      5 h_dim = args.h_dim
      7 model = get_baseline_AE(in_dim, z_dim, h_dim).to(device)
----> 8 model = main_AE(args, model, save_name=f"baseAE_{args.model_name}")
      9 model.eval()
     10 with torch.no_grad():

Input In [6], in main_AE(args, model, lr, weight_decay, save_name)
      9 while i < args.n_epochs:
     10     i += 1
---> 11     loss = train_step_AE([tensor_s, tensor_u], model, optimizer, xyids=[0, 1], device=device)
     12     losses.append(loss)
     13     if i % args.log_interval == 0:

File ~/miniconda3/envs/velo/lib/python3.8/site-packages/veloproj/util.py:370, in train_step_AE(Xs, model, optimizer, xyids, device, aux_weight, rt_all_loss, perc, norm_lr)
    367     lr_loss = vloss.item()
    368     loss += vloss
--> 370 loss.backward()
    371 optimizer.step()
    372 if rt_all_loss:

File ~/miniconda3/envs/velo/lib/python3.8/site-packages/torch/_tensor.py:307, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    298 if has_torch_function_unary(self):
    299     return handle_torch_function(
    300         Tensor.backward,
    301         (self,),
   (...)
    305         create_graph=create_graph,
    306         inputs=inputs)
--> 307 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

File ~/miniconda3/envs/velo/lib/python3.8/site-packages/torch/autograd/__init__.py:154, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    151 if retain_graph is None:
    152     retain_graph = create_graph
--> 154 Variable._execution_engine.run_backward(
    155     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    156     allow_unreachable=True, accumulate_grad=True)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100]], which is output 0 of IndexPutBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
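Following the hint at the end of the error, one way to localize the offending in-place op is to re-run the training call with autograd anomaly detection enabled. A minimal sketch, reusing the notebook variables above (args, model, tensor_s, tensor_u, device):

import torch

# Anomaly detection records the forward stack trace of each op, so the
# backward error above also reports where the problematic tensor was produced.
with torch.autograd.detect_anomaly():
    model = main_AE(args, model, save_name=f"baseAE_{args.model_name}")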
qiaochen commented 2 years ago

Thank you! It seems the vanilla AE is projecting the input matrices into spurious representations that result in NaN values when fitting the low-dimensional linear regression. The code at the current line 427 of model.py filters the NaN values by assigning zeros in place: offset[torch.isnan(offset)], gamma[torch.isnan(gamma)] = 0, 0

I guess the error is raised by this line of code. The good news is that PyTorch provides a safer NaN-filtering function, torch.nan_to_num, since version 1.8; VeloAE has been updated to use it, which will hopefully fix the issue.
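For illustration, here is a minimal, self-contained sketch of that failure mode and the out-of-place alternatives. It is not VeloAE's actual computation graph, and the tensor values are made up:

import torch

# Broken: in-place indexed assignment on a tensor that autograd still needs.
a = torch.tensor([1.0, float('nan')], requires_grad=True)
offset = a * 2.0
loss = (offset ** 2).sum()          # backward of ** saves `offset`
offset[torch.isnan(offset)] = 0.0   # bumps offset's version counter in place
# loss.backward()                   # -> RuntimeError: ... modified by an inplace operation

# Safe: filter NaNs out of place before any downstream use.
offset = a * 2.0
offset = torch.nan_to_num(offset)   # PyTorch >= 1.8
# offset = torch.where(torch.isnan(offset), torch.zeros_like(offset), offset)  # older PyTorch
loss = (offset ** 2).sum()
loss.backward()                     # works; the gradient at the NaN position is zero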

Updated the code in model.py with the following NaN-filtering operations:

nans_offset = torch.isnan(offset)
nans_gamma = torch.isnan(gamma)
if torch.any(nans_offset) or torch.any(nans_gamma):
    # torch.nan_to_num is available from PyTorch 1.8 onwards; fall back to torch.where otherwise
    version_1_8 = sum([int(this) >= that for this, that in zip(torch.__version__.split('.')[:2], [1, 8])]) == 2
    if version_1_8:
        offset = torch.nan_to_num(offset)
        gamma = torch.nan_to_num(gamma)
    else:
        offset = torch.where(nans_offset, torch.zeros_like(offset), offset)
        gamma = torch.where(nans_gamma, torch.zeros_like(gamma), gamma)

Hope it helps!