lbcb-sci / herro

HERRO is a highly accurate, haplotype-aware, deep learning tool for error correction of Nanopore R10.4.1 or R9.4.1 reads (read length of >= 10 kbp is recommended).

Core dumped at herro inference #36

Open sivico26 opened 1 month ago

sivico26 commented 1 month ago

Hello there,

I am trying to run herro on quite a large dataset (a 4 Gbp plant genome at ~72x depth). I have already completed the all-vs-all (AvA) alignment step, but now I am stuck at the inference step.

The command I am using is:

```
herro="$herro_dir/herro.sif"   ## herro_dir -> /path/to/herro_cloned_repository
mnt_alns="/data/out_mappings"
mnt_reads="/data/ont_reads.fastq.gz"

singularity run --nv $herro inference -t 64 -m /herro/model_v0.1.pt --read-alns $mnt_alns -b 128 $mnt_reads /results/corrected_reads.fasta
```

But I am getting this error:

Error log

```raw
thread '' panicked at src/inference.rs:172:64:
called `Result::unwrap()` on an `Err` value: Torch("The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File \"code/__torch__/model.py\", line 31, in forward
    target_positions: List[Tensor]) -> Tuple[Tensor, Tensor]:
    embedding = self.embedding
    bases_embeds = (embedding).forward(bases, )
                   ~~~~~~~~~~~~~~~~~~ <--- HERE
    _0 = [bases_embeds, torch.unsqueeze(qualities, -1)]
    x = torch.cat(_0, -1)
  File \"code/__torch__/torch/nn/modules/sparse.py\", line 18, in forward
    _0 = __torch__.torch.nn.functional.embedding
    weight = self.weight
    _1 = _0(input, weight, 11, None, 2., False, False, )
         ~~ <--- HERE
    return _1
  File \"code/__torch__/torch/nn/functional.py\", line 37, in embedding
    else:
      input0 = input
    _3 = torch.embedding(weight, input0, padding_idx0, scale_grad_by_freq, sparse)
         ~~~~~~~~~~~~~~~ <--- HERE
    return _3
  def batch_norm(input: Tensor,

Traceback of TorchScript, original code (most recent call last):
  File \"/raid/scratch/stanojevicd/projects/haec-BigBird/model.py\", line 118, in forward
    '''
    # (batch_size, sequence_length, num_alignment_rows, bases_embedding_size)
    bases_embeds = self.embedding(bases)
                   ~~~~~~~~~~~~~~ <--- HERE
    # concatenate base qualities to embedding vectors
  File \"/home/stanojevicd/miniforge3/envs/haec/lib/python3.11/site-packages/torch/nn/modules/sparse.py\", line 162, in forward
    def forward(self, input: Tensor) -> Tensor:
        return F.embedding(
               ~~~~~~~~~~~ <--- HERE
            input, self.weight, self.padding_idx, self.max_norm,
            self.norm_type, self.scale_grad_by_freq, self.sparse)
  File \"/home/stanojevicd/miniforge3/envs/haec/lib/python3.11/site-packages/torch/nn/functional.py\", line 2233, in embedding
    # remove once script supports set_grad_enabled
    _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ~~~~~~~~~~~~~~~ <--- HERE
RuntimeError:
CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

Given the final lines of the log, I thought it was a stochastic error, but I ran it again and got the same result, so it seems deterministic. Do you have any idea what could be happening?
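For what it's worth, my understanding is that "no kernel image is available for execution on the device" usually means the PyTorch build inside the container was not compiled for the GPU's compute capability. A minimal sketch of that compatibility check (the helper name and the simplified matching rule are my own illustration, not part of herro or PyTorch):

```python
def kernel_image_available(device_cap, compiled_archs):
    """Return True if a CUDA binary compiled for one of `compiled_archs`
    can run on a device with compute capability `device_cap`.

    Simplified rule: a cubin built for sm_X.Y runs on devices with the
    same major version and an equal or newer minor version. (Real
    PyTorch builds also ship PTX that can be JIT-compiled for newer
    architectures; that path is ignored here.)
    """
    dev_major, dev_minor = device_cap
    return any(
        major == dev_major and minor <= dev_minor
        for major, minor in compiled_archs
    )

# A build shipping sm_70/sm_80 kernels on a compute-capability-8.6 GPU: OK
print(kernel_image_available((8, 6), [(7, 0), (8, 0)]))   # True
# The same build on an older device (e.g. compute capability 6.1): no kernel image
print(kernel_image_available((6, 1), [(7, 0), (8, 0)]))   # False
```

If this is the cause, comparing the GPU's compute capability against the architectures the container's PyTorch was built for should show the mismatch.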

Thanks in advance.