haotianteng / Xron

Xron - an omni basecaller for ONT reads.
GNU General Public License v3.0

xron call Produces Empty FASTQ Directories #7

Open BrendanBeahan opened 2 days ago

BrendanBeahan commented 2 days ago

Hello,

I am attempting to run xron for basecalling, but I am encountering an issue where the output directories are empty. Below are the details of my setup and the steps I followed.

Built a Singularity Image Format (SIF) file using the following Dockerfile:

FROM python:3.8-slim

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    wget \
    curl \
    zlib1g-dev \
    unzip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

RUN pip install xron

RUN xron init

CMD ["bash"]

Converted the Docker image into a SIF file and ran the following command within the Singularity container:

xron call \
  -i /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/pod5_scp/OHMX20240022_001/pod5_pass/ \
  -o /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/xRon/results_2 \
  -m models/RNA004

Observed the following output:

No GPU is detected, the batch_size is setting to default 1200
Construct and load the model.
/usr/local/lib/python3.8/site-packages/xron/xron_eval.py:32: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(os.path.join(model_folder,latest_ckpt),
Begin basecall.
0it [00:00, ?it/s]
NN_time:0.000000,assembly_time:0.000000,writing_time:0.000053

The output directory contains no .fastq files, and nothing appears to have been processed.

Also, the input directory /rhea/scratch/brussel/vo/000/bvo00030/vsc11010/pod5_scp/OHMX20240022_001/pod5_pass/ contains valid .pod5 files.
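One way to double-check what is actually present under the input directory is to count files by extension, recursing into subdirectories. The following is a minimal sketch using only the Python standard library. `count_by_suffix` is a hypothetical helper and not part of xron, and it makes no claim about how xron itself scans its input.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root):
    """Count regular files under root (recursively), keyed by lowercase suffix.

    Hypothetical diagnostic helper; not part of xron.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


# Example usage (replace the path with the real input directory):
# counts = count_by_suffix("/path/to/pod5_pass")
# print("pod5 files: ", counts[".pod5"])
# print("fast5 files:", counts[".fast5"])
```

If the `.pod5` count is nonzero while the tool reports `0it`, the files exist but are probably not being picked up, for example because a different extension is being searched for.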

Finally, I noticed a warning about torch.load in the output:

FutureWarning: You are using `torch.load` with `weights_only=False`...
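A `FutureWarning` is a notice about upcoming behavior rather than an error, so on its own it does not stop the program; whether it has any bearing on the empty output is not established here. The sketch below illustrates this with a stand-in function (`load_with_warning` is hypothetical and does not call `torch`): the warning is recorded and the call still returns normally.

```python
import warnings


def load_with_warning():
    # Hypothetical stand-in for a library call that emits a FutureWarning,
    # similar in kind to the torch.load message in the log above.
    warnings.warn("the default value of an option will change", FutureWarning)
    return "loaded"


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_with_warning()

print(result)                        # loaded
print(caught[0].category.__name__)   # FutureWarning
```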

Please let me know if there’s any additional information you need to help troubleshoot this issue. Thank you for your assistance!

Best, Brendan

abcdtree commented 1 day ago

@BrendanBeahan I am having the same issue. I found this in the code:

parser.add_argument('--input_format', default = "fast5",help = "The input file format, defautl is pod5 file, can be fast5.")

Although the help text says the default is pod5, the actual default is fast5. I am going to try again with --input_format pod5 and will let you know whether it works.
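The mismatch between the help text and the actual default can be reproduced in isolation. This is a minimal sketch that copies only the quoted `add_argument` line into a standalone parser; the rest of xron's command-line interface is not reproduced, and the parser here is purely illustrative.

```python
import argparse

# Standalone parser containing only the argument quoted above
# (help string copied verbatim, including its typo).
parser = argparse.ArgumentParser()
parser.add_argument('--input_format', default="fast5",
                    help="The input file format, defautl is pod5 file, can be fast5.")

# Without the flag, argparse falls back to the declared default.
print(parser.parse_args([]).input_format)                          # fast5

# Passing the flag explicitly overrides the default.
print(parser.parse_args(['--input_format', 'pod5']).input_format)  # pod5
```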

@BrendanBeahan Just to update: adding --input_format pod5 does not fix the issue.

BrendanBeahan commented 1 day ago

@abcdtree Thanks Josh!