nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Pod5 files corrupted? #880

Closed mbacino closed 2 weeks ago

mbacino commented 2 weeks ago

My Dorado basecaller job (model dna_r10.4.1_e8.2_400bps_sup@v4.3.0) is stuck in the job queue on my university's HPC. I uploaded the pod5 files from my MinKNOW run to the HPC by zipping the pod5 folder and then unzipping it once it was in the correct directory. I am using a script that previously worked with a different directory of pod5 files, so I assume the issue is the input pod5 files. Could compressing the pod5 files be the issue? This is the script I am running (24_06_06):

#!/bin/bash
#$ -N dorado-job  ## job name
#$ -cwd           ## use current working directory
#$ -j yes         ## merge stdout and stderr
#$ -q gpu.q       ## specify the GPU queue
#$ -l h_rt=24:00:00  # 1 day runtime
#$ -l gpu=1         # Ensure GPU request is specified if needed
##$ -e $HOME
##$ -o $HOME

# Print information about the queue and GPU assignment
echo "QUEUE: $QUEUE"
echo "SGE_GPU: $SGE_GPU"

# Set CUDA_VISIBLE_DEVICES to control GPU visibility
export CUDA_VISIBLE_DEVICES=$SGE_GPU

# Print the start timestamp
t0=$(date --rfc-3339=seconds)
echo "Job started at: $t0"

# Navigate to the directory where dorado is located
cd "/path to directory/dorado-0.6.0-linux-x64/bin"

# Execute dorado
./dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v4.3.0 "/path to directory/input_files/pod5/" \
--kit-name SQK-RBK114-96 \
--no-trim \
--sample-sheet "/path to directory/input_files/24_05_28_ss.csv" > "/path to directory/output_files/24_06_06_calls.bam"

# Print the end timestamp
t1=$(date --rfc-3339=seconds)
echo "Job ended at: $t1"

Pod5 file directory:

total 48G
-rwx------. 1 ms-bacino lynch 1.3G Jun 3 13:57 FAZ22417_f962e43c_8ea0ac62_0.pod5
-rwx------. 1 ms-bacino lynch 2.9G Jun 3 13:57 FAZ22417_f962e43c_8ea0ac62_10.pod5
-rwx------. 1 ms-bacino lynch 2.5G Jun 3 13:57 FAZ22417_f962e43c_8ea0ac62_11.pod5
-rwx------. 1 ms-bacino lynch 2.9G Jun 3 13:58 FAZ22417_f962e43c_8ea0ac62_12.pod5
-rwx------. 1 ms-bacino lynch 2.3G Jun 3 13:58 FAZ22417_f962e43c_8ea0ac62_13.pod5
-rwx------. 1 ms-bacino lynch 2.8G Jun 3 13:58 FAZ22417_f962e43c_8ea0ac62_14.pod5
-rwx------. 1 ms-bacino lynch 2.7G Jun 3 13:58 FAZ22417_f962e43c_8ea0ac62_15.pod5
-rwx------. 1 ms-bacino lynch 2.1G Jun 3 13:58 FAZ22417_f962e43c_8ea0ac62_16.pod5
-rwx------. 1 ms-bacino lynch 2.4G Jun 3 13:58 FAZ22417_f962e43c_8ea0ac62_17.pod5
-rwx------. 1 ms-bacino lynch 2.4G Jun 3 13:59 FAZ22417_f962e43c_8ea0ac62_18.pod5
-rwx------. 1 ms-bacino lynch 2.0G Jun 3 13:59 FAZ22417_f962e43c_8ea0ac62_19.pod5
-rwx------. 1 ms-bacino lynch 2.9G Jun 3 13:57 FAZ22417_f962e43c_8ea0ac62_1.pod5
-rwx------. 1 ms-bacino lynch 795M Jun 3 13:59 FAZ22417_f962e43c_8ea0ac62_20.pod5
-rwx------. 1 ms-bacino lynch 2.5G Jun 3 13:59 FAZ22417_f962e43c_8ea0ac62_2.pod5
-rwx------. 1 ms-bacino lynch 3.2G Jun 3 13:59 FAZ22417_f962e43c_8ea0ac62_3.pod5
-rwx------. 1 ms-bacino lynch 3.3G Jun 3 13:59 FAZ22417_f962e43c_8ea0ac62_4.pod5
-rwx------. 1 ms-bacino lynch 2.6G Jun 5 09:01 FAZ22417_f962e43c_8ea0ac62_5.pod5
-rwx------. 1 ms-bacino lynch 3.1G Jun 5 09:21 FAZ22417_f962e43c_8ea0ac62_6.pod5
-rwx------. 1 ms-bacino lynch 3.0G Jun 5 09:41 FAZ22417_f962e43c_8ea0ac62_7.pod5
-rwx------. 1 ms-bacino lynch 2.6G Jun 5 10:01 FAZ22417_f962e43c_8ea0ac62_8.pod5

HalfPhoton commented 2 weeks ago

Hi @mbacino,

Do you see any error messages?

Can you basecall any of this data locally?

Can you explain what you mean by "could compressing the pod5 files be the issue?"?

Rich

mbacino commented 2 weeks ago

I didn’t receive any error messages because the dorado job is stuck in the HPC job queue. My theory is that the pod5 files aren’t formatted correctly, so dorado will not run.

When I uploaded the pod5 files from my hard drive to the HPC, I zipped the folder because each file is over a GB and would take a long time to upload. Could compressing pod5 files cause them to become corrupted?

I can’t run dorado locally because my computer isn’t powerful enough.

Thanks,
Margot

HalfPhoton commented 2 weeks ago

Assuming you unzipped the files on the other side, that should be fine. Dorado cannot basecall zipped pod5s, though.

Also - I'm surprised you gained much compression zipping pod5s. They're already efficiently compressed and I wouldn't have expected it to make much difference. In a local test I got 208M -> 207M.
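To check how little there is to gain on your own data, something like the following rough sketch could be used. It compares a file's size with its gzip-compressed size without writing the compressed copy to disk. The function names original_size and compressed_size are made up for illustration, and gzip stands in for whatever zip tool was actually used.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical; gzip is a stand-in for zip.

# Size of the file in bytes.
original_size() {
    wc -c < "$1"
}

# Size in bytes of the gzip-compressed stream; nothing is written to disk.
compressed_size() {
    gzip -c -- "$1" | wc -c
}

# Example (placeholder file name):
#   echo "before: $(original_size reads.pod5) bytes"
#   echo "after:  $(compressed_size reads.pod5) bytes"
```

If the two numbers are nearly identical, zipping before upload mostly adds an extra step where the archive can be truncated or partially extracted.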

Are you sure that your data has transferred correctly?
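One way to answer that question is to compare checksums before and after the transfer. The sketch below assumes GNU coreutils (sha256sum) is available on both machines. The helper names make_manifest and verify_manifest and the paths in the usage comments are hypothetical.

```shell
#!/bin/bash
# Sketch only: helper names and paths are placeholders, not part of dorado or pod5.

# Write a SHA-256 manifest covering every .pod5 file in a directory.
# Run this on the machine that holds the original files.
make_manifest() {
    local dir="$1" manifest="$2"
    ( cd "$dir" && sha256sum -- *.pod5 ) > "$manifest"
}

# Check the files in a directory against a manifest.
# Prints only mismatches and returns non-zero if any file differs or is missing.
verify_manifest() {
    local dir="$1" manifest="$2"
    ( cd "$dir" && sha256sum --check --quiet ) < "$manifest"
}

# Example:
#   make_manifest /local/run/pod5 pod5.sha256        # on the source machine
#   (copy pod5.sha256 to the HPC together with the data)
#   verify_manifest /hpc/input_files/pod5 pod5.sha256 # on the HPC
```

A failed or missing entry points at the specific file to upload again, which is cheaper than repeating the whole 48G transfer.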

I can't really help you with your stuck job on UGE without more information. Contact your HPC admin to help recover some logs or information that would be helpful.
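Before contacting the admin, it may help to look at what the scheduler itself reports, since a job that never leaves the queue has not yet touched the pod5 files. The sketch below assumes a Grid Engine style scheduler (SGE/UGE) with the usual qstat state codes. explain_state is a hypothetical helper, and 12345 is a placeholder job ID.

```shell
#!/bin/bash
# Sketch only: assumes Grid Engine style state codes as shown by `qstat`.

# Map a qstat state code to a short hint (hypothetical helper).
explain_state() {
    case "$1" in
        *E*) echo "error"   ;;  # e.g. Eqw: the scheduler could not start the job
        *h*) echo "hold"    ;;  # e.g. hqw: job is on hold
        qw)  echo "waiting" ;;  # queued and waiting for resources
        r|t) echo "running" ;;  # running, or being transferred to a node
        *)   echo "unknown" ;;
    esac
}

# Typical manual checks, run on the cluster:
#   qstat -u "$USER"   # list your jobs and their state codes
#   qstat -j 12345     # details for one job, including any error reason
```

A job in an error state usually points to the submission script or requested resources rather than the input data, and the output of qstat -j is the kind of information worth passing on to the HPC admin.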

mbacino commented 2 weeks ago

One of my pod5 files was corrupted, and there was an error in my job submission script. Re-uploading the pod5 files and editing my script resolved the issue.