vetmohit89 / NanoPsiPy

GNU General Public License v3.0

fast5 data only contains 4000 reads #2

Closed: ssscj closed this issue 2 months ago

ssscj commented 5 months ago

Hi Mohit, thanks for developing NanoPsiPy. I downloaded the fast5 files under the accession number PRJNA961708. The BE2C_shGFP_1, BE2C_shGFP_2, BE2C_shPUS7_1 and BE2C_shPUS7_2 fast5 files are 19 GB, 42 GB, 36 GB and 45 GB, respectively, but I only got 4000 reads from each file with guppy basecalling, dorado basecalling or h5dump. Other people have run into the same problem, e.g. https://github.com/nanoporetech/dorado/issues/293 and https://github.com/nanoporetech/ont_fast5_api/issues/55. How did you process the fast5 files? Thanks. PS: I downloaded the files from NCBI in .gz format, so I decompressed them with the gunzip command.
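One way to narrow this down (an editorial sketch, not something done in the thread) is to check what the decompressed files actually contain. An HDF5 file, and therefore a fast5 file, normally begins with the fixed 8-byte signature `\x89HDF\r\n\x1a\n`, while gzip data begins with `\x1f\x8b`. The sketch reports the leading format and streams through the file, listing every offset where the HDF5 signature appears. The reasoning behind the scan is an unconfirmed assumption: if several multi-read fast5 files had been joined byte-for-byte into one file, HDF5 readers such as h5dump or guppy would only see the first embedded file, and extra signatures would show up at later offsets (4000 is also the default batch size of ont_fast5_api's `single_to_multi_fast5`). A chance match inside raw signal data is possible, so extra offsets are a hint, not proof. The function names are hypothetical.

```python
import sys

# 8-byte signature at the start of an HDF5 superblock (fast5 files are HDF5).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# Gzip streams start with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def sniff_format(path):
    """Return 'gzip', 'hdf5' or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(len(HDF5_SIGNATURE))
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head == HDF5_SIGNATURE:
        return "hdf5"
    return "unknown"


def find_hdf5_signatures(path, chunk_size=64 * 1024 * 1024):
    """Return the byte offsets of every HDF5 signature found in the file.

    The file is read in chunks; the last len(HDF5_SIGNATURE) - 1 bytes of each
    buffer are carried over so a signature split across two chunks is found
    exactly once, without loading a multi-GB file into memory.
    """
    overlap = len(HDF5_SIGNATURE) - 1
    offsets = []
    carry = b""
    consumed = 0  # file offset at which `carry` (and the next buffer) starts
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            buffer = carry + chunk
            start = 0
            while True:
                hit = buffer.find(HDF5_SIGNATURE, start)
                if hit == -1:
                    break
                offsets.append(consumed + hit)
                start = hit + 1
            # Keep a short tail for the next iteration.
            keep = min(overlap, len(buffer))
            consumed += len(buffer) - keep
            carry = buffer[len(buffer) - keep:]
    return offsets


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, sniff_format(path), find_hdf5_signatures(path))
```

A single offset of 0 is what a normal fast5 file should report; a result of `gzip` means the file is still compressed.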

Chujie

vetmohit89 commented 5 months ago

Hello @ssscj,

Thanks for trying our tool. These are merged Fast5 files. Here is my guppy command for basecalling:

guppy_basecaller \
    -i ./FAST5/ \
    -s output_fastq/ \
    --fast5_out \
    -c rna_r9.4.1_70bps_m6A_hac.cfg --reverse_sequence \
    --compress_fastq \
    -x "cuda:all" --num_callers 5 --gpu_runners_per_device 8 \
    --chunks_per_runner 100 --chunk_size 100

ssscj commented 5 months ago

Hi @vetmohit89, thanks for your reply. How did you merge the fast5 files, and which version of guppy did you use? Your parameters did not work for me. Thank you.

ssscj commented 5 months ago

Did you use the merged fast5 files as the input to your command?

ssscj commented 5 months ago

Fast5 files are in HDF5 format, so I used h5dump to convert one into a text file, and I found only 4000 reads. Maybe the other reads are not detectable because of the merging? Could you upload the raw fast5 files again? Many thanks.
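Dumping a 19 to 45 GB file to text with h5dump is slow; a lighter check is to count the read groups directly. This is a minimal sketch, assuming the third-party `h5py` package is installed and that the files follow ONT's multi-read fast5 layout, in which each read is a top-level group named `read_<read_id>`. The helper names are hypothetical, and `h5py` is imported inside the file-reading function so the counting logic can be used on its own.

```python
import sys


def count_read_groups(group_names):
    """Count names that look like multi-read fast5 read groups ('read_<id>')."""
    return sum(1 for name in group_names if name.startswith("read_"))


def count_reads_in_fast5(path):
    """Open a multi-read fast5 file and count its top-level read groups.

    Requires the third-party h5py package (pip install h5py).
    """
    import h5py

    with h5py.File(path, "r") as fast5:
        return count_read_groups(fast5.keys())


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}\t{count_reads_in_fast5(path)}")
```

Like h5dump, this only reports what the HDF5 library can see, so it should agree with the 4000-read count if the problem lies in the file itself rather than in the basecaller.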

vetmohit89 commented 5 months ago

I am not sure what went wrong. Please download the fastq files from this link. I will also upload these files to NCBI SRA soon.

Let me know if it does not work for you.

https://uab.box.com/s/iro33o1d6auqhbwgzs6h9ruyy2vwbffy
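To confirm that the downloaded fastq files hold more than 4000 reads each, the records can be counted directly. This is a sketch, assuming each record spans exactly four lines (the usual layout of basecaller output) and that compressed files end in `.gz`, as they would with the `--compress_fastq` option in the command above. The function names are hypothetical.

```python
import gzip
import sys
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, transparently decompressing .gz files."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    with open_fastq(path) as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4


if __name__ == "__main__":
    total = 0
    for path in sys.argv[1:]:
        reads = count_fastq_reads(path)
        total += reads
        print(f"{path}\t{reads}")
    print(f"total\t{total}")
```

Counting lines avoids parsing sequences and qualities, which keeps the check fast; the divisibility test catches truncated downloads.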

ssscj commented 5 months ago

I downloaded the fastq files successfully; thanks a lot for sharing the data. However, I also need the fast5 files for other pseudouridine identification methods. Thank you!

ssscj commented 2 months ago

Hi, is there any update about the data? Thanks

vetmohit89 commented 2 months ago

Hello,

All files are available: SRR29662301, SRR29662300, SRR29662302, SRR29662303. Let me know if you still have any issues.