weir12 / DENA

Deep learning model used to detect RNA m6a with read level based on the Nanopore direct RNA data.
MIT License
22 stars, 5 forks

empty tmp file of LSTM_extract.py #20

ssscj closed this issue 1 year ago

ssscj commented 1 year ago

Hi, thanks for developing DENA. I ran the LSTM_extract.py predict step to extract features from my data using 72 CPUs. I got 72 tmp files, but most of them were empty, and I obtained features for only 24 reads.

My command was:

    python LSTM_extract.py predict \
        --processes 20 --fast5 fast5/ \
        --corr_grp RawGenomeCorrected_001 --bam sort.bam \
        --sites ../filt_candidate_predict_pos.txt --label test --windows 2 2

The log file said:

    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    [20:45:17] Parsing Tombo index file(s).
    processes_30: 100%|██████████| 3358/3358 [7:49:22<00:00, 8.39s/it]
    processes_0: 100%|██████████| 3358/3358 [7:42:10<00:00, 1.49it/s]
    processes_1: 100%|██████████| 3358/3358 [4:26:06<00:00, 4.56s/it]
    processes_2: 100%|██████████| 3358/3358 [4:34:13<00:00, 1.99it/s]
    (more hidden)

Do you have any idea? Thanks for your help.

ssscj commented 1 year ago

I ran the command with --debug and the log said:

    cannot mask with array containing NA / NaN values
    cannot mask with array containing NA / NaN values
    cannot mask with array containing NA / NaN values
    cannot mask with array containing NA / NaN values
    'ec14436a-3720-401a-a3b8-bfaeb0cbd390'
    cannot mask with array containing NA / NaN values03, 19.93it/s]
    ENST00000600625.5 515-520found 0 reads in bamfile
    ENST00000600625.5 542-547found 8 reads in fast5
    '4bf4b869-ddfb-49d3-95ec-ee2d2b4994eb'
    '0cdf18f8-6ad1-4d79-828d-f4dd9b81b10c'
    cannot mask with array containing NA / NaN values
    cannot mask with array containing NA / NaN values
    cannot mask with array containing NA / NaN values
    cannot mask with array containing NA / NaN values
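For a long debug log, it can help to tally how many reads were reported per site. The following is a minimal Python sketch, not part of DENA: the regular expression is inferred only from the two "found N reads in ..." lines quoted above, and the names `SITE_LINE`, `tally_reads`, and `sites_without_reads` are hypothetical. The real log format may differ.

```python
import re
from collections import defaultdict

# Pattern inferred from the two quoted log lines; this is an assumption,
# not a documented DENA log format.
SITE_LINE = re.compile(
    r"(?P<transcript>\S+)\s+(?P<start>\d+)-(?P<end>\d+)\s*found\s+"
    r"(?P<n>\d+)\s+reads in\s+(?P<source>bamfile|fast5)"
)


def tally_reads(log_text):
    """Map (transcript, start, end) -> {source: read count} for each matching line."""
    counts = defaultdict(dict)
    for m in SITE_LINE.finditer(log_text):
        site = (m["transcript"], int(m["start"]), int(m["end"]))
        counts[site][m["source"]] = int(m["n"])
    return dict(counts)


def sites_without_reads(counts):
    """Return sites where at least one source reported zero reads."""
    return sorted(site for site, by_source in counts.items()
                  if 0 in by_source.values())


# The two lines quoted in the debug output above.
sample = (
    "ENST00000600625.5 515-520found 0 reads in bamfile "
    "ENST00000600625.5 542-547found 8 reads in fast5"
)
counts = tally_reads(sample)
empty_sites = sites_without_reads(counts)
```

Sites that end up in `empty_sites` are candidates for the low-coverage explanation; sites with reads that still produce no features point to a different cause.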

weir12 commented 1 year ago

Hi,

Based on the limited information provided, it appears that the target site you're referring to may have extremely low read counts. To address this, I suggest the following steps:

  1. Check the coverage depth of the BAM file, specifically focusing on potential m6A sites of interest.

  2. Evaluate the output of the tombo resquiggle command, paying attention to the success rate of aligning the Fast5 files.

These steps will help you assess the coverage and alignment quality of your data, which could be contributing to the low read count issue.
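The coverage check in step 1 could be sketched roughly as follows. This is a minimal Python example, not part of DENA. It assumes per-position depth has already been exported as the tab-separated text that `samtools depth` produces (reference name, 1-based position, depth) and that candidate sites are available as (transcript, position) pairs. The function names, the example values, and the `min_depth=20` threshold are all hypothetical.

```python
def load_depth(lines):
    """Parse `samtools depth`-style lines: reference, 1-based position, depth."""
    depth = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ref, pos, d = line.split("\t")[:3]
        depth[(ref, int(pos))] = int(d)
    return depth


def low_coverage_sites(sites, depth, min_depth=20):
    """Return (ref, pos, depth) for sites below min_depth.

    Positions absent from the depth table are treated as depth 0.
    The threshold is an arbitrary illustration, not a DENA recommendation.
    """
    flagged = []
    for ref, pos in sites:
        d = depth.get((ref, pos), 0)
        if d < min_depth:
            flagged.append((ref, pos, d))
    return flagged


# Made-up depth lines and candidate sites, for illustration only.
example_depth = [
    "ENST00000600625.5\t517\t0",
    "ENST00000600625.5\t544\t8",
    "ENST00000600625.5\t600\t35",
]
example_sites = [
    ("ENST00000600625.5", 517),
    ("ENST00000600625.5", 544),
    ("ENST00000600625.5", 600),
    ("ENST00000600625.5", 999),  # not present in the depth table
]
flagged = low_coverage_sites(example_sites, load_depth(example_depth))
```

If most candidate sites are flagged, low coverage is a plausible cause of the sparse output; if few are, the problem likely lies elsewhere in the pipeline.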

If you have any further questions or need clarification, please feel free to ask.

weir12 commented 1 year ago

Issue not updated for a long time

ssscj commented 1 year ago

Hi, thanks for your reply. I filtered the m6A sites and still got the error. I then found that the BRI package was causing it. After I removed BRI, I got the correct output.