weir12 / DENA

Deep learning model for detecting RNA m6A at read level from Nanopore direct RNA data.
MIT License

No modification value in the output data #19

Closed by kwonej0617 1 year ago

kwonej0617 commented 1 year ago

Hi, thank you for developing a useful tool. I have run DENA on my dataset. The output files were generated successfully, but they contain no modification values. I am not sure what the problem is. Could you please give me some advice on how to fix it?

wt.tsv

ENST00000377350 2320    AAACA   0       1       0.0
ENST00000380668 473     AAACA   0       8       0.0
ENST00000380668 818     AAACA   0       8       0.0
ENST00000380668 846     AAACA   0       8       0.0
ENST00000398491 482     AAACA   0       1       0.0
ENST00000398491 827     AAACA   0       1       0.0
ENST00000398491 855     AAACA   0       1       0.0
ENST00000401827 401     AAACA   0       2       0.0
ENST00000267163 154     AAACA   0       6       0.0
ENST00000267163 368     AAACA   0       6       0.0
ENST00000267163 721     AAACA   0       7       0.0

wt_details.tsv

ENST00000277900 904     AAACA
dce44906-2feb-47e7-a795-a05b923055a2    
ENST00000277900 1309    AAACA
dce44906-2feb-47e7-a795-a05b923055a2    
ENST00000277900 1484    AAACA
dce44906-2feb-47e7-a795-a05b923055a2    
ENST00000277900 1552    AAACA
dce44906-2feb-47e7-a795-a05b923055a2    
ENST00000277900 1716    AAACA
dce44906-2feb-47e7-a795-a05b923055a2    
ENST00000277900 1874    AAACA
dce44906-2feb-47e7-a795-a05b923055a2    
ENST00000424679 908     AAACA
8c186437-137f-435b-a26f-24c35e5ce9f9    
ENST00000424679 974     AAACA
8c186437-137f-435b-a26f-24c35e5ce9f9    
ENST00000424679 1068    AAACA
8c186437-137f-435b-a26f-24c35e5ce9f9

The following is my command line. Note that I didn't add -corr_grp ${RawGenomeCorrected_000} in the feature extraction step. My Tombo re-squiggle run with --corrected-group RawGenomeCorrected_001 --basecall-group Basecall_1D_001 --include-event-stdev --overwrite --ignore-read-locks failed, so I ran re-squiggle without those options and then ran DENA as follows. I am wondering whether that could have caused the problem.

#Extract features
python3 /home/euijin.kwon-umw/Euijin/DENA/step4_predict/LSTM_extract.py predict --fast5 ${wt_fast5} --bam ${wt_bam} --processes 16 --sites ${candidate_predict_pos} --label wt --windows 2 2

#Predict
python /home/euijin.kwon-umw/Euijin/DENA/step4_predict/LSTM_predict.py -i . -m ${DENA} -o output -p wt -d

I am looking forward to hearing from you. Thank you!

weir12 commented 1 year ago

Hi,

As you pointed out, DENA needs the correct location of the corrected group within the fast5 files in order to retrieve the required data. Please also confirm that the tombo re-squiggle command itself ran properly.

In short, Tombo writes its results into a named data slot (for example, RawGenomeCorrected_001) within each fast5 file, and DENA then extracts the data it needs from that specific slot.

You can refer to the following link for more details: GitHub link

Description of the --corrected-group option, quoted from the Tombo documentation:

The --corrected-group slot contains attributes for the signal normalization (shift, scale, upper_limit, lower_limit, outlier_threshold and the tombo signal matching score) as well as a boolean flag indicating whether the read is DNA or RNA. Within the Alignment group, the genomic mapped start, end, strand and chromosome as well as mapping statistics (number clipped start and end bases, matching, mismatching, inserted and deleted bases) are stored.
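One way to check which slot a re-squiggle run actually wrote is to list the groups under /Analyses in one of the fast5 files. The following is only a minimal sketch, not part of DENA or Tombo: it assumes single-read fast5 files, that the h5py package is installed, and that the slot name starts with RawGenomeCorrected_ (a custom --corrected-group name would not match). The function names are hypothetical.

```python
# Assumption: Tombo slots live under /Analyses and are named "RawGenomeCorrected_...".
CORRECTED_PREFIX = "RawGenomeCorrected_"


def corrected_groups(analysis_names):
    """Return the corrected-group names found among the /Analyses children."""
    return sorted(name for name in analysis_names if name.startswith(CORRECTED_PREFIX))


def corrected_groups_in_fast5(fast5_path):
    """List corrected groups in one single-read fast5 file (requires h5py)."""
    import h5py  # imported lazily so corrected_groups() works without it

    with h5py.File(fast5_path, "r") as handle:
        if "Analyses" not in handle:
            # No analysis groups at all: the read was never basecalled or re-squiggled.
            return []
        return corrected_groups(handle["Analyses"].keys())
```

Calling corrected_groups_in_fast5 on one of your reads should return a list of slot names. If the list is empty, the re-squiggle step did not write anything to that file. If it contains a different name from the one DENA was told to read (for example via the -corr_grp option mentioned in the question), that mismatch would be a plausible reason for empty predictions.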

I hope this clarification helps!

Ou Liang

kwonej0617 commented 1 year ago

It worked! Thank you so much for your answer!

ENST00000361390 415     AAACA
939fae43-6763-4123-ad11-b3b39bd57787    0.11169436
d3478b81-54a7-4706-9d06-b836a62b9931    0.07080017
2702e88e-2565-4f1b-a1d8-28c04f1395af    0.009372625
514f02ba-bf35-4363-a9b4-a0b916c1c85b    0.043446995
939ab641-41c1-4456-bee5-dae857905c06    0.033762895
2cb9149d-cd2b-466e-82fb-b578ad03e189    0.054366756
8c45ba62-7a95-418d-b315-bfaec977c140    0.70852625

ENST00000361390 415     AAACA   167     2535    0.06587771203155819
ENST00000361390 689     AAACA   261     2661    0.09808342728297632
ENST00000361390 704     AAACA   220     2665    0.0825515947467167
ENST00000361453 107     AAACA   331     2139    0.15474520804114072
ENST00000361453 704     AAACA   341     2592    0.13155864197530864
ENST00000361453 818     AAACA   164     2629    0.062381133510840625
ENST00000361624 293     AAACA   213     1446    0.14730290456431536
ENST00000361624 1352    AAACA   179     1444    0.12396121883656509
ENST00000361739 514     AAACA   121     6351    0.019052117776728075
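
For anyone who wants to sanity-check a summary file like the one above, here is a small sketch. It rests on an assumption inferred from the values rather than from the DENA documentation: the six whitespace-separated columns are read as transcript, position, motif, number of modified reads, total reads, and ratio (for example, 167 / 2535 ≈ 0.0659). The function names are hypothetical.

```python
def parse_site_line(line):
    """Split one whitespace-separated site line into typed fields."""
    transcript, position, motif, modified, total, ratio = line.split()
    return transcript, int(position), motif, int(modified), int(total), float(ratio)


def check_sites(lines, tolerance=1e-9):
    """Return (mismatched_sites, all_zero) for a list of site lines.

    mismatched_sites: sites whose ratio column differs from modified / total.
    all_zero: True when no site has any modified read, as in the failing run.
    """
    mismatched = []
    any_modified = False
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        transcript, position, _motif, modified, total, ratio = parse_site_line(line)
        expected = modified / total if total else 0.0
        if abs(expected - ratio) > tolerance:
            mismatched.append((transcript, position))
        any_modified = any_modified or modified > 0
    return mismatched, not any_modified
```

To check a whole file, read it line by line and pass the lines to check_sites. An all_zero result of True across every site matches the symptom in the original report and suggests that feature extraction found no usable data, rather than that the sample truly has no m6A.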