novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Error running Epinano_DiffErr.R on 5mer output from Epinano_sumErr.py #125

Open GeoffLyle opened 2 years ago

GeoffLyle commented 2 years ago

I have been running into an issue trying to run Epinano_DiffErr.R on the output from Epinano_sumErr.py.

Running Epinano_sumErr.py appears to work:

Epinano_sumErr.py --quality --file NHA_hTERT_DRNA_20220609_self_transcript_aligned.sorted.plus_strand.per.site.5mer.csv --out NHA-hTERT_5mer.sum_err.csv --kmer 5

python3 Epinano_sumErr.py --quality --file full_fq_to_sample_transcripts_output.sorted.plus_strand.per.site.5mer.csv --out DIPG-IV_5mer.sum_err.csv --kmer 5

However, when I run these outputs through Epinano_DiffErr.R, the script fails. The command is:

Rscript Epinano_DiffErr.R -k NHA-hTERT_5mer.sum_err.csv -w DIPG-IV_5mer.sum_err.csv -c 30 -d 0.1 -t 3 -o DIPG-IV_NHA-hTERT_5mer_sumErr --feature sum_err3

Error:

Error in merge.data.frame(dat1, dat2, by = "chr_pos") :
negative length vectors are not allowed

This appears to be due to a memory limit issue.

Note: I also tried changing line 126 in Epinano_DiffErr.R from

combine <- merge(dat1, dat2, by="chr_pos")

to

combine <- dplyr::full_join(dat1, dat2, by="chr_pos")

I thought that the dplyr package could fix the memory limit issue, but I'm getting this error now:

Error: cannot allocate vector of size 127613.3 Gb
Execution halted

These are the sizes of the data frames I want to merge:

[1] "Number of rows in dat1: 3571272"
54.5 Mb
[1] "Number of rows in dat2: 9592079"
146.4 Mb
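One way to check whether the merge itself is exploding, rather than the inputs being too large, is to count how many rows a join on the key would produce before building it. The sketch below is a standalone Python illustration, not part of EpiNano: `read_keys`, `estimate_join_rows`, and `cartesian_index_gib` are hypothetical helper names, and it assumes the key column is available as `chr_pos` in a CSV file (in Epinano_DiffErr.R that column may instead be constructed inside the script). The last lines only do arithmetic on the numbers quoted above: a 4-byte index over 3571272 × 9592079 rows comes to about 127613.3 Gb, which would be consistent with nearly every `chr_pos` value in one table matching every value in the other (for example, if the key were the same or missing on all rows). That reading is an inference from the numbers, not something confirmed in this thread.

```python
import csv
from collections import Counter
from typing import Iterable


def read_keys(path: str, column: str = "chr_pos") -> Counter:
    """Count how often each key occurs in one column of a CSV file.

    The column name is an assumption; adjust it to the real header.
    """
    with open(path, newline="") as handle:
        return Counter(row[column] for row in csv.DictReader(handle))


def estimate_join_rows(keys_a: Iterable[str], keys_b: Iterable[str]) -> dict:
    """Estimate merged row counts without building the merged table.

    Accepts plain iterables of keys or Counters returned by read_keys.
    """
    count_a, count_b = Counter(keys_a), Counter(keys_b)
    shared = count_a.keys() & count_b.keys()
    # Each shared key contributes (occurrences in a) * (occurrences in b) rows.
    inner = sum(count_a[k] * count_b[k] for k in shared)
    only_a = sum(n for k, n in count_a.items() if k not in shared)
    only_b = sum(n for k, n in count_b.items() if k not in shared)
    return {
        "inner_rows": inner,                   # like merge(dat1, dat2, by=...)
        "full_rows": inner + only_a + only_b,  # like dplyr::full_join
        "max_dup_a": max(count_a.values(), default=0),
        "max_dup_b": max(count_b.values(), default=0),
    }


def cartesian_index_gib(rows_a: int, rows_b: int, bytes_per_row: int = 4) -> float:
    """Size in GiB of one index vector if every row matched every row."""
    return rows_a * rows_b * bytes_per_row / 2**30


# Toy keys: "t1_5" appears twice on one side and three times on the other.
stats = estimate_join_rows(["t1_5", "t1_5", "t2_9"],
                           ["t1_5", "t1_5", "t1_5", "t3_1"])
print(stats)

# Row counts quoted in this issue; prints roughly 127613.3
worst_case = cartesian_index_gib(3571272, 9592079)
print(round(worst_case, 1))
```

If `inner_rows` comes out far larger than either input, the key is not unique per row, and deduplicating or rebuilding `chr_pos` before merging is probably a better fix than adding memory.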

Have you run into this error when running Epinano_DiffErr.R, and if so what was your solution?

PS: It would also be great to be able to pass all 5 sum_err columns at the same time, as was suggested in #122.

enovoa commented 1 year ago

Hi @GeoffLyle, sorry for the slow reply. Were you able to solve this issue? Also, thanks for your suggestion on using the 5 sum_err columns; we will keep this in mind for future updates.