novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Problem with the execution of run.sh in make_prediction directory #65

Closed: lionelus closed this issue 3 years ago

lionelus commented 4 years ago

Hello,

I'm trying to run the file run.sh located in the make_prediction directory.

I get the following error:

Error in `[.data.frame`(input, , c("X.Ref", "pos", "position", "base", : undefined columns selected

Can you help me understand what isn't working?

Thanks in advance.

Huanle commented 4 years ago

Hi @lionelus, run.sh consists of quite a few commands. It's hard to tell which one produced this error message unless you copy the full log or run the commands one by one and find the one that fails. Basically, I need more details to locate the problem. Can you help?
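One way to follow this advice is to run the commands from run.sh separately and stop at the first one that exits with a non-zero status. The helper below is a hypothetical sketch, not part of EpiNano; it assumes the commands have been copied out of run.sh into a list, one shell command per entry. It uses stdout/stderr pipes with universal_newlines rather than capture_output so that it also runs on the Python 3.6 interpreter that appears in the log.

```python
import subprocess
from typing import List, Optional, Tuple


def first_failing_command(commands: List[str]) -> Optional[Tuple[int, str, str]]:
    """Run shell commands in order and return (index, command, stderr) of the first failure.

    Returns None if every command exits with status 0.
    """
    for i, cmd in enumerate(commands):
        # shell=True so that pipes and redirections copied from run.sh still work
        result = subprocess.run(
            cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (works on Python 3.6)
        )
        if result.returncode != 0:
            return i, cmd, result.stderr
    return None


if __name__ == "__main__":
    # Hypothetical example list: replace with the actual lines from run.sh
    steps = [
        "Rscript ../Epinano_DiffErr.R -k ko.plus_strand.per.site.var.csv "
        "-w wt.plus_strand.per.site.var.csv -t 5 -o HL -c 30 -f sum -d 0.1 -p",
    ]
    failure = first_failing_command(steps)
    if failure is None:
        print("all commands succeeded")
    else:
        index, cmd, stderr = failure
        print(f"step {index} failed: {cmd}\n{stderr}")
```

A similar effect can be had by running the script with bash -e, although by default a pipeline's exit status is that of its last command, so a failure earlier in a pipe is only caught if set -o pipefail is also used.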

lionelus commented 4 years ago

Hi Huanle,

Here is the whole error message:

CMDs:
guppy_basecaller -c rna_r9.4.1_70bps_hac.cfg --compress_fastq -i ko_raw_fast5/ -r -s ko_fast5 --fast5_out --cpu_threads_per_caller 12
guppy_basecaller -c rna_r9.4.1_70bps_hac.cfg --compress_fastq -i wt_raw_fast5/ -r -s wt_fast5 --fast5_out --cpu_threads_per_caller 12
reads mapping either with minimap2 or graphmap2
minimap2: minimap2 --MD -t 6 -ax map-ont ref.fa ko.fastq | samtools view -hbS -F 3844 - | samtools sort -@ 6 - ko
graphmap2:  graphmap align -r ref.fa -d ko.fastq -o ko.sam -v 1 -K fastq
graphmap2 with higher sensitivity: graphmap align -r ref.fa -d ko.fastq -o ko.sam -v 1 -K fastq  --rebuild-index --double-index --mapq -1 -x sensitive -z -1 --min-read-len 0 -A 7 -k 5
reads can also be mapped to reference genome with minimap2
compute varitants/error frequencies from bam file
predict based on deviance of mis
AND
predict based on linear regression model residuals, using mis feature
CMD: Rscript ../Epinano_DiffErr.R -k ko.plus_strand.per.site.var.csv  -w wt.plus_strand.per.site.var.csv -t 5 -o HL -c 30 -f sum  -d 0.1 -p
Error in `[.data.frame`(input, , c("X.Ref", "pos", "position", "base",  :
  undefined columns selected
Calls: cleanup -> [ -> [.data.frame
Execution halted
similarly we can use the same method but with sum_err
generate sum_err (mis, ins, del)
Warning message:
Removed 11 rows containing missing values (geom_point).
Switching on Epinano_sumErr.py -q will include quality score to compute sum_err
predict using pretrained SVM models
using q3,mis3,del3 features
Commad:  ../../Epinano_Predict.py -o SVM_Predict -M ../models/rrach.q3.mis3.del3.linear.dump -p wt.plus_strand.per_site.5mer.csv -cl 8,13,23
Traceback (most recent call last):
  File "../../Epinano_Predict.py", line 97, in <module>
    predict_df = pd.read_csv (args['predict'],compression='gzip') if args['predict'].endswith ('.gz') else pd.read_csv (args['predict'])
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'wt.plus_strand.per_site.5mer.csv' does not exist
generate delta-features

        python Epinano_make_delta.py <modified sample feature table> <unmodified sample feature table> <minimum coverage at sites>  <windown size of feature table>
        if not specified, slided window size is 0
        otherwise, windown size = kmer size

predict using pretrained SVM models with delta features
Commad:  ../../Epinano_Predict.py -o SVM_Predict_delta_features -M ../models/rrach.deltaQ3.deltaMis3.deltaDel3.linear.dump -p wt_ko_delta.5mer.csv -cl 7,12,22
Traceback (most recent call last):
  File "../../Epinano_Predict.py", line 97, in <module>
    predict_df = pd.read_csv (args['predict'],compression='gzip') if args['predict'].endswith ('.gz') else pd.read_csv (args['predict'])
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/python/versions/3.6.3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
plot SVM-based prediction p-values
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'SVM_Predict_delta_features.mis3.del3.q3.MODEL.rrach.deltaQ3.deltaMis3.deltaDel3.linear.dump.csv': No such file or directory
Execution halted
../misc/Epinano_Plot.R can also generate plots from ../misc/Epinano_DiffErr.R outputs
Oh Wait! there is the current intensity values that you can incoporate into the above analyses
Extract and collapse current intensity values
../../Epinano_Current.sh: line 52: nanopolish: command not found
../../Epinano_Current.sh: line 53: nanopolish: command not found
start analysis 2020-10-26 10:29:27.283743
ko.eventalign.tsv.gz.forward_events.collapsed exists, will over-write it
adding small chunk file and referenc file 2020-10-26 10:29:27.323547
splittting nanopolish eventalign results failed
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/python/versions/3.6.3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/python/versions/3.6.3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "../../misc/Epinano_Current.py", line 57, in _split_eventalign_tbl_on_read
    header = "\t".join ([ary[0], ary[1],ary[2],ary[4],ary[8],ary[9],ary[15]])
IndexError: list index out of range
finish adding reference and small files 2020-10-26 10:29:27.371749
finish warapping up all small files  2020-10-26 10:29:27.430564
if you want to CAT all results together, run concat_events.py to concatenate the results
combine []
Traceback (most recent call last):
  File "../../misc/concat_events.py", line 35, in <module>
    main()
  File "../../misc/concat_events.py", line 33, in main
    final_output (files, outfh)
  File "../../misc/concat_events.py", line 16, in final_output
    header = openfile (sumfiles[0]).readline().rstrip()
IndexError: list index out of range
Traceback (most recent call last):
  File "../../misc/Slide_Intensity.py", line 204, in <module>
    main()
  File "../../misc/Slide_Intensity.py", line 202, in main
    slide_intensity (inp, win)
  File "../../misc/Slide_Intensity.py", line 75, in slide_intensity
    eof = fh.seek (fh.tell()-1, os.SEEK_SET)
ValueError: negative seek position -1
../../Epinano_Current.sh: line 52: nanopolish: command not found
../../Epinano_Current.sh: line 53: nanopolish: command not found
start analysis 2020-10-26 10:29:29.732251
wt.eventalign.tsv.gz.forward_events.collapsed exists, will over-write it
adding small chunk file and referenc file 2020-10-26 10:29:29.777141
splittting nanopolish eventalign results failed
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/python/versions/3.6.3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/python/versions/3.6.3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "../../misc/Epinano_Current.py", line 57, in _split_eventalign_tbl_on_read
    header = "\t".join ([ary[0], ary[1],ary[2],ary[4],ary[8],ary[9],ary[15]])
IndexError: list index out of range
finish adding reference and small files 2020-10-26 10:29:29.870389
finish warapping up all small files  2020-10-26 10:29:29.996677
if you want to CAT all results together, run concat_events.py to concatenate the results
combine []
Traceback (most recent call last):
  File "../../misc/concat_events.py", line 35, in <module>
    main()
  File "../../misc/concat_events.py", line 33, in main
    final_output (files, outfh)
  File "../../misc/concat_events.py", line 16, in final_output
    header = openfile (sumfiles[0]).readline().rstrip()
IndexError: list index out of range
Traceback (most recent call last):
  File "../../misc/Slide_Intensity.py", line 204, in <module>
    main()
  File "../../misc/Slide_Intensity.py", line 202, in main
    slide_intensity (inp, win)
  File "../../misc/Slide_Intensity.py", line 75, in slide_intensity
    eof = fh.seek (fh.tell()-1, os.SEEK_SET)
ValueError: negative seek position -1
create Intensity feature table contains Intensity and duration values
Traceback (most recent call last):
  File "../../misc/Slide_Intensity.py", line 204, in <module>
    main()
  File "../../misc/Slide_Intensity.py", line 202, in main
    slide_intensity (inp, win)
  File "../../misc/Slide_Intensity.py", line 75, in slide_intensity
    eof = fh.seek (fh.tell()-1, os.SEEK_SET)
ValueError: negative seek position -1
Traceback (most recent call last):
  File "../../misc/Slide_Intensity.py", line 204, in <module>
    main()
  File "../../misc/Slide_Intensity.py", line 202, in main
    slide_intensity (inp, win)
  File "../../misc/Slide_Intensity.py", line 75, in slide_intensity
    eof = fh.seek (fh.tell()-1, os.SEEK_SET)
ValueError: negative seek position -1
join current intensity features with error/variants features so that you can train model with interesting combinations of feature!
Traceback (most recent call last):
  File "../../misc/Join_variants_currents.py", line 37, in <module>
    with openfile (args.variants) as variant:
  File "../../misc/../epinano_modules.py", line 22, in openfile
    fh = open(f,'rt')
FileNotFoundError: [Errno 2] No such file or directory: 'wt.plus_strand.per_site.5mer.csv'
Traceback (most recent call last):
  File "../../misc/Join_variants_currents.py", line 37, in <module>
    with openfile (args.variants) as variant:
  File "../../misc/../epinano_modules.py", line 22, in openfile
    fh = open(f,'rt')
FileNotFoundError: [Errno 2] No such file or directory: 'ko.plus_strand.per_site.5mer.csv'
now it is possible to use both current intensity and Error features to do training or apply certain statistical tests to detect modifications!!
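Several of the later errors in this log appear to follow from earlier ones rather than being separate bugs. "nanopolish: command not found" means no eventalign output was written, which would explain the IndexError and "negative seek position -1" tracebacks on empty input, and Epinano_Predict.py and Join_variants_currents.py stop because wt.plus_strand.per_site.5mer.csv and ko.plus_strand.per_site.5mer.csv do not exist. A quick pre-flight check can catch both kinds of problem before run.sh is started. The helpers below are a hypothetical sketch, not part of EpiNano; the tool and file names are taken from the log above and should be adjusted to your own run.

```python
import os
import shutil
from typing import Iterable, List


def missing_executables(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_or_empty_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not exist as regular files or are zero bytes long."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]


if __name__ == "__main__":
    # Executables invoked by run.sh, according to the log above
    tools = ["guppy_basecaller", "minimap2", "samtools", "Rscript", "nanopolish"]
    # Intermediate files that later steps tried to read (names copied from the log)
    inputs = [
        "ko.plus_strand.per.site.var.csv",
        "wt.plus_strand.per.site.var.csv",
        "wt.plus_strand.per_site.5mer.csv",
        "ko.plus_strand.per_site.5mer.csv",
    ]
    for tool in missing_executables(tools):
        print("not on PATH:", tool)
    for path in missing_or_empty_files(inputs):
        print("missing or empty:", path)
```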
Huanle commented 4 years ago

Hi @lionelus,

Thanks for providing these details. Can you send me the ko.plus_strand.per.site.var.csv and wt.plus_strand.per.site.var.csv files? These are produced by Epinano_Variants.py.
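Since "undefined columns selected" means the R script asked for a column name that is not in the table, the header line of these files is the first thing worth checking. The sketch below prints the column names of a per-site table, opening it with gzip when the name ends in .gz, the same test Epinano_Predict.py applies to its input. It is a hypothetical helper that only assumes the file is comma-separated with a header row.

```python
import csv
import gzip
import sys
from typing import List


def read_header(path: str) -> List[str]:
    """Return the column names from the first line of a (possibly gzipped) CSV file."""
    opener = gzip.open if path.endswith(".gz") else open
    # "rt" makes gzip.open return text instead of bytes; it is also valid for open()
    with opener(path, "rt", newline="") as fh:
        # An empty file yields an empty header instead of raising StopIteration
        return next(csv.reader(fh), [])


if __name__ == "__main__":
    # Usage: python read_header.py wt.plus_strand.per.site.var.csv.gz ...
    for name in sys.argv[1:]:
        print(name, read_header(name))
```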

lionelus commented 4 years ago

Hi @Huanle,

Here are the files:

wt.plus_strand.per.site.var.csv.gz ko.plus_strand.per.site.var.csv.gz

Huanle commented 3 years ago

Hi @lionelus, sorry for the slow reply; I was involved in other projects. There is a mistake in this command from your log:

Rscript ../Epinano_DiffErr.R -k ko.plus_strand.per.site.var.csv -w wt.plus_strand.per.site.var.csv -t 5 -o HL -c 30 -f sum -d 0.1

The option -f sum requires summed features, but these are not in the input files. This is my fault, though.

If you run Epinano_DiffErr.R -k ko.plus_strand.per.site.var.csv -w wt.plus_strand.per.site.var.csv -t 5 -o test -c 30 -f mis -d 0.1 instead, or use -f del or -f ins, it should work.
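To try all three suggested features, the commands can be generated from a list. The sketch below only rebuilds the command line quoted above with -f set to mis, del or ins; the per-feature output prefix (test_mis and so on) is an illustrative choice so the runs do not overwrite each other, not something the script is known to require. It prints the commands rather than running them, so they can be checked and pasted into the shell.

```python
from typing import List

# Single error features suggested above as replacements for "sum"
FEATURES = ["mis", "del", "ins"]


def differr_command(feature: str, ko: str, wt: str, prefix: str) -> List[str]:
    """Build the Epinano_DiffErr.R call with one error feature instead of 'sum'."""
    return [
        "Rscript", "../Epinano_DiffErr.R",
        "-k", ko,
        "-w", wt,
        "-t", "5",
        "-o", prefix,
        "-c", "30",
        "-f", feature,
        "-d", "0.1",
    ]


if __name__ == "__main__":
    for feature in FEATURES:
        cmd = differr_command(
            feature,
            "ko.plus_strand.per.site.var.csv",
            "wt.plus_strand.per.site.var.csv",
            prefix="test_" + feature,  # hypothetical prefix, one per feature
        )
        print(" ".join(cmd))
```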