tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

NanocomporeError: Required fields not found in the data file #153

Closed: pabloacera closed this issue 3 years ago

pabloacera commented 3 years ago

Hi, I ran nanocompore like this:

/home/249/pm1122/.local/bin/nanocompore sampcomp \
--file_list1 /g/data/xc17/modified_RNAs/kat1_analysis/nanocompore/nanopolishcomp_kat1.tsv/out_eventalign_collapse.tsv,/g/data/xc17/modified_RNAs/kat4_analysis/nanocompore/nanopolishcomp_kat4.tsv/out_eventa$
--file_list2 /g/data/xc17/modified_RNAs/kat3_analysis/nanocompore/nanopolishcomp_kat3.tsv/out_eventalign_collapse.tsv \
--label1 kat1_kat4 \
--label2 kat3 \
--fasta /g/data/xc17/modified_RNAs/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa \
--outpath /g/data/xc17/modified_RNAs/results/kat1kat4_vs_kat3 \
--nthreads 5

I got this error:


Initialising SampComp and checking options
Only 1 replicate found for condition kat3
This is not recommended. The statistics will be calculated with the logit method
Initialising Whitelist and checking options
Reading eventalign index files
        References found in index: 786
Filtering out references with low coverage
        References remaining after reference coverage filtering: 2
Starting data processing
  0%|          | 0/2 [00:00<?, ? Processed References/s]
Traceback (most recent call last):
  File "/home/249/pm1122/.local/bin/nanocompore", line 11, in <module>
    sys.exit(main())
  File "/home/249/pm1122/.local/lib/python3.6/site-packages/nanocompore/__main__.py", line 139, in main
    args.func(args)
  File "/home/249/pm1122/.local/lib/python3.6/site-packages/nanocompore/__main__.py", line 174, in sampcomp_main
    db = s()
  File "/home/249/pm1122/.local/lib/python3.6/site-packages/nanocompore/SampComp.py", line 251, in __call__
    raise E
  File "/home/249/pm1122/.local/lib/python3.6/site-packages/nanocompore/SampComp.py", line 246, in __call__
    raise NanocomporeError(tb)
nanocompore.common.NanocomporeError: Traceback (most recent call last):
  File "/home/249/pm1122/.local/lib/python3.6/site-packages/nanocompore/SampComp.py", line 327, in __process_references
    raise NanocomporeError("Required fields not found in the data file: {}".format(col_names))
nanocompore.common.NanocomporeError: Required fields not found in the data file: ['ref_pos', 'ref_kmer', 'num_events', 'dwell_time', 'NNNNN_dwell_time', 'mismatch_dwell_time', 'start_idx', 'end_idx']

I ran nanopolish like this:

ata/xc17/pm1122/lib/nanopolish/nanopolish eventalign -t 20 --reads /g/data/xc17/modified_RNAs/kat3_analysis/kat3.fastq.gz \
 --bam /g/data/xc17/modified_RNAs/kat3_analysis/mapping/kat3_filtered.bam  --genome /g/data/xc17/modified_RNAs/Saccharomyces_cerevisiae.R64-1-1.cdna.all.fa \
 --scale-events --signal-index --summary=/g/data/xc17/modified_RNAs/kat3_analysis/nanopolish/summary_kat3.txt > /g/data/xc17/modified_RNAs/kat3_analysis/nanopolish/eventalign_kat3.txt

I didn't include the --samples flag when running nanopolish. Do you think this is the issue? Thanks in advance for your time. Pablo.

tleonardi commented 3 years ago

Hi Pablo, yes, I think that's the issue. Could you try?
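
One way to see what the collapsed file lacks is to compare the column list printed in the error with the columns nanocompore is expected to read. The Python sketch below does that. The required set in `ASSUMED_REQUIRED`, in particular the signal column `median`, is an assumption and is not stated anywhere in this thread, so it should be checked against `SampComp.py` in the installed version. The helper name `missing_columns` is hypothetical.

```python
# Columns reported in the NanocomporeError message above (copied verbatim).
found_columns = [
    "ref_pos", "ref_kmer", "num_events", "dwell_time",
    "NNNNN_dwell_time", "mismatch_dwell_time", "start_idx", "end_idx",
]

# ASSUMPTION: nanocompore needs these columns, including a per-kmer signal
# intensity column named "median". Verify against your SampComp.py.
ASSUMED_REQUIRED = ("ref_pos", "ref_kmer", "dwell_time", "median")


def missing_columns(found, required=ASSUMED_REQUIRED):
    """Return the required column names that are absent from `found`."""
    present = set(found)
    return [name for name in required if name not in present]


print(missing_columns(found_columns))  # ['median']
```

If that assumption holds, the file has positions and dwell times but no signal intensity values, which fits an eventalign run made without --samples.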

a-slide commented 3 years ago

Yes, you definitely need the --samples option in Nanopolish, along with the other options listed in the data preparation documentation (https://nanocompore.rna.rocks/data_preparation/):

nanopolish index -s {sequencing_summary.txt} -d {raw_fast5_dir} {basecalled_fastq}

nanopolish eventalign --reads {basecalled_fastq} --bam {aligned_reads_bam} --genome {transcriptome_fasta} --print-read-names --scale-events --samples > {eventalign_reads_tsv}

NanopolishComp Eventalign_collapse -i {eventalign_reads_tsv} -o {eventalign_collapsed_reads_tsv}
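
As a quick sanity check before rerunning, an eventalign command line can be compared with the flags used in the documented command above. This is a minimal Python sketch and not part of nanocompore or nanopolish. The helper `missing_flags` is hypothetical, and the example command reuses the flags from the original post with its paths replaced by placeholder file names.

```python
import shlex

# Flags that appear in the documented eventalign command quoted above.
DOCUMENTED_FLAGS = ("--print-read-names", "--scale-events", "--samples")


def missing_flags(command, expected=DOCUMENTED_FLAGS):
    """Return the expected long flags that do not appear in `command`."""
    tokens = shlex.split(command)
    # Treat "--flag=value" the same as "--flag".
    given = {tok.split("=", 1)[0] for tok in tokens if tok.startswith("--")}
    return [flag for flag in expected if flag not in given]


# Same flags as the eventalign call in the original post; the file names
# are hypothetical placeholders, not the real paths.
command = (
    "nanopolish eventalign -t 20 --reads reads.fastq.gz --bam filtered.bam "
    "--genome transcriptome.fa --scale-events --signal-index "
    "--summary=summary.txt"
)
print(missing_flags(command))  # ['--print-read-names', '--samples']
```

For the command in the original post this reports both --samples and --print-read-names as absent, so both would need to be added when regenerating the eventalign output.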