jts / methylation-analysis


Error methyltrain #15

Open quanc1989 opened 5 years ago

quanc1989 commented 5 years ago

I downloaded data from https://www.ebi.ac.uk/ena/data/view/PRJEB13021 and want to run the pipeline. Since the whole dataset is too large, I extracted 20-30 files from the E. coli R7 data for training (ecoli_er2925.MSssI.timp.100215.fast5, ecoli_er2925.native.timp.110915.fast5, ecoli_er2925.pcr_MSssI.timp.021216.fast5, ecoli_er2925.pcr.timp.021216.fast5).

Then I ran the pipeline and got the error below. I found that the raw R7 fast5 files have no Signal object, so I wonder whether this pipeline can be run on these data without the raw signal.

poretools fasta --type 2D /Users/quanc/Documents/Data/Nanopore/data/ecoli_er2925.MSssI.timp.100215.fast5/pass > ecoli_er2925.MSssI.timp.100215.pass.fasta

poretools fasta --type 2D /Users/quanc/Documents/Data/Nanopore/data/ecoli_er2925.MSssI.timp.100215.fast5/fail > ecoli_er2925.MSssI.timp.100215.fail.fasta

cat ecoli_er2925.MSssI.timp.100215.pass.fasta ecoli_er2925.MSssI.timp.100215.fail.fasta > ecoli_er2925.MSssI.timp.100215.fasta

nanopolish index -d ~/Documents/Data/Nanopore/data/ ecoli_er2925.MSssI.timp.100215.fasta
[readdb] num reads: 17, num reads with path to fast5: 17

bwa mem -t 4 -x ont2d ecoli_k12.fasta ecoli_er2925.MSssI.timp.100215.fasta |\
        samtools view -q 20 -Sb - |\
        samtools sort -o ecoli_er2925.MSssI.timp.100215.sorted.bam -T %.tmp
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 17 sequences (125274 bp)...
[M::mem_process_seqs] Processed 17 reads in 1.828 CPU sec, 0.558 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 4 -x ont2d ecoli_k12.fasta ecoli_er2925.MSssI.timp.100215.fasta
[main] Real time: 0.582 sec; CPU: 1.845 sec

samtools index ecoli_er2925.MSssI.timp.100215.sorted.bam

/Users/quanc/Documents/Workspace/Github/methylation-analysis/initialize_model.sh /Users/quanc/Documents/Workspace/Github/methylation-analysis/models/r7.3_e6_70bps_6mer_template_median68pA.model template t.006 SQK006 > t.006.ont.model
ln -s t.006.ont.model t.006.ont.alphabet_nucleotide.model

/Users/quanc/Documents/Workspace/Github/methylation-analysis/initialize_model.sh /Users/quanc/Documents/Workspace/Github/methylation-analysis/models/r7.3_e6_70bps_6mer_complement_median68pA_pop1.model complement.pop1 c.p1.006 SQK006 > c.p1.006.ont.model
ln -s c.p1.006.ont.model c.p1.006.ont.alphabet_nucleotide.model

/Users/quanc/Documents/Workspace/Github/methylation-analysis/initialize_model.sh /Users/quanc/Documents/Workspace/Github/methylation-analysis/models/r7.3_e6_70bps_6mer_complement_median68pA_pop2.model complement.pop2 c.p2.006 SQK006 > c.p2.006.ont.model
ln -s c.p2.006.ont.model c.p2.006.ont.alphabet_nucleotide.model

echo t.006.ont.alphabet_nucleotide.model c.p1.006.ont.alphabet_nucleotide.model c.p2.006.ont.alphabet_nucleotide.model | tr " " "\n" > ont.alphabet_nucleotide.R7.fofn

nanopolish methyltrain -t 4  --train-kmers all --out-fofn ecoli_er2925.MSssI.timp.100215.alphabet_nucleotide.fofn --out-suffix .ecoli_er2925.MSssI.timp.100215.alphabet_nucleotide.model -m ont.alphabet_nucleotide.R7.fofn -b ecoli_er2925.MSssI.timp.100215.sorted.bam -r ecoli_er2925.MSssI.timp.100215.fasta -g ecoli_k12.fasta.alphabet_nucleotide --filter-policy R7

Training SQK006 for alphabet nucleotide for 6-mers
Starting round 0
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 123145559535616:
  #000: H5L.c line 1117 in H5Lget_name_by_idx(): name doesn't exist
    major: Symbol table
    minor: Object already exists
  #001: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #002: H5Gtraverse.c line 755 in H5G_traverse_real(): component not found
    major: Symbol table
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 123145560072192:
  #000: H5L.c line 1117 in H5Lget_name_by_idx(): name doesn't exist
    major: Symbol table
    minor: Object already exists
  #001: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #002: H5Gtraverse.c line 755 in H5G_traverse_real(): component not found
    major: Symbol table
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 123145560072192:
  #000: H5D.c line 358 in H5Dopen2(): not found
    major: Dataset
    minor: Object not found
  #001: H5Gloc.c line 430 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #002: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #003: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #004: H5Gloc.c line 385 in H5G_loc_find_cb(): object 'Signal' doesn't exist
    major: Symbol table
    minor: Object not found
Assertion failed: (rt.n > 0), function load_from_raw, file src/nanopolish_squiggle_read.cpp, line 321.
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 123145559535616:
  #000: H5D.c line 358 in H5Dopen2(): not found
    major: Dataset
    minor: Object not found
  #001: H5Gloc.c line 430 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #002: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
make: *** [ecoli_er2925.MSssI.timp.100215.alphabet_nucleotide.fofn] Abort trap: 6
jts commented 5 years ago

Hi,

The fast5 file structure has changed a lot since 2015, and R7 data is no longer well supported. If you want to replicate the analysis in our paper exactly, you'll have to use the specific version of nanopolish specified in pipeline.make.

Jared

quanc1989 commented 5 years ago

@jts Thanks a lot! This error had confused me for a few days. Based on your suggestion, can I conclude that the current version of nanopolish only works with fast5 files that contain /Raw/Signal? And is that true for methyltrain, methyltest and call-methylation alike?

jts commented 5 years ago

Yes, all modern ONT data will contain /Raw/Signal. We've tried to maintain support for older data in nanopolish but since no one really uses R7 data anymore some features may be neglected.
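To check up front whether a set of fast5 files has the raw signal that current nanopolish expects, something like the following shell sketch could be used. It is not part of nanopolish or this repository: the helper name `has_raw_signal` is hypothetical, and the script assumes the HDF5 command-line tool `h5ls` is installed and that raw data is stored in a `Signal` dataset under a `Raw` group (`/Raw/Reads/Read_<n>/Signal` in single-read files, `/read_<id>/Raw/Signal` in multi-read files).

```shell
#!/bin/sh
# Hypothetical helper, not shipped with nanopolish: report which fast5 files
# contain a raw signal dataset. Assumes the HDF5 tool h5ls is on PATH.

# Succeeds (exit status 0) if the recursive listing of the file shows a
# dataset named Signal below a Raw group, e.g.
#   /Raw/Reads/Read_42/Signal   (single-read fast5)
#   /read_<id>/Raw/Signal       (multi-read fast5)
# Older event-only files (such as R7-era data) have no /Raw group and fail.
has_raw_signal() {
    h5ls -r "$1" 2>/dev/null | grep -Eq '/Raw(/[^ ]*)?/Signal[[:space:]]'
}

# Check every file given on the command line.
for f in "$@"; do
    if has_raw_signal "$f"; then
        echo "raw signal: $f"
    else
        echo "no raw signal: $f"
    fi
done
```

Saved under a hypothetical name such as `check_raw_signal.sh`, it could be run as `sh check_raw_signal.sh ~/Documents/Data/Nanopore/data/*/pass/*.fast5`; any file reported as "no raw signal" would hit the same `Signal` lookup failure seen in the log above.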

quanc1989 commented 5 years ago

Got it. Then I'll have to find some R9 data to train the model. Thanks again.