pengelgau closed this issue 6 years ago
Hi,
What version of guppy are you using? I just looked at a sequencing summary file for one of our recent runs and the format is very similar to albacore's.
Jared
Hi Jared,
I used Guppy 1.4.3. I tried using the Guppy file anyway and got the missing filename column error:
Could not find filename column in the header of ../../raw_reads/reads/md5.txt
I then tried putting in a header with read_id and filename (in that order because of the nature of the file), and I still get the same error. This file seems to be space delimited. Is the albacore one tab delimited?
Phil
It looks like you are using this file: ../../raw_reads/reads/albacore_md5.txt as a sequencing summary. I don't think that is the correct file to use. Do you not have files with the name sequencing_summary_nnnn.txt?
I can't find a file with that name. I didn't perform the sequencing or basecalling myself; they were performed by my school's genomics core. I'll ask them about that file, as perhaps they neglected to send it to me. In the meantime, I found that md5.txt file with my raw reads after unzipping. Does it not have enough information to reformat into something that nanopolish would prefer?
Actually I do have those. I should have looked just a little bit harder before responding...
They are in this format:
filename read_id run_id channel start_time duration num_events template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_template
GXB01136_20180817_FAH87054_GA40000_sequencing_run_A_45079_read_6449_ch_327_strand.fast5 9f769fcd-5b23-4479-890c-23b68fbfaa9b f944a0a3b76c9e80f9301ab9f8eb4ed4c31b7971 327 8401.147461 16.5905 13272 8401.285156 13162 16.452999 5858 10.747499 -0.000313
GXB01136_20180817_FAH87054_GA40000_sequencing_run_A_45079_read_4439_ch_200_strand.fast5 7f6e317a-1497-4e37-9019-84d9b012a50a f944a0a3b76c9e80f9301ab9f8eb4ed4c31b7971 200 8412.035156 5.829 4663 8412.21875 4516 5.64525 1821 11.852244 -9.4e-05
I will give these a try and get back to you.
Yes, those are the files you need.
The files worked just fine. I would suggest that the help output for index mention Guppy as well as Albacore. Nonetheless, thanks for the quick help, I greatly appreciate it.
Glad to hear it! I'll make a note about mentioning guppy works too.
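For anyone else hitting the "Could not find filename column" error, here is a minimal Python sketch for checking a candidate summary file before running index. It assumes the header must be tab-delimited and contain columns named filename and read_id, which is inferred from the error message and the example above rather than from the nanopolish source. The function name missing_columns is made up for illustration.

```python
import csv
import io

# Column names inferred from the error message and the working example above.
REQUIRED = ("filename", "read_id")

def missing_columns(handle, required=REQUIRED, delimiter="\t"):
    """Return the required column names that are absent from the header line."""
    header = next(csv.reader(handle, delimiter=delimiter), [])
    return [name for name in required if name not in header]

# A tab-delimited header with both columns passes the check.
good = io.StringIO("filename\tread_id\trun_id\nread.fast5\tabc\trun1\n")
# A space-delimited header is read as a single field, so both names are missing.
bad = io.StringIO("read_id filename\nabc ./13/read.fast5\n")

print(missing_columns(good))  # []
print(missing_columns(bad))   # ['filename', 'read_id']
```

To check a real file, pass an open handle, e.g. missing_columns(open(path, newline="")).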
I am having the same issue, namely
Could not find filename column in the header of /nexusb/Gridion/20190905MicroRap/Microbio/20190905_1400_GA10000_FAK80986_effcd777/sequencing_summary/GXB01439_20190905_160028_FAK80986_gridion_sequencing_run_Microbio_sequencing_summary.txt
I have sequenced a pool of samples with a GridION (the basecaller should be Guppy; I don't know exactly which version). I have demultiplexed the samples with qcat and now I want to create the index to link the (demultiplexed) fastq with the fast5 files. After calling the command:
nanopolish index -v -s /nexusb/Gridion/20190905MicroRap/Microbio/20190905_1400_GA10000_FAK80986_effcd777/sequencing_summary/GXB01439_20190905_160028_FAK80986_gridion_sequencing_run_Microbio_sequencing_summary.txt -d /nexusb/Gridion/20190905MicroRap/Microbio/20190905_1400_GA10000_FAK80986_effcd777/fast5_pass/ /nexusb/Gridion/20190905MicroRap/Microbio/20190905_1400_GA10000_FAK80986_effcd777/fastq_pass/demux/BORD1725_barcode03.fastq
I get the error message shown above. In fact there is no filename column in my sequencing_summary file. Below is its header with one entry:
filename_fastq filename_fast5 read_id run_id channel mux start_time duration num_events passes_filtering template_start num_events_template template_duration sequence_length_template mean_qscore_template strand_score_template median_template mad_template pore_type experiment_id sample_id
FAK80986_d83ffac69ab548d4fc4f9876b6d2f931ed3827e2_0.fastq FAK80986_d83ffac69ab548d4fc4f9876b6d2f931ed3827e2_0.fast5 d14889bc-7e81-47ea-8c12-8aa8055fd2f1 d83ffac69ab548d4fc4f9876b6d2f931ed3827e2 408 1 8.512250 0.674250 0 TRUE 8.531750 0 0.654750 232 12.107406 0.000000 87.202446 9.349191 not_set 20190905MicroRap Microbio
I noticed that the structure of the file above is somewhat different from that of the sequencing_summary.txt file generated by albacore.
I have installed nanopolish with conda, version 0.11.2 (nanopolish 0.11.2 h705302d_0 bioconda).
Is there any fix for this (other than indexing without the -s option, which appears to be very slow)?
Hi @BCArg ,
I ran into the same problem with the output from epi2me. Essentially the file contains the right info, just in a slightly different format.
Here's a quick (and dirty) R script to reformat the summary file: https://www.dropbox.com/s/p9nia675pek1roj/reformat_summary.R?dl=0
Use it on the command line as: Rscript reformat_summary.R summaryfile reformattedsummaryfile
Best,
Craig
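If R is not available, the same idea can be sketched in Python. This is not a port of the script linked above, whose contents are not shown in this thread. It assumes the file is tab-delimited and that renaming the filename_fast5 header column to filename is all that index needs, which is a guess based on the error message. The function name reformat_summary and the toy rows are hypothetical.

```python
import io

def reformat_summary(src, dst, old="filename_fast5", new="filename"):
    """Copy a tab-delimited summary from src to dst, renaming one header column."""
    header = src.readline().rstrip("\r\n").split("\t")
    if new not in header:
        if old not in header:
            raise ValueError(f"neither {new!r} nor {old!r} found in header")
        header[header.index(old)] = new
    dst.write("\t".join(header) + "\n")
    # Data rows are copied through unchanged.
    for line in src:
        dst.write(line)

# In-memory demonstration with a shortened, made-up header and row.
src = io.StringIO("filename_fastq\tfilename_fast5\tread_id\nx.fastq\tx.fast5\tabc\n")
dst = io.StringIO()
reformat_summary(src, dst)
print(dst.getvalue(), end="")
```

For real files, open the input for reading and the output for writing and pass both handles to reformat_summary. A header that already has a filename column is copied through as-is.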
I'm currently trying to index my reads, but it is taking quite a while. (If my estimate holds, it will take about 7 full days of compute time to complete everything.) I think I could speed this up if I converted the sequencing summary file from Guppy into the same format as Albacore's. The only problem is that I don't know what the Albacore files look like, or what format they are in. I've tried looking around but I can't find anything. Below is an example of my Guppy sequencing summary file. It's a txt file with a read id and filename on each line.
76b4dc100267658aa54d86afab31d5da ./13/GXB01136_20180808_FAH87162_GA40000_sequencing_run_A_15756_read_2882_ch_371_strand.fast5
d2b07acc4f7749d54383da68bd0e7a76 ./13/GXB01136_20180808_FAH87162_GA40000_sequencing_run_A_15756_read_2254_ch_253_strand.fast5
2daefd684239f022675c08fe5e272a85 ./13/GXB01136_20180808_FAH87162_GA40000_sequencing_run_A_15756_read_3523_ch_109_strand.fast5
4ed9dddae72e40333d51a0a196c1a05c ./13/GXB01136_20180808_FAH87162_GA40000_sequencing_run_A_15756_read_3515_ch_163_strand.fast5
f3109f57eb28a29e6a736ea3270ade34 ./13/GXB01136_20180808_FAH87162_GA40000_sequencing_run_A_15756_read_3393_ch_502_strand.fast5
431751ffc02c0e0462606ff3fdb1e5ae ./13/GXB01136_20180808_FAH87162_GA40000_sequencing_run_A_15756_read_2191_ch_18_strand.fast5
72e3815d6e493d19e921e536a058af6d ./13/GXB01136_20180808_FAH87162_GA40000_sequencing_run_A_15756_read_3078_ch_506_strand.fast5
baf372d8c39e3cbb79fe8238f0820543 ./13/GXB01136_20180808_FAH87162_GA40000_sequencing_run_A_15756_read_3157_ch_472_strand.fast5
I think I can just use sed to convert it but without an example I can't really try. Any help is greatly appreciated. Thanks for reading.