Closed igwill closed 5 years ago
The -f file should not contain a header, I think if you remove the first line it should work.
jared
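For anyone building the -f file by hand, here is a minimal sketch of writing one, assuming the layout described in this issue (numbered subfolders that each hold a sequencing_summary.txt). The function name `write_summary_fofn` is hypothetical, not part of nanopolish. The output has one path per line and no header line.

```python
from pathlib import Path


def write_summary_fofn(albacore_dir, fofn_path):
    """Write one sequencing_summary.txt path per line, with no header line.

    albacore_dir is assumed to contain subfolders (1, 2, ...) that each hold
    a sequencing_summary.txt, as described in this issue.
    """
    root = Path(albacore_dir)
    # Sort by subfolder name (lexically) so the output order is reproducible.
    summaries = sorted(root.glob("*/sequencing_summary.txt"),
                       key=lambda p: p.parent.name)
    with open(fofn_path, "w") as out:
        for summary in summaries:
            out.write(f"{summary.resolve()}\n")
    return len(summaries)
```

Calling `write_summary_fofn("/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir", "all_summaries.txt")` would then give a file suitable for passing with -f.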
Ah, sorry, I should have been clearer: if I remove "filename", the error reports the first line of the file instead, e.g.: "Could not find filename column in the header of mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/1/sequencing_summary.txt"
What does head mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/1/sequencing_summary.txt look like?
filename read_id run_id channel start_time duration num_events passes_filtering template_start num_events_template template_duration num_called_template sequence_length_template mean_qscore_template strand_score_template calibration_strand_genome_template calibration_strand_identity_template calibration_strand_accuracy_template aligned_speed_bps_template
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_100_ch_1633_strand.fast5 9f61ad33-99ff-4a16-a6d4-ffdac9051ccd c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 1633 277.198 13.36125 10689 False 0.0 10689
13.36125 10689 5335 5.377 -0.0004 filtered_out -1.0 -1.0 0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_100_ch_1920_strand.fast5 da47a154-c429-4c96-b2d8-19bdcd0b3220 c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 1920 224.002 4.0 3200 False 0.0 3200 4.0
3200 1901 4.552 -0.0004 filtered_out -1.0 -1.0 0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_100_ch_1812_strand.fast5 fb097417-af70-4767-afcd-8684d3f9f189 c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 1812 258.6105 2.873 2298 True 0.0 2298
2.873 2298 1263 10.319 -0.0008 filtered_out -1.0 -1.0 0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_101_ch_1812_strand.fast5 05159a17-71b2-4ac0-9b3e-ed6e77991365 c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 1812 261.4835 12.12925 9703 True 0.0
9703 12.12925 9703 5449 10.162 -0.0002 filtered_out -1.0 -1.0 0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_101_ch_849_strand.fast5 2e88b4b3-a7d6-4079-9101-14ecc6fcdbeb c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 849 201.0045 11.01525 8763 False 0.06125
8763 10.954 8763 4155 4.892 -0.0002 filtered_out -1.0 -1.0 0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_100_ch_320_strand.fast5 4ac31aa1-b514-4518-a954-1cc03a21cf8f c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 320 176.57475 11.22775 8982 False 0.0
8982 11.22775 8982 3439 4.826 -0.0002 * -1.0 -1.0 0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_101_ch_1953_strand.fast5 6cf7ec2e-1016-466b-a795-d2eafa1b73d1 c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 1953 258.81025 33.09075 26015 True 0.57125
26015 32.5195 26015 14344 9.642 -0.0002 filtered_out -1.0 -1.0 0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_102_ch_1706_strand.fast5 3d27b70c-b6f7-4697-a7f3-930d6930f47f c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 1706 201.25725 14.317 11286 True 0.20875 11286
14.10825 11286 6211 10.133 -0.0002 filtered_out -1.0 -1.0 0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_102_ch_1507_strand.fast5 4c882967-a824-4a56-bc42-efa0e67c7d4c c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe 1507 269.412 5.02175 3961 True 0.0695 3961 4.95225
3961 2104 9.876 -0.0004 filtered_out -1.0 -1.0 0.0
Hm, that looks fine to me.
If you run
nanopolish index -d /path/fast5 -s mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/1/sequencing_summary.txt reads.porechopped.fastq
do you get the same error?
Yes, looks like it:
Could not find filename column in the header of filename read_id run_id channel start_time duration num_events passes_filtering template_start num_events_template template_duration num_called_template sequence_length_template mean_qscore_template strand_score_template calibration_strand_genome_template calibration_strand_identity_template calibration_strand_accuracy_template aligned_speed_bps_template
Although I'm curious about this issue, there's no stress: I was running the slow version without -s or -f while we worked here, and that has finished. Almost suspiciously quickly, seeing as I have 100+ GB of fast5 files (it took about 2-3 hours; I had read one can expect closer to 8 hours. My computer is fine, but not a beast or anything).
The last lines of the index output to the console were:
[readdb] indexing /mnt/f/Ian/Genome/nanopore_lmu/fast5/20180517_1534_Brachmann_CFL_Pippin/fast5/98
[readdb] indexing /mnt/f/Ian/Genome/nanopore_lmu/fast5/20180517_1534_Brachmann_CFL_Pippin/fast5/99
[readdb] num reads: 862981, num reads with path to fast5: 862870
Do the "missing" reads without a path to fast5 strike you as odd?
Thanks for all the quick responses!
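To put the two [readdb] counts above in perspective, here is a small sketch that pulls them out of the summary line and reports how many reads have no fast5 path. The helper name `missing_reads` is hypothetical. The log line is copied from this comment.

```python
import re

# Matches the final "[readdb] num reads: ..." summary line printed by index.
READDB_SUMMARY = re.compile(
    r"num reads: (\d+), num reads with path to fast5: (\d+)"
)


def missing_reads(log_line):
    """Return (count, fraction) of reads that have no fast5 path."""
    match = READDB_SUMMARY.search(log_line)
    if match is None:
        raise ValueError("not a readdb summary line")
    total, with_path = (int(group) for group in match.groups())
    missing = total - with_path
    return missing, missing / total


line = "[readdb] num reads: 862981, num reads with path to fast5: 862870"
missing, fraction = missing_reads(line)
print(missing)  # 111
```

That is 111 reads out of 862981, roughly 0.013% of the dataset.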
Are you sure you ran with -s instead of -f there? Could you send me that summary file so I can try it locally?
The runtime of index depends a lot on your filesystem. If the number of files it indexed is similar to the number of reads in your dataset, you're probably fine.
Jared
My mistake, I mixed things up while copy-pasting: I had run -f again. Now with -s this seems to be working; it is currently cranking out those [readdb] indexing ... lines.
(Off-topic question, I can open a new issue if you like) When running makerange.py with parallel:
python "/mnt/c/Users/ia643055/nanopolish/scripts/nanopolish_makerange.py" "/mnt/f/igwill/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 8 \
"/mnt/c/Users/ia643055/nanopolish/nanopolish" variants --consensus -o polished.{1}.vcf -w {1} -r "/mnt/f/igwill/Genome/nanopore_lmu/Brachmann_CFL.porechop.fastq" -b bignano.sorted.bam -g "/home/igwill/hybrid/nanopore_lmu/big/big.contigs.fasta" -t 4 --min-candidate-frequency 0.1
I receive an error related to parallel: mkdir nanopolish.results/1/tig00000002:0-50200: Invalid argument at /usr/local/bin/parallel line 9833.
The directory nanopolish.results/1 exists, but is empty.
I have Nanopolish 0.10.2. GNU parallel 20180922 (should be the most recent). Python 3.6.1.
Thank you
Running makerange without parallel seems OK, getting:
tig00000002:0-50200
tig00000002:50000-73160
tig00000003:0-50200
...many tigs in between...
tig00000743:3500000-3550200
tig00000743:3550000-3561158
tig00007633:0-18431
tig00007634:0-12940
tig00007635:0-34668
tig00007636:0-24852
tig00007637:0-4219
tig00007638:0-4231
tig00007639:0-5885
tig00007640:0-5883
tig00007641:0-7940
tig00007642:0-7940
tig00007643:0-13698
tig00007644:0-13654
Ok, it isn't a problem with the sequencing summary itself then. It must not like the .fofn file. Index will be slow because you only gave one index file in that test.
I'm not sure why parallel is giving that error. Try this test:
python "/mnt/c/Users/ia643055/nanopolish/scripts/nanopolish_makerange.py" "/mnt/f/igwill/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 8 echo "test" {1}
Hi,
running python "/mnt/c/Users/ia643055/nanopolish/scripts/nanopolish_makerange.py" "/mnt/f/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 8 echo "test" {1}
gives the same error
mkdir nanopolish.results/1/tig00000002:0-50200: Invalid argument at /usr/local/bin/parallel line 9833.
I also tried submitting to our computing cluster, which is running GNU parallel 20141022, and got a slightly different error: mkdir nanopolish.results: File exists at /usr/bin/parallel line 4990.
Hm, in that case it appears to be a problem with parallel, not nanopolish. It is not able to make the directory for the log files it generates. Maybe there is a permissions error somewhere?
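One way to check whether the failing mkdir comes from the filesystem rather than from parallel is to try creating the same directory name directly. This is only a diagnostic sketch. The helper name `try_mkdir` is hypothetical, and the idea that the window name itself is being rejected is an untested assumption, not something confirmed in this thread.

```python
import os


def try_mkdir(parent, name):
    """Try to create and then remove parent/name.

    Returns (True, None) on success, or (False, message) if the
    operating system refuses to create the directory.
    """
    target = os.path.join(parent, name)
    try:
        os.mkdir(target)
    except OSError as exc:
        return False, "%s (errno %s)" % (exc.strerror, exc.errno)
    os.rmdir(target)
    return True, None


# Example, run from the directory where parallel was started:
#   try_mkdir(os.getcwd(), "plain_name")
#   try_mkdir(os.getcwd(), "tig00000002:0-50200")
```

If the plain name succeeds and the window name fails with the same "Invalid argument" message, the directory name is the problem on that filesystem and parallel is only reporting it.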
Since it might have been a permission thing, I slapped a sudo in front of the test command you gave me and at least the error has changed.
Now I get:
File "/mnt/c/Users/ia643055/nanopolish/scripts/nanopolish_makerange.py", line 5, in <module>
from Bio.SeqIO.FastaIO import SimpleFastaParser
ImportError: No module named Bio.SeqIO.FastaIO
However, if I try:
python
import Bio.SeqIO.FastaIO
as a way to test if I have that module, I don't get any error. So I would think the module is there. Does it look like a problem with my biopython setup to you?
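A note on the traceback above: "ImportError: No module named Bio.SeqIO.FastaIO" is worded the way Python 2 reports a missing module, whereas Python 3.6 would raise ModuleNotFoundError with the name in quotes. So it is possible, though not confirmed here, that python under sudo resolves to a different interpreter than the one that has Biopython. Below is a small sketch for comparing the two. The function name `interpreter_report` is hypothetical, and the code is deliberately written to run on both Python 2 and Python 3.

```python
import sys


def interpreter_report(module_name="Bio"):
    """Describe the running interpreter and whether module_name imports."""
    try:
        __import__(module_name)
        found = True
    except ImportError:
        found = False
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "module_found": found,
    }


report = interpreter_report()
for key in sorted(report):
    print("%s: %s" % (key, report[key]))
```

Saving this as a script and running it once with python and once with sudo python would show whether the two commands use the same executable and version.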
nanopolish_makerange.py was working for you before (https://github.com/jts/nanopolish/issues/490#issuecomment-431680781) - did something about your environment change?
Nothing changed, it's all a little bit baffling. To double check I just ran:
- The ...echo "test" {1} command you gave, and got the same error. I do see that this command successfully made new directories, /nanopolish.results/1/, which is empty.
- The sudo ... version of that command, where I still get that Bio.SeqIO.Fasta error I just reported, and no new directory is made.
- nanopolish_makerange.py only, which seems to work fine, printing many lines with tig*.
- sudo nanopolish_makerange.py only, which gets me the Bio.SeqIO.Fasta error.
If you think this still boils down to a permissions problem, I will contact our department IT folks, since this is a work computer. Thanks for sticking with this.
Although it may be slow, is there a way to run this without parallel? Even if it presumably takes 8 times longer, at least I'll get my results.
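On running without parallel: a minimal sketch of a sequential loop over the windows printed by nanopolish_makerange.py, using only the nanopolish variants options that appear in the command earlier in this thread. The function names `build_variants_cmd` and `run_windows` are hypothetical, and the file names in the example comment are placeholders.

```python
import subprocess


def build_variants_cmd(window, reads, bam, genome,
                       nanopolish="nanopolish", threads=4):
    """Build the argument list for one window (same options as above)."""
    return [
        nanopolish, "variants", "--consensus",
        "-o", "polished.%s.vcf" % window,
        "-w", window,
        "-r", reads,
        "-b", bam,
        "-g", genome,
        "-t", str(threads),
        "--min-candidate-frequency", "0.1",
    ]


def run_windows(lines, reads, bam, genome, **kwargs):
    """Run nanopolish variants once per non-empty window line, in order."""
    for line in lines:
        window = line.strip()
        if not window:
            continue
        cmd = build_variants_cmd(window, reads, bam, genome, **kwargs)
        # check=True stops the loop at the first window that fails.
        subprocess.run(cmd, check=True)


# Example: feed the makerange output on stdin, then call
#   import sys
#   run_windows(sys.stdin, "reads.fastq", "reads.sorted.bam", "draft.fasta")
```

Because each command is passed as an argument list rather than a shell string, no extra quoting is needed. The cost is that windows run one at a time, so only the -t threads are used.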
Quick update, I seem to have it running on the cluster now. The error I got before (mkdir nanopolish.results: File exists at /usr/bin/parallel line 4990) has been cleared up. I was searching for a directory already named "nanopolish.results", but somehow I just had a file named that with all the tigs in it, which I only now caught. I moved out of that directory and things seem to be working.
Ah, glad to hear it!
Well, a nanpolished_genome.fa was produced and is full of bases. However, the head of my slurm.out seems to be reporting an error and is oddly big for an *.out (630MB):
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 140684329903872:
#000: H5F.c line 604 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#001: H5Fint.c line 990 in H5F_open(): unable to open file: time = Mon Oct 22 17:50:30 2018
, name = '/mnt/f/Ian/Genome/nanopore_lmu/fast5/20180517_1534_Brachmann_CFL_Pippin/fast5/170/PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_11408_ch_1818_strand.fast5', tent_flags = 0
major: File accessibilty
minor: Unable to open file
#002: H5FD.c line 992 in H5FD_open(): open failed
major: Virtual File Layer
I neglected to consider that my 1TB of fast5s are stored on my own machine and were indexed locally, which I think is what this error is telling me. I can't upload them to my little cluster niche, which leaves me still sorting out how to get this running on my computer.
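A quick way to confirm this kind of problem before launching a long run is to check whether the fast5 paths recorded by nanopolish index are reachable from the current machine. The sketch below assumes the .index.readdb file written next to the reads is tab-separated, with the read name in the first column and the fast5 path in the second; that layout is an assumption here, so check it against your own file. The function name `count_unreachable` is hypothetical.

```python
import os


def count_unreachable(readdb_path):
    """Return (reads_with_path, reads_whose_path_is_missing_here)."""
    total = 0
    missing = 0
    with open(readdb_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip reads that were indexed without any fast5 path.
            if len(fields) < 2 or not fields[1]:
                continue
            total += 1
            if not os.path.exists(fields[1]):
                missing += 1
    return total, missing
```

If nearly every path is reported missing on the cluster, the index is pointing at files that only exist on the machine where it was built.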
I tried uninstalling & installing the exact version of parallel that the cluster had that worked, 20141022, but got fundamentally the same error. Only the referenced line has changed. mkdir nanopolish.results/1/tig00000002:0-50200: Invalid argument at /usr/local/bin/parallel line 4990. A fresh install of everything on a different machine, using a slightly older apt-get install version of parallel leads to the same thing (but on line number 3757 or so).
Any thoughts? There must be something off about the environments of the desktops I have access to?
Is there a nanopolish.results file or directory? If so, try deleting it before running the command.
That was the holdup on the cluster, but I have deleted any nanopolish.results files and dirs when trying locally. I even just tried moving to a new directory to run the command from, so there should be no chance of lingering nanopolish.results files. No dice, same error along with the creation of dir /nanopolish.results/1.
Is it /nanopolish.results/1 or nanopolish.results/1?
nanopolish.results/1, made in whichever directory I run the command from.
I tried the makerange.py with echo "test" {1} and had no problems on my personal laptop. I think you must be right that some weird permission issue on the work machines prevents the making of the tig/ directories, but not the initial nanopolish.results/1 creation.
example result: test tig00000743:2850000-2900200
and I get nanopolish.results/1/various tig dirs* created.
I will try the full analysis once I can access my data files again. Will let you know how that goes, but am hopeful, so no need to waste your brain power on this issue for now.
It appears to be running on my laptop now, painfully slow, but chugging along. I'll need to get on a better rig to do this in a reasonable time. But the issue with using my lab computers probably is something that needs hashed out with our IT department, I guess.
I am getting some warnings with the run however:
igwill@LAPTOP-D6T3ATUK:/mnt/f/Ian/Genome/nanopore_lmu/nanopolished/laptoprun$ python "/mnt/f/Ian/Genome/nanopore_lmu/nanopolished/nanopolish_makerange.py" "/mnt/f/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 2 "/mnt/c/Users/igw87/nanopolish/nanopolish" variants --consensus -o polished.{1}.vcf -w {1} -r "/mnt/f/Ian/Genome/nanopore_lmu/Brachmann_CFL.porechop.fastq" -b "/mnt/f/Ian/Genome/nanopore_lmu/nanopolished/bignano.sorted.bam" -g "/mnt/f/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" -t 4 --min-candidate-frequency 0.1
Warning: The index file is older than the data file: /mnt/f/Ian/Genome/nanopore_lmu/nanopolished/bignano.sorted.bam.bai
Number of variants in span (14) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (17) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (14) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (15) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (10) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Warning: 72 variants in span, region not called [68606 68823]
Number of variants in span (14) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (34) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (27) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (19) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (13) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (18) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Warning: 110 variants in span, region not called [69250 69472]
Number of variants in span (18) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (21) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (32) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (39) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (13) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (36) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (11) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (15) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (60) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (73) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (32) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (21) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (24) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Warning: 118 variants in span, region not called [70240 70454]
Number of variants in span (20) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (15) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (12) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
[post-run summary] total reads: 30, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 1, bad fast5: 0
(I have parallel set with -P 2 rather than 8, since my little 2-core computer is already maxed out here, having produced 3 tig directories and .vcfs after a couple hours)
The first warning is about the bam index, not the nanopolish index. The .bai file is older than the .bam file, which probably means you need to re-run samtools index.
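The condition behind that warning can be checked up front by comparing modification times. This is a minimal sketch, assuming the index sits next to the bam with a .bai suffix appended, as in the warning above. The function name `index_is_stale` is hypothetical.

```python
import os


def index_is_stale(bam_path, index_path=None):
    """Return True if the index file is older than the bam file."""
    if index_path is None:
        # Assumed naming: bignano.sorted.bam -> bignano.sorted.bam.bai
        index_path = bam_path + ".bai"
    return os.path.getmtime(index_path) < os.path.getmtime(bam_path)
```

A True result for bignano.sorted.bam would match the warning and suggest re-running samtools index on that file.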
Those nanopolish warnings are probably because you have very few reads (only 30 per segment?). Nanopolish needs rather high coverage (>50x) to make sense of the genome.
OK, rerunning samtools is easy enough.
About coverage, Canu tells me I had an estimated 180x of raw data, and a final unitig coverage of 41X. The next tigs were processed by nanopolish and gave many Number of variants in span... notices, but Warning: ... region not called problems are fairly rare (e.g. [post-run summary] total reads: 101, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 8, bad fast5: 0, or [post-run summary] total reads: 808, unparseable: 0, qc fail: 1, could not calibrate: 0, no alignment: 23, bad fast5: 0).
Possibly that first one with the warnings was just a bum low-coverage region?
Also, trying everything on a new machine (a Mac), got me an error that variants had too many arguments. Seemed to repeat for every tig:
variants: too many arguments
Usage: nanopolish variants [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa
Find SNPs using a signal-level HMM
-v, --verbose display verbose output
--version display version
--help display this help and exit
--snps only call SNPs
--consensus run in consensus calling mode
--fix-homopolymers run the experimental homopolymer caller
--faster minimize compute time while slightly reducing consensus accuracy
-w, --window=STR find variants in window STR (format: <chromsome_name>:<start>-<end>)
-r, --reads=FILE the ONT reads are in fasta FILE
-b, --bam=FILE the reads aligned to the reference genome are in bam FILE
-e, --event-bam=FILE the events aligned to the reference genome are in bam FILE
-g, --genome=FILE the reference genome is in FILE
-p, --ploidy=NUM the ploidy level of the sequenced genome
-q --methylation-aware=STR turn on methylation aware polishing and test motifs given in STR (example: -q dcm,dam)
--genotype=FILE call genotypes for the variants in the vcf FILE
-o, --outfile=FILE write result to FILE [default: stdout]
-t, --threads=NUM use NUM threads (default: 1)
-m, --min-candidate-frequency=F extract candidate variants from the aligned reads when the variant frequency is at least F (default 0.2)
-d, --min-candidate-depth=D extract candidate variants from the aligned reads when the depth is at least D (default: 20)
-x, --max-haplotypes=N consider at most N haplotype combinations (default: 1000)
--min-flanking-sequence=N distance from alignment end to calculate variants (default: 30)
--max-rounds=N perform N rounds of consensus sequence improvement (default: 50)
-c, --candidates=VCF read variant candidates from VCF, rather than discovering them from aligned reads
-a, --alternative-basecalls-bam=FILE if an alternative basecaller was used that does not output event annotations
then use basecalled sequences from FILE. The signal-level events will still be taken from the -b bam.
--calculate-all-support when making a call, also calculate the support of the 3 other possible bases
--models-fofn=FILE read alternative k-mer models from FILE
Report bugs to https://github.com/jts/nanopolish/issues
In response to this command:
python "/Users/ad110232/ian/nanopolish/scripts/nanopolish_makerange.py" "/Volumes/Seagate Expansion Drive/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 4 \
"/Users/ad110232/ian/nanopolish/nanopolish" variants --consensus -o polished.{1}.vcf -w {1} -r "/Volumes/Seagate Expansion Drive/Ian/Genome/nanopore_lmu/Brachmann_CFL.porechop.fastq" -b "/Volumes/Seagate Expansion Drive/Ian/Genome/nanopore_lmu/nanopolished/bignano.sorted.bam" -g "/Volumes/Seagate Expansion Drive/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" -t 4 --min-candidate-frequency 0.1
Sounded like a typo error, but I can't find the issue. I copy-pasted what was working before and only changed the paths.
It might not like the spaces in the paths
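To illustrate why spaces could produce "too many arguments": when a command line is split the way a shell splits it, an unquoted path containing spaces becomes several separate arguments. The sketch below uses Python's shlex module to mimic that splitting. It does not reproduce exactly how parallel handles quoting, which is an assumption to verify against the parallel documentation. The helper name `count_args` is hypothetical, and the path is the one from the command above.

```python
import shlex


def count_args(command_line):
    """Count the arguments a POSIX shell would see in command_line."""
    return len(shlex.split(command_line))


path = ("/Volumes/Seagate Expansion Drive/Ian/Genome/"
        "nanopore_lmu/big/big.contigs.fasta")

unquoted = "-g " + path
quoted = "-g " + shlex.quote(path)

print(count_args(unquoted))  # 4: the path was split at each space
print(count_args(quoted))    # 2: the flag plus one intact path
```

If the quotes around the paths are consumed before the command string reaches the shell that finally runs nanopolish, the unquoted case is what the program would receive.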
Ah could be. I am just biting the bullet and waiting while transferring all of my data to a computer that I confirmed the commands work on. Hopefully that'll be the end of it. Thanks!
@igwill Just wondering if you resolved the issue of "variants: too many arguments" I am getting the same error when running nanopolish_makerange.py in parallel...
python "usr/bin/nanopolish/scripts/nanopolish_makerange.py" My.contigs.fasta | parallel --results nanopolish.results -P 8 nanopolish variants --consensus -o polished.{1}.vcf -w {1} -r ../all_run2PCNoMid.fa -b all_run2PCNoMid.sorted.bam -g My.contigs.fasta -t 4 --min-candidate-frequency 0.1
@jts Running the makerange.py with echo "test" {1}
outputs:
test tig00000004:0-50200
test tig00000004:50000-100200
test tig00000004:100000-150200
test tig00000004:150000-200200
test tig00000004:200000-250200
test tig00000004:250000-300200
test tig00000004:300000-350200
test tig00000004:350000-373468
I tried running it without the -o argument and it returns empty .vcf files...
would appreciate any advice.... thanks in advance! Kathy
Hi Kathy,
What nanopolish version are you using? Are you able to successfully complete the tutorial? https://nanopolish.readthedocs.io/en/latest/quickstart_consensus.html
Jared
Hi Jared, Thanks for rapid reply. Trying both now. Will update you on progress/results! Thanks again Kathy
Hi Kathy, I did not resolve that issue, I simply ran away from my problems and am running on a different machine now.
If -f still doesn't work, why did you close this? I can't use the software because I have 2 sets of data from a run that got interrupted midway.
I closed this because the comments had diverged from the original issue. If you have a problem please open a new issue with details of what happened and data I can use to reproduce it.
On Jan 9, 2019, at 9:54 PM, chilltrout notifications@github.com wrote:
If -f still doesn't work, why did you close this? I can't use the software because I have 2 sets of data from a run that got interrupted midway.
Hi,
I'm trying to use -f to pass in a .txt file containing the path for each of many sequencing_summary.txt files during nanopolish index. I keep getting the error "Could not find filename column in the header of filename" (or whatever the first line of my file is). I can get other programs (e.g. R) to understand my first line as a header, but not here. Should I be carefully formatting this file in some way? First few lines:
Additionally, I want to make sure I am indexing things correctly. I have:
/path/fast5/ containing 216 folders (numbered 0 : 215), each with many .fast5
/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/ containing 215 folders (numbered 1 : 215), each containing a sequencing_summary.txt and /workspace/0 with .fast5s
I am using something structured like:
nanopolish index -d /path/fast5 -f all_summaries.txt (pointed at those .txts in /path/albacore...) reads.porechopped.fastq
Thank you!