jts / nanopolish

Signal-level algorithms for MinION data
MIT License

index with -f for multiple summary files #490

Closed (igwill closed this issue 5 years ago)

igwill commented 5 years ago

Hi,

I'm trying to use -f to pass in a .txt file containing the path to each of many sequencing_summary.txt files during nanopolish index. I keep getting the error "Could not find filename column in the header of filename" (or whatever the first line of my file is). Other programs (e.g. R) understand my first line as a header, but not here. Should I be formatting this file in some particular way? First few lines:

filename
mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/1/sequencing_summary.txt
mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/2/sequencing_summary.txt
mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/3/sequencing_summary.txt

Additionally, I want to make sure I am indexing things correctly. I have: /path/fast5/ containing 216 folders (numbered 0 to 215), each with many .fast5 files; /path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/ containing 215 folders (numbered 1 to 215), each containing a sequencing_summary.txt; and /workspace/0 with .fast5 files.

I am using something structured like: nanopolish index -d /path/fast5 -f all_summaries.txt (pointed at those .txt files in /path/albacore...) reads.porechopped.fastq

Thank you!

jts commented 5 years ago

The -f file should not contain a header, I think if you remove the first line it should work.
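A minimal way to build such a header-less list is sketched below, against a throwaway directory tree mimicking the layout in this thread (the demo-by_dir name and files are stand-ins, not the real paths):

```shell
# Build a header-less file-of-filenames for nanopolish index -f.
# demo-by_dir/ stands in for the real albacore-...-by_dir/ tree.
mkdir -p demo-by_dir/1 demo-by_dir/2
touch demo-by_dir/1/sequencing_summary.txt demo-by_dir/2/sequencing_summary.txt
find demo-by_dir -name sequencing_summary.txt | sort > all_summaries.txt
cat all_summaries.txt   # one path per line, no "filename" header line
```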

jared

igwill commented 5 years ago

Ah, sorry, I should have been clearer: if I remove "filename", the error reports the first line of the file, e.g.: "Could not find filename column in the header of mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/1/sequencing_summary.txt"

jts commented 5 years ago

What does head mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/1/sequencing_summary.txt look like?

igwill commented 5 years ago

filename        read_id run_id  channel start_time      duration        num_events      passes_filtering        template_start  num_events_template     template_duration       num_called_template     sequence_length_template        mean_qscore_template    strand_score_template  calibration_strand_genome_template      calibration_strand_identity_template    calibration_strand_accuracy_template    aligned_speed_bps_template
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_100_ch_1633_strand.fast5   9f61ad33-99ff-4a16-a6d4-ffdac9051ccd    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        1633    277.198 13.36125        10689   False   0.0     10689   13.36125        10689   5335    5.377   -0.0004 filtered_out    -1.0    -1.0    0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_100_ch_1920_strand.fast5   da47a154-c429-4c96-b2d8-19bdcd0b3220    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        1920    224.002 4.0     3200    False   0.0     3200    4.0     3200    1901    4.552   -0.0004 filtered_out    -1.0    -1.0    0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_100_ch_1812_strand.fast5   fb097417-af70-4767-afcd-8684d3f9f189    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        1812    258.6105        2.873   2298    True    0.0     2298    2.873   2298    1263    10.319  -0.0008 filtered_out    -1.0    -1.0    0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_101_ch_1812_strand.fast5   05159a17-71b2-4ac0-9b3e-ed6e77991365    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        1812    261.4835        12.12925        9703    True    0.0     9703    12.12925        9703    5449    10.162  -0.0002 filtered_out    -1.0    -1.0    0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_101_ch_849_strand.fast5    2e88b4b3-a7d6-4079-9101-14ecc6fcdbeb    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        849     201.0045        11.01525        8763    False   0.06125 8763    10.954  8763    4155    4.892   -0.0002 filtered_out    -1.0    -1.0    0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_100_ch_320_strand.fast5    4ac31aa1-b514-4518-a954-1cc03a21cf8f    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        320     176.57475       11.22775        8982    False   0.0     8982    11.22775        8982    3439    4.826   -0.0002 *       -1.0    -1.0    0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_101_ch_1953_strand.fast5   6cf7ec2e-1016-466b-a795-d2eafa1b73d1    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        1953    258.81025       33.09075        26015   True    0.57125 26015   32.5195 26015   14344   9.642   -0.0002 filtered_out    -1.0    -1.0    0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_102_ch_1706_strand.fast5   3d27b70c-b6f7-4697-a7f3-930d6930f47f    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        1706    201.25725       14.317  11286   True    0.20875 11286   14.10825        11286   6211    10.133  -0.0002 filtered_out    -1.0    -1.0    0.0
PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_102_ch_1507_strand.fast5   4c882967-a824-4a56-bc42-efa0e67c7d4c    c0f03a85ac9a5b4dfbdfa46510ea47d1d67baabe        1507    269.412 5.02175 3961    True    0.0695  3961    4.95225 3961    2104    9.876   -0.0004 filtered_out    -1.0    -1.0    0.0

jts commented 5 years ago

Hm, that looks fine to me.

If you run

nanopolish index -d /path/fast5 -s mnt/f/path/albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir/1/sequencing_summary.txt reads.porechopped.fastq

do you get the same error?

igwill commented 5 years ago

Yes, looks like it:

Could not find filename column in the header of filename        read_id run_id  channel start_time      duration        num_events      passes_filtering        template_start  num_events_template     template_duration       num_called_template     sequence_length_template        mean_qscore_template    strand_score_template   calibration_strand_genome_template      calibration_strand_identity_template    calibration_strand_accuracy_template    aligned_speed_bps_template

Although I'm curious about this issue, no stress: while we worked on this I was running the slow version without -s or -f, and that has finished. Almost suspiciously quickly, given that I have 100+ GB of fast5 files (about 2-3 hours; I had read to expect closer to 8 hours, and my computer is fine but not a beast or anything). The last lines of the index output to the console were:

[readdb] indexing /mnt/f/Ian/Genome/nanopore_lmu/fast5/20180517_1534_Brachmann_CFL_Pippin/fast5/98
[readdb] indexing /mnt/f/Ian/Genome/nanopore_lmu/fast5/20180517_1534_Brachmann_CFL_Pippin/fast5/99
[readdb] num reads: 862981, num reads with path to fast5: 862870

Do the "missing" reads without a path to fast5 strike you as odd?

Thanks for all the quick responses!

jts commented 5 years ago

Are you sure you ran with -s instead of -f there? Could you send me that summary file so I can try it locally?

The runtime of index depends a lot on your filesystem, if the number of files it indexed is similar to the number of reads in your dataset, you're probably fine.
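That comparison can be sketched as follows, assuming the <fastq>.index.readdb file that nanopolish index writes holds one read_id<TAB>fast5_path line per read, with an empty path when no fast5 was found (the demo.readdb data here is fabricated for illustration):

```shell
# Count total reads vs. reads that resolved to a fast5 path in a readdb file.
printf 'r1\t/fast5/a.fast5\nr2\t\nr3\t/fast5/b.fast5\n' > demo.readdb
awk -F'\t' 'END { print "num reads:", NR }' demo.readdb
awk -F'\t' '$2 != "" { n++ } END { print "num reads with path to fast5:", n+0 }' demo.readdb
```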

Jared

igwill commented 5 years ago

My mistake, I mixed things up while copy-pasting: I did run -f again. Now with -s this seems to be working; it's currently cranking out those [readdb] indexing ... lines.

(Off-topic question, I can open a new issue if you like) When running makerange.py with parallel:

python "/mnt/c/Users/ia643055/nanopolish/scripts/nanopolish_makerange.py" "/mnt/f/igwill/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 8 \
    "/mnt/c/Users/ia643055/nanopolish/nanopolish" variants --consensus -o polished.{1}.vcf -w {1} -r "/mnt/f/igwill/Genome/nanopore_lmu/Brachmann_CFL.porechop.fastq" -b bignano.sorted.bam -g "/home/igwill/hybrid/nanopore_lmu/big/big.contigs.fasta" -t 4 --min-candidate-frequency 0.1

I receive an error related to parallel: mkdir nanopolish.results/1/tig00000002:0-50200: Invalid argument at /usr/local/bin/parallel line 9833. The directory nanopolish.results/1 exists, but is empty.

I have Nanopolish 0.10.2. GNU parallel 20180922 (should be the most recent). Python 3.6.1.

Thank you

igwill commented 5 years ago

Running makerange without parallel seems OK, getting:

tig00000002:0-50200
tig00000002:50000-73160
tig00000003:0-50200

...many tigs in between...

tig00000743:3500000-3550200
tig00000743:3550000-3561158
tig00007633:0-18431
tig00007634:0-12940
tig00007635:0-34668
tig00007636:0-24852
tig00007637:0-4219
tig00007638:0-4231
tig00007639:0-5885
tig00007640:0-5883
tig00007641:0-7940
tig00007642:0-7940
tig00007643:0-13698
tig00007644:0-13654
jts commented 5 years ago

Ok, it isn't a problem with the sequencing summary itself then. It must not like the .fofn file. Indexing will be slow in that test because you only gave it one summary file.

I'm not sure why parallel is giving that error. Try this test:

python "/mnt/c/Users/ia643055/nanopolish/scripts/nanopolish_makerange.py" "/mnt/f/igwill/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 8 echo "test" {1}
igwill commented 5 years ago

Hi, running python "/mnt/c/Users/ia643055/nanopolish/scripts/nanopolish_makerange.py" "/mnt/f/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 8 echo "test" {1} gives the same error mkdir nanopolish.results/1/tig00000002:0-50200: Invalid argument at /usr/local/bin/parallel line 9833.

I also tried submitting to our computing cluster, which is running GNU parallel 20141022, and got a slightly different error: mkdir nanopolish.results: File exists at /usr/bin/parallel line 4990.

jts commented 5 years ago

Hm, in that case it appears to be a problem with parallel, not nanopolish. It is not able to make the directory for the log files it generates. Maybe there is a permissions error somewhere?
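One guess worth ruling out: parallel --results creates a directory per job named after the input value, and those names contain a colon (e.g. tig00000002:0-50200), which Windows-backed mounts such as /mnt/f under WSL refuse as a filename character. A quick probe of the working directory's filesystem (the probe/ name is arbitrary):

```shell
# Try creating a directory named the way parallel --results would name it.
# "Invalid argument" from mkdir here would point at the filesystem, not parallel.
if mkdir -p "probe/tig00000002:0-50200"; then
    echo "colon in names: accepted"
else
    echo "colon in names: rejected"
fi
```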

igwill commented 5 years ago

Since it might have been a permission thing, I slapped a sudo in front of the test command you gave me, and at least the error has changed. Now I get:

File "/mnt/c/Users/ia643055/nanopolish/scripts/nanopolish_makerange.py", line 5, in <module>
    from Bio.SeqIO.FastaIO import SimpleFastaParser
ImportError: No module named Bio.SeqIO.FastaIO

However, if I try:

python
import Bio.SeqIO.FastaIO

as a way to test if I have that module, I don't get any error. So I would think the module is there. Does it look like a problem with my biopython setup to you?
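A possible explanation, offered as a guess: sudo typically resets PATH, so "sudo python" can resolve to a different interpreter whose site-packages lack Biopython, even though plain "python" imports it fine. Comparing the two interpreters would confirm it:

```shell
# Print the interpreter that plain "python3" resolves to; run the commented
# line under sudo interactively and compare the two paths.
python3 -c 'import sys; print(sys.executable)'
# sudo python3 -c 'import sys; print(sys.executable)'
```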

jts commented 5 years ago

nanopolish_makerange.py was working for you before (https://github.com/jts/nanopolish/issues/490#issuecomment-431680781) - did something about your environment change?

igwill commented 5 years ago

Nothing changed; it's all a little bit baffling. To double-check, I just ran:

If you think this still boils down to a permissions problem, I will contact our department IT folks, since this is a work computer. Thanks for sticking with this.

igwill commented 5 years ago

Although maybe slow, is there a way to run this without parallel? Even if it takes me 8x longer, at least I'll get my results.

igwill commented 5 years ago

Quick update: I seem to have it running on the cluster now. The error I got before (mkdir nanopolish.results: File exists at /usr/bin/parallel line 4990) has cleared up. I was looking for a directory named "nanopolish.results", but somehow I just had a file by that name containing all the tigs, which I only now caught. I moved out of that directory and things seem to be working.

jts commented 5 years ago

Ah, glad to hear it!

igwill commented 5 years ago

Well, a nanpolished_genome.fa was produced and is full of bases. However, the head of my slurm.out seems to be reporting an error, and the file is oddly big for an *.out (630 MB):

HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 140684329903872:
  #000: H5F.c line 604 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 990 in H5F_open(): unable to open file: time = Mon Oct 22 17:50:30 2018
, name = '/mnt/f/Ian/Genome/nanopore_lmu/fast5/20180517_1534_Brachmann_CFL_Pippin/fast5/170/PCA0045_20180517_0004A30B0020430F_PH_p_103_00_sequencing_run_Brachmann_CFL_Pippin_21764_read_11408_ch_1818_strand.fast5', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 992 in H5FD_open(): open failed
    major: Virtual File Layer

I neglected to consider that my 1 TB of fast5s are stored on my machine and were indexed locally, which I think is what this error is telling me. I can't upload them to my little cluster niche, which leaves me still sorting out how to get this running on my computer.

I tried uninstalling and installing the exact version of parallel that worked on the cluster, 20141022, but got fundamentally the same error; only the referenced line number changed: mkdir nanopolish.results/1/tig00000002:0-50200: Invalid argument at /usr/local/bin/parallel line 4990. A fresh install of everything on a different machine, using a slightly older apt-get version of parallel, leads to the same thing (but at line 3757 or so).

Any thoughts? There must be something off about the environments of the desktops I have access to?

jts commented 5 years ago

Is there a nanopolish.results file or directory? If so try deleting it before running the command.

igwill commented 5 years ago

That was the holdup on the cluster, but I have deleted any nanopolish.results files and dirs when trying locally. I even just tried moving to a new directory to run the command from, so there should be no chance of lingering nanopolish.results files. No dice, same error along with the creation of dir /nanopolish.results/1.

jts commented 5 years ago

Is it /nanopolish.results/1 or nanopolish.results/1?

igwill commented 5 years ago

nanopolish.results/1 made in the whichever directory I run the command from.

igwill commented 5 years ago

I tried makerange.py with echo "test" {1} and had no problems on my personal laptop. I think you must be right that some weird permission issue on the work machines prevents making the tig directories, but not the initial nanopolish.results/1 creation. Example result: test tig00000743:2850000-2900200, and I get nanopolish.results/1/<various tig dirs> created.

I will try the full analysis once I can access my data files again. Will let you know how that goes, but am hopeful, so no need to waste your brain power on this issue for now.

igwill commented 5 years ago

It appears to be running on my laptop now: painfully slow, but chugging along. I'll need to get on a better rig to do this in a reasonable time. But the issue with my lab computers probably needs to be hashed out with our IT department, I guess.

I am getting some warnings with the run however:

igwill@LAPTOP-D6T3ATUK:/mnt/f/Ian/Genome/nanopore_lmu/nanopolished/laptoprun$ python "/mnt/f/Ian/Genome/nanopore_lmu/nanopolished/nanopolish_makerange.py" "/mnt/f/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 2 "/mnt/c/Users/igw87/nanopolish/nanopolish" variants --consensus -o polished.{1}.vcf -w {1} -r "/mnt/f/Ian/Genome/nanopore_lmu/Brachmann_CFL.porechop.fastq" -b "/mnt/f/Ian/Genome/nanopore_lmu/nanopolished/bignano.sorted.bam" -g "/mnt/f/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" -t 4 --min-candidate-frequency 0.1
Warning: The index file is older than the data file: /mnt/f/Ian/Genome/nanopore_lmu/nanopolished/bignano.sorted.bam.bai
Number of variants in span (14) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (17) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (14) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (15) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (10) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Warning: 72 variants in span, region not called [68606 68823]
Number of variants in span (14) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (34) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (27) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (19) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (13) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (18) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Warning: 110 variants in span, region not called [69250 69472]
Number of variants in span (18) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (21) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (32) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (39) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (13) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (36) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (11) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (15) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (60) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (73) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (32) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (21) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (24) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Warning: 118 variants in span, region not called [70240 70454]
Number of variants in span (20) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (15) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
Number of variants in span (12) would exceed max-haplotypes. Variants may be missed. Consider running with a higher value of max-haplotypes!
[post-run summary] total reads: 30, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 1, bad fast5: 0

(I have parallel set to -P 2 rather than 8, since my little 2-core computer is already maxed out here; it has produced 3 tig directories and .vcfs after a couple of hours.)

jts commented 5 years ago

The first warning is about the bam index, not the nanopolish index. The .bai file is older than the .bam file, which probably means you need to re-run samtools index.

Those nanopolish warnings are probably because you have very few reads (only 30 per segment?). Nanopolish needs rather high coverage (>50x) to make sense of the genome.
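The .bai staleness jared describes can be checked with a shell -nt comparison before re-running samtools index; this sketch simulates the situation with empty stand-in files using the filename from this thread:

```shell
# Simulate a BAM that is newer than its index, then detect the stale .bai.
bam=bignano.sorted.bam
touch "$bam.bai"
sleep 1                       # ensure distinct mtimes on coarse filesystems
touch "$bam"
if [ "$bam" -nt "$bam.bai" ]; then
    echo "stale index: re-run 'samtools index $bam'"
fi
```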

igwill commented 5 years ago

OK, rerunning samtools is easy enough.

About coverage: Canu tells me I had an estimated 180x of raw data and a final unitig coverage of 41x. The next tigs processed by nanopolish gave many Number of variants in span... notices, but the Warning: ... region not called problems are fairly rare (e.g. [post-run summary] total reads: 101, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 8, bad fast5: 0, or [post-run summary] total reads: 808, unparseable: 0, qc fail: 1, could not calibrate: 0, no alignment: 23, bad fast5: 0). Possibly that first one with the warnings was just a bum low-coverage region?

igwill commented 5 years ago

Also, trying everything on a new machine (a Mac) got me an error that variants had too many arguments. It seemed to repeat for every tig:

variants: too many arguments

Usage: nanopolish variants [OPTIONS] --reads reads.fa --bam alignments.bam --genome genome.fa
Find SNPs using a signal-level HMM

  -v, --verbose                        display verbose output
      --version                        display version
      --help                           display this help and exit
      --snps                           only call SNPs
      --consensus                      run in consensus calling mode
      --fix-homopolymers               run the experimental homopolymer caller
      --faster                         minimize compute time while slightly reducing consensus accuracy
  -w, --window=STR                     find variants in window STR (format: <chromsome_name>:<start>-<end>)
  -r, --reads=FILE                     the ONT reads are in fasta FILE
  -b, --bam=FILE                       the reads aligned to the reference genome are in bam FILE
  -e, --event-bam=FILE                 the events aligned to the reference genome are in bam FILE
  -g, --genome=FILE                    the reference genome is in FILE
  -p, --ploidy=NUM                     the ploidy level of the sequenced genome
  -q  --methylation-aware=STR          turn on methylation aware polishing and test motifs given in STR (example: -q dcm,dam)
      --genotype=FILE                  call genotypes for the variants in the vcf FILE
  -o, --outfile=FILE                   write result to FILE [default: stdout]
  -t, --threads=NUM                    use NUM threads (default: 1)
  -m, --min-candidate-frequency=F      extract candidate variants from the aligned reads when the variant frequency is at least F (default 0.2)
  -d, --min-candidate-depth=D          extract candidate variants from the aligned reads when the depth is at least D (default: 20)
  -x, --max-haplotypes=N               consider at most N haplotype combinations (default: 1000)
      --min-flanking-sequence=N        distance from alignment end to calculate variants (default: 30)
      --max-rounds=N                   perform N rounds of consensus sequence improvement (default: 50)
  -c, --candidates=VCF                 read variant candidates from VCF, rather than discovering them from aligned reads
  -a, --alternative-basecalls-bam=FILE if an alternative basecaller was used that does not output event annotations
                                       then use basecalled sequences from FILE. The signal-level events will still be taken from the -b bam.
      --calculate-all-support          when making a call, also calculate the support of the 3 other possible bases
      --models-fofn=FILE               read alternative k-mer models from FILE

Report bugs to https://github.com/jts/nanopolish/issues

In response to this command:

python "/Users/ad110232/ian/nanopolish/scripts/nanopolish_makerange.py" "/Volumes/Seagate Expansion Drive/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" | parallel --results nanopolish.results -P 4 \
    "/Users/ad110232/ian/nanopolish/nanopolish" variants --consensus -o polished.{1}.vcf -w {1} -r "/Volumes/Seagate Expansion Drive/Ian/Genome/nanopore_lmu/Brachmann_CFL.porechop.fastq" -b "/Volumes/Seagate Expansion Drive/Ian/Genome/nanopore_lmu/nanopolished/bignano.sorted.bam" -g "/Volumes/Seagate Expansion Drive/Ian/Genome/nanopore_lmu/big/big.contigs.fasta" -t 4 --min-candidate-frequency 0.1

It sounded like a typo, but I can't find the issue. I copy-pasted what was working before and only changed the paths.

jts commented 5 years ago

It might not like the spaces in the paths
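A small demo of that guess: when the assembled command line is re-parsed by a shell (as GNU parallel does), an unquoted path containing spaces splits into several arguments, which nanopolish then reports as "variants: too many arguments". The path below is illustrative:

```shell
# Word splitting on an unquoted path with spaces vs. a quoted one.
f='/Volumes/Seagate Expansion Drive/genome.fasta'
set -- $f                # unquoted: splits at each space
echo "unquoted arg count: $#"
set -- "$f"              # quoted: stays a single argument
echo "quoted arg count: $#"
```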

igwill commented 5 years ago

Ah could be. I am just biting the bullet and waiting while transferring all of my data to a computer that I confirmed the commands work on. Hopefully that'll be the end of it. Thanks!

likath commented 5 years ago

@igwill Just wondering if you resolved the "variants: too many arguments" issue; I am getting the same error when running nanopolish_makerange.py in parallel...

python "usr/bin/nanopolish/scripts/nanopolish_makerange.py" My.contigs.fasta | parallel --results nanopolish.results -P 8 nanopolish variants --consensus -o polished.{1}.vcf -w {1} -r ../all_run2PCNoMid.fa -b all_run2PCNoMid.sorted.bam -g My.contigs.fasta -t 4 --min-candidate-frequency 0.1

@jts Running makerange.py with echo "test" {1} outputs:

test tig00000004:0-50200
test tig00000004:50000-100200
test tig00000004:100000-150200
test tig00000004:150000-200200
test tig00000004:200000-250200
test tig00000004:250000-300200
test tig00000004:300000-350200
test tig00000004:350000-373468

I tried running it without the -o argument and it returns empty .vcf files...

would appreciate any advice.... thanks in advance! Kathy

jts commented 5 years ago

Hi Kathy,

What nanopolish version are you using? Are you able to successfully complete the tutorial? https://nanopolish.readthedocs.io/en/latest/quickstart_consensus.html

Jared

likath commented 5 years ago

Hi Jared, thanks for the rapid reply. Trying both now; will update you on progress/results! Thanks again, Kathy

igwill commented 5 years ago

Hi Kathy, I did not resolve that issue, I simply ran away from my problems and am running on a different machine now.

chilltrout commented 5 years ago

If -f still doesn't work, why did you close this? I can't use the software because I have 2 sets of data from a run that got interrupted midway...

jts commented 5 years ago

I closed this because the comments had diverged from the original issue. If you have a problem please open a new issue with details of what happened and data I can use to reproduce it.
