tseemann / snippy

:scissors: :zap: Rapid haploid variant calling and core genome alignment
GNU General Public License v2.0

snippy-multi unreadable input files #373

Closed emrosenthal17 closed 4 years ago

emrosenthal17 commented 4 years ago

This is a follow up to issue #279.

https://github.com/tseemann/snippy/issues/279

Hello, I'm encountering this problem today and the solutions in that issue are not working for me. I've tried several ways of creating the tab file, including om9055's suggestion to make the csv file in MS Word and convert it to tab-separated using sed, as well as writing the input file by hand with nano in the terminal. I thought the issue might be with recognizing the full path, but both on my local computer and on a supercomputer the path seems to be found properly and the file is still reported as unreadable. I've also run snippy --check and the dependencies are installed. Any further suggestions for this issue? Thanks!

Running the command:

$ snippy-multi input-test7.tab
Reading: input-test7.tab
ERROR: [BS0336] unreadable file '/storage/home/err5/scratch/0336_S1_L001_R1_001_paired.fastq'

tseemann commented 4 years ago

Can you paste the output of these commands:

cat -vet input-test7.tab
od -a  input-test7.tab
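
For readers unfamiliar with these two commands, here is a minimal sketch of what they reveal. The demo file and its paths are made up for illustration and are not from this thread; the three rows contain a trailing tab, a doubled tab, and a Windows-style line ending.

```shell
# Build a small demo table in a temporary file (hypothetical contents).
demo=$(mktemp)
printf 'S1\t/data/S1_R1.fq\t/data/S1_R2.fq\t\n' >  "$demo"   # trailing tab
printf 'S2\t\t/data/S2_R1.fq\n'                  >> "$demo"   # doubled tab
printf 'S3\t/data/S3_R1.fq\r\n'                  >> "$demo"   # CRLF line ending

# cat -vet: tab prints as ^I, carriage return as ^M, end of line as $
cat -vet "$demo"
# S1^I/data/S1_R1.fq^I/data/S1_R2.fq^I$
# S2^I^I/data/S2_R1.fq$
# S3^I/data/S3_R1.fq^M$

# od -a: tab prints as ht, carriage return as cr, newline as nl
od -a "$demo"
```

Any `^I$`, `^I^I`, or `^M$` in the first output (or `ht nl`, `ht ht`, `cr nl` in the second) points at stray whitespace in the table.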

emrosenthal17 commented 4 years ago

Hello, I apologize for the delay and thank you for looking into this! Yes, here is the output from the commands you listed:

BS0336^I/storage/home/err5/scratch/0336_S1_L001_R1_001_paired.fastq^I/storage/home/err5/scratch/0336_S1_L001_R2_001_paired.fastq$
BS0339^I/storage/home/err5/scratch/0339_S2_L001_R1_001_paired.fastq^I/storage/home/err5/scratch/0339_S2_L001_R2_001_paired.fastq^I$

0000000 B S 0 3 3 6 ht / s t o r a g e /
0000020 h o m e / e r r 5 / s c r a t c
0000040 h / 0 3 3 6 _ S 1 _ L 0 0 1 _ R
0000060 1 _ 0 0 1 _ p a i r e d . f a s
0000100 t q ht / s t o r a g e / h o m e
0000120 / e r r 5 / s c r a t c h / 0 3
0000140 3 6 _ S 1 _ L 0 0 1 _ R 2 _ 0 0
0000160 1 _ p a i r e d . f a s t q nl B
0000200 S 0 3 3 9 ht / s t o r a g e / h
0000220 o m e / e r r 5 / s c r a t c h
0000240 / 0 3 3 9 _ S 2 _ L 0 0 1 _ R 1
0000260 _ 0 0 1 _ p a i r e d . f a s t
0000300 q ht / s t o r a g e / h o m e /
0000320 e r r 5 / s c r a t c h / 0 3 3
0000340 9 _ S 2 _ L 0 0 1 _ R 2 _ 0 0 1
0000360 _ p a i r e d . f a s t q ht nl
0000377

emrosenthal17 commented 4 years ago

Ok! I have an update, and now it works!

Firstly, in my latest run I'm using differently trimmed reads and the extension happens to be .fq instead of .fastq; that change didn't affect snippy. Secondly, I got rid of the tab that was apparent above after running cat -vet input-test7.tab; this also didn't seem to affect snippy.

Here's what made a difference: my read files started with the sample name and number, such as "0336_S1", but in our lab the sample names also have a collection ID as a prefix, so they're "BS0336". To get the snippy-multi line to work, the sample ID in the input file had to be consistent between the first column and the second and third columns.

After renaming the read files to include the "BS," the input file now looks like this:

BS0336 /storage/home/err5/scratch/BS0336_S1_L001_R1_001_paired.fq /storage/home/err5/scratch/BS0336_S1_L001_R2_001_paired.fq
BS0339 /storage/home/err5/scratch/BS0339_S2_L001_R1_001_paired.fq /storage/home/err5/scratch/BS0339_S2_L001_R2_001_paired.fq

Now the runme.sh turns out like this:

snippy --outdir 'BS0336' --R1 '/gpfs/scratch/err5/BS0336_S1_L001_R1_001_paired.fq' --R2 '/gpfs/scratch/err5/BS0336_S1_L001_R2_001_paired.fq' -ref CM002307_Xh.gb --cpus 4
snippy --outdir 'BS0339' --R1 '/gpfs/scratch/err5/BS0339_S2_L001_R1_001_paired.fq' --R2 '/gpfs/scratch/err5/BS0339_S2_L001_R2_001_paired.fq' -ref CM002307_Xh.gb --cpus 4

I was glad to find it was a simple fix! Thank you for the guidance Dr. Seemann.
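
For anyone hitting the same message, one way to rule out path problems independently of snippy-multi is to test every path in the table directly. The sketch below assumes the "unreadable file" error corresponds to a plain readability test on each listed path, which this thread does not confirm; `check_tab` is a hypothetical helper name, not part of snippy.

```shell
# check_tab FILE: print every read path in a tab-separated sample table
# that the current user cannot read. Returns non-zero if any path fails.
# Assumed layout: column 1 = sample ID, columns 2 and 3 = read files.
check_tab() {
    status=0
    # Split fields on tabs only. "|| [ -n "$id" ]" keeps a final line
    # that has no trailing newline from being skipped.
    while IFS="$(printf '\t')" read -r id r1 r2 rest || [ -n "$id" ]; do
        [ -n "$id" ] || continue            # skip blank lines
        for f in "$r1" "$r2"; do
            [ -n "$f" ] || continue         # rows with one file have no R2
            if [ ! -r "$f" ]; then
                printf 'unreadable: [%s] %s\n' "$id" "$f"
                status=1
            fi
        done
    done < "$1"
    return "$status"
}
```

Running `check_tab input-test7.tab` lists the offending paths, if any. Because the shell treats a tab as whitespace when splitting, runs of tabs are collapsed here, so this check does not detect doubled tabs; a path with a stray carriage return at the end is reported as unreadable.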

ya-92 commented 4 years ago

Hello tseemann, I am trying to run snippy-multi but it gives me the error: unable to read isolate table file.

I ran cat vet input.tab:

l.mono1 /home/y/ya97/Desktop/FASTA_FILES/L.MONO_SNPS/l.mono_fasta/l.mono1_04282020.FASTA
l.mono2 /home/y/ya97/Desktop/FASTA_FILES/L.MONO_SNPS/l.mono_fasta/l.mono2_04282020.FASTA
l.mono3 /home/y/ya97/Desktop/FASTA_FILES/L.MONO_SNPS/l.mono_fasta/l.mono3_04282020.FASTA
l.mono4 /home/y/ya97/Desktop/FASTA_FILES/L.MONO_SNPS/l.mono_fasta/l.mono4_04282020.FASTA
l.mono5 /home/y/ya97/Desktop/FASTA_FILES/L.MONO_SNPS/l.mono_fasta/l.mono5_04282020.FASTA

and also od -a input.tab:

0000000 l . m o n o 1 ht ht / h o m e / y
0000020 / y a 9 7 / D e s k t o p / F A
0000040 S T A _ F I L E S / L . M O N O
0000060 _ S N P S / l . m o n o _ f a s
0000100 t a / l . m o n o 1 _ 0 4 2 8 2
0000120 0 2 0 . F A S T A ht ht sp nl l . m
0000140 o n o 2 ht ht / h o m e / y / y a
0000160 9 7 / D e s k t o p / F A S T A
0000200 _ F I L E S / L . M O N O _ S N
0000220 P S / l . m o n o _ f a s t a /
0000240 l . m o n o 2 _ 0 4 2 8 2 0 2 0
0000260 . F A S T A ht ht nl l . m o n o 3
0000300 ht ht / h o m e / y / y a 9 7 / D
0000320 e s k t o p / F A S T A _ F I L
0000340 E S / L . M O N O _ S N P S / l
0000360 . m o n o _ f a s t a / l . m o
0000400 n o 3 _ 0 4 2 8 2 0 2 0 . F A S
0000420 T A ht ht nl l . m o n o 4 ht ht / h
0000440 o m e / y / y a 9 7 / D e s k t
0000460 o p / F A S T A _ F I L E S / L
0000500 . M O N O _ S N P S / l . m o n
0000520 o _ f a s t a / l . m o n o 4 _
0000540 0 4 2 8 2 0 2 0 . F A S T A ht ht
0000560 nl l . m o n o 5 ht ht / h o m e /
0000600 y / y a 9 7 / D e s k t o p / F
0000620 A S T A _ F I L E S / L . M O N
0000640 O _ S N P S / l . m o n o _ f a
0000660 s t a / l . m o n o 5 _ 0 4 2 8
0000700 2 0 2 0 . F A S T A ht ht nl
0000715

I appreciate your help.

tseemann commented 4 years ago

@ya-92 you have TWO tabs between columns; only ONE is allowed (ht = horizontal tab):

l . m o n o 1 ht ht / h o m e / y
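
A minimal sketch of one way to repair such a file, assuming no sample ID or path legitimately contains a tab or ends in whitespace. `fix_tabs` is a hypothetical helper name used only for this example.

```shell
# fix_tabs FILE: write FILE to stdout with each run of tabs collapsed to a
# single tab and any trailing tabs or spaces removed from every line.
fix_tabs() {
    tab=$(printf '\t')
    # tr -s squeezes repeated tabs; the sed bracket holds a space and a tab.
    tr -s '\t' < "$1" | sed "s/[ $tab]*\$//"
}
```

For example, `fix_tabs input.tab > input.fixed.tab`, then re-run `cat -vet input.fixed.tab` to confirm that each line shows exactly one `^I` between columns and nothing before the final `$`.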

chainsmiler commented 4 years ago

Hello tseemann, I also encountered the snippy-multi unreadable input files problem today. Here are my results:

(snippy) [ufo@n607s1 snippy]$ head input.tab|od -a
0000000 G C F _ 0 0 2 7 9 9 6 2 5 . 1 _
0000020 A S M 2 7 9 9 6 2 v 1 ht / h o m
0000040 e / u f o / G W A S / s n i p p
0000060 y / h e r e f a / G C F _ 0 0 2
0000100 7 9 9 6 2 5 . 1 _ A S M 2 7 9 9
0000120 6 2 v 1 _ g e n o m i c . f a cr
0000140 nl G C F _ 0 0 2 7 9 9 6 4 5 . 1
0000160 _ A S M 2 7 9 9 6 4 v 1 ht / h o
0000200 m e / u f o / G W A S / s n i p
0000220 p y / h e r e f a / G C F _ 0 0
0000240 2 7 9 9 6 4 5 . 1 _ A S M 2 7 9
0000260 9 6 4 v 1 _ g e n o m i c . f a
0000300 cr nl G C F _ 0 0 2 7 9 9 6 5 5 .
0000320 1 _ A S M 2 7 9 9 6 5 v 1 ht / h
0000340 o m e / u f o / G W A S / s n i
0000360 p p y / h e r e f a / G C F _ 0
0000400 0 2 7 9 9 6 5 5 . 1 _ A S M 2 7
0000420 9 9 6 5 v 1 _ g e n o m i c . f
0000440 a cr nl G C F _ 0 0 2 7 9 9 6 9 5
0000460 . 1 _ A S M 2 7 9 9 6 9 v 1 ht /
0000500 h o m e / u f o / G W A S / s n
0000520 i p p y / h e r e f a / G C F _

(snippy) [ufo@n607s1 snippy]$ cat -vet input.tab
GCF_002799625.1_ASM279962v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799625.1_ASM279962v1_genomic.fa^M$
GCF_002799645.1_ASM279964v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799645.1_ASM279964v1_genomic.fa^M$
GCF_002799655.1_ASM279965v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799655.1_ASM279965v1_genomic.fa^M$
GCF_002799695.1_ASM279969v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799695.1_ASM279969v1_genomic.fa^M$
GCF_002799715.1_ASM279971v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799715.1_ASM279971v1_genomic.fa^M$
GCF_002799725.1_ASM279972v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799725.1_ASM279972v1_genomic.fa^M$
GCF_002799765.1_ASM279976v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799765.1_ASM279976v1_genomic.fa^M$
GCF_002799775.1_ASM279977v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799775.1_ASM279977v1_genomic.fa^M$
GCF_002799795.1_ASM279979v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799795.1_ASM279979v1_genomic.fa^M$
GCF_002799845.1_ASM279984v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799845.1_ASM279984v1_genomic.fa^M$
GCF_002799855.1_ASM279985v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799855.1_ASM279985v1_genomic.fa^M$
GCF_002799875.1_ASM279987v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799875.1_ASM279987v1_genomic.fa^M$
GCF_002799905.1_ASM279990v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799905.1_ASM279990v1_genomic.fa^M$
GCF_002799945.1_ASM279994v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799945.1_ASM279994v1_genomic.fa^M$
GCF_002799965.1_ASM279996v1^I/home/ufo/GWAS/snippy/herefa/GCF_002799965.1_ASM279996v1_genomic.fa^M$

Thanks for your help

ASridhar94 commented 3 years ago

@chainsmiler it looks like you have a carriage return in your input.tab at the end of each line, just before the newline. I faced a similar issue, and removing it with sed 's/\r$//' file.txt > out.txt helped me clean the input.

Hope this helps :)
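
A small sketch of the same cleanup using `tr`, for systems where `sed` may not interpret `\r` (some non-GNU versions do not). Note that, unlike the `sed` command above, this removes every carriage return in the file, not only those at the end of a line. `has_cr` and `strip_cr` are hypothetical helper names.

```shell
# has_cr FILE: succeed if FILE contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_cr FILE: write FILE to stdout with all carriage returns removed.
strip_cr() {
    tr -d '\r' < "$1"
}
```

For example: `has_cr input.tab && strip_cr input.tab > input.unix.tab`. Afterwards `cat -vet input.unix.tab` should show lines ending in `$` with no `^M` before it.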

florenmartino commented 8 months ago

Hi there! I cannot make it work. Can someone help me?

I have this issue:

[20:37:32] Treating reference as 'fasta' format.
[20:37:32] Will use 16 CPU cores.
[20:37:32] Using read file: /mnt/Netapp/2024/CNGB24-C010/ATB-Flor/trimmed/JH85-b_1.fastq.gz
[20:37:32] Using read file: /mnt/Netapp/2024/CNGB24-C010/ATB-Flor/trimmed/JH85-b_2.fastq.gz
[20:37:32] Output folder prueba already exists. Remove or use --force.
This is snippy-core 4.6.0
Obtained from http://github.com/tseemann/snippy
Enabling bundled tools for linux
Found any2fasta - /usr/local/bin/any2fasta
Found samtools - /usr/local/bin/samtools
Found minimap2 - /home/inei/minimap2-2.17_x64-linux/minimap2
Found bedtools - /usr/local/bin/bedtools
Found snp-sites - /usr/bin/snp-sites
ERROR: Can't open --ref ATB_JH240Aba/ref.fa

After that I tried to fix the .tab file in case that was the problem, but then the error was:

'RROR: [JH37-AB-b] unreadable file 'JH37-AB-b_2.fastq.gz

0000000 J H 3 7 - A B - b ht J H 3 7 - A
0000020 B - b _ 1 . f a s t q . g z ht J
0000040 H 3 7 - A B - b _ 2 . f a s t q
0000060 . g z cr nl J H 8 6 - A B - b ht J
0000100 H 8 6 - A B - b _ 1 . f a s t q
0000120 . g z ht J H 8 6 - A B - b _ 2 .
0000140 f a s t q . g z cr nl J H 9 0 - A
0000160 B - b ht J H 9 0 - A B - b _ 1 .
0000200 f a s t q . g z ht J H 9 0 - A B
0000220 - b _ 2 . f a s t q . g z cr nl A
0000240 T B _ J H 2 4 0 A b a ht A T B _
0000260 J H 2 4 0 A b a _ 1 . f a s t q
0000300 . g z ht A T B _ J H 2 4 0 A b a
0000320 _ 2 . f a s t q . g z cr nl A T B
0000340 _ J H 2 4 3 A b a ht A T B _ J H
0000360 2 4 3 A b a _ 1 . f a s t q . g
0000400 z ht A T B _ J H 2 4 3 A b a _ 2
0000420 . f a s t q . g z cr nl J H 1 2 7
0000440 - b ht J H 1 2 7 - b _ 1 . f a s
0000460 t q . g z ht J H 1 2 7 - b _ 2 .
0000500 f a s t q . g z cr nl J H 1 2 9 -
0000520 b ht J H 1 2 9 - b _ 1 . f a s t
0000540 q . g z ht J H 1 2 9 - b _ 2 . f
0000560 a s t q . g z cr nl J H 8 5 - b ht
0000600 J H 8 5 - b _ 1 . f a s t q . g
0000620 z ht J H 8 5 - b _ 2 . f a s t q
0000640 . g z
0000643

Thanks!!!!!