tseemann / nullarbor

:floppy_disk: :page_with_curl: "Reads to report" for public health and clinical microbiology
GNU General Public License v2.0

Making Txt file Tab for the run #248

Closed foamy1881 closed 4 years ago

foamy1881 commented 4 years ago

I'm new to this and am a PhD student analyzing genomics data. I managed to set up nullarbor and the check passed (it said I deserve a medal). However, when I tested it on a small subsample, it returned an error message about the sample file. If I'm not mistaken, the txt file for running nullarbor2 has the columns ID, R1, R2. It can't read the R2 sequence file. Can anyone please guide me?

nullarbor.pl --name MtbNull2 --mlst saureus -ref ../../ReferenceGenome/NC000962.gbk --input null2mtb10.txt --outdir MtbNull2
[07:13:57] Hello yfong
[07:13:57] This is nullarbor.pl 2.0.20181010
[07:13:57] Send complaints to Torsten Seemann
[07:13:57] Scanning --ref for problematic sequence IDs...
[07:13:57] Using reference genome: /home/yfong/Documents/ReferenceGenome/NC000962.gbk
'07:13:57] ERROR: Isolate 'M001' - can not read sequence #2 of 1 files: '/home/yfong/Documents/SequenceFiles/M001_R2.fastq.gz

File names from the sample txt file:

M001 /home/yfong/Documents/SequenceFiles/M001_R1.fastq.gz /home/yfong/Documents/SequenceFiles/M001_R2.fastq.gz
M002 /home/yfong/Documents/SequenceFiles/M002_R1.fastq.gz /home/yfong/Documents/SequenceFiles/M002_R2.fastq.gz
M003 /home/yfong/Documents/SequenceFiles/M003_R1.fastq.gz /home/yfong/Documents/SequenceFiles/M003_R2.fastq.gz

tseemann commented 4 years ago

The format is ID <tab> R1 <tab> R2 (no spaces, must be a TAB character). The file must also be in Linux text format: each line ending with a single nl (newline), not DOS or MacOS line endings. Type od -a input-file.txt and look for ht (tab) and nl (good) or cr (bad). Type dos2unix input-file.txt or mac2unix input-file.txt to fix the line endings.
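
For anyone who would rather script the check than read od output, the rules above (three TAB-separated fields per line, nl line endings, no cr) can be sketched in Python. This is only an illustrative sketch: check_sample_sheet is a hypothetical helper written for this thread, not part of nullarbor, and it looks at the file format only, not at whether the read files exist.

```python
def check_sample_sheet(data):
    """Return a list of format problems found in the raw bytes of a sample sheet.

    Hypothetical helper (not part of nullarbor). It checks only the rules
    described above: TAB-separated ID, R1, R2 and Unix (nl) line endings.
    """
    problems = []
    if b"\r" in data:
        problems.append("carriage return (cr) found: DOS or old MacOS line endings")
    if not data.endswith(b"\n"):
        problems.append("file does not end with a newline (nl)")

    # Normalise line endings so the per-line field check still works
    # on files that have the wrong endings.
    text = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n").decode("utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # blank lines are skipped in this sketch
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                "line %d: expected 3 tab-separated fields, found %d"
                % (number, len(fields))
            )
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_sheet.py input-file.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            for problem in check_sample_sheet(handle.read()):
                print(problem)
```

Reading the file in binary mode matters here: text mode would silently translate cr characters before they could be detected.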

UPDATE: looking again at your error, I think the line ending is the problem. Did you create it in Excel? Or on Windows or Mac?

foamy1881 commented 4 years ago

Yes, I used Excel to create the txt output. You are right, it was an issue with the TAB character, and I fixed it as suggested using dos2unix. It is running now :) woohoo. I'm testing on 10 samples before proceeding to 600+ isolates. Thanks!
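
For a run of several hundred isolates, one way to avoid spreadsheet export problems altogether is to generate the ID <tab> R1 <tab> R2 file with a script. The following is a minimal sketch, assuming every isolate has a pair of files named like the ones in this thread (ID_R1.fastq.gz and ID_R2.fastq.gz) in a single directory; build_sample_sheet is a hypothetical helper, not something nullarbor provides.

```python
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"  # naming assumed from the file names in this thread
R2_SUFFIX = "_R2.fastq.gz"


def build_sample_sheet(read_dir, out_path):
    """Write one 'ID<TAB>R1<TAB>R2' line per read pair found in read_dir.

    Hypothetical helper: returns the number of isolates written and raises
    FileNotFoundError if an R1 file has no matching R2 file.
    """
    read_dir = Path(read_dir).resolve()  # absolute paths, as in the thread
    rows = []
    for r1 in sorted(read_dir.glob("*" + R1_SUFFIX)):
        sample_id = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample_id + R2_SUFFIX)
        if not r2.exists():
            raise FileNotFoundError("missing R2 for %s: %s" % (sample_id, r2))
        rows.append("%s\t%s\t%s\n" % (sample_id, r1, r2))

    # newline="\n" forces Unix line endings even when run on Windows.
    with open(out_path, "w", newline="\n") as handle:
        handle.writelines(rows)
    return len(rows)
```

Calling build_sample_sheet("/home/yfong/Documents/SequenceFiles", "null2mtb10.txt") would then produce a file in the same layout as the one shown earlier in the thread, with real TAB characters and nl line endings.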