halelab / GBS-SNP-CROP

GBS SNP Calling Reference Optional Pipeline
GNU General Public License v2.0

Can GBS-SNP-CROP be used with Ion Torrent? #19

Closed jshleap closed 5 years ago

jshleap commented 5 years ago

Hi! I was wondering if your pipeline is compatible with Ion Torrent reads. If not, what would be needed to make them compatible?

Thanks!

Sergio

halelab commented 5 years ago

Hi Sergio,

Do you have Genotyping-By-Sequencing (GBS) data (fragments) sequenced on Ion Torrent? GBS-SNP-CROP was developed to parse Illumina GBS reads, so the short answer is probably no. However, I would need to look at the data set in detail to understand how to move forward with Ion Torrent data. Could you please send me a screenshot of your FASTQ reads?

Best, Arthur

jshleap commented 5 years ago

Hi Arthur!! Thank you very much for the quick reply.

Yes, I have GBS Ion Torrent data, and it looks like this:

@QBLII:01332:11592
TTAAGCGGTCGATTGCAGCACAAGAGGCAGCAGGCTGGACAATGGGCTGTACAGCGCTGTCATGTTGGAAGGCTTCTTCCCTCCCTAAAAGGAAGGTCTATCACTGTAGGCGCTACAAAGGATGAACTTAACTGGACCCCGACGACAGCCGCGATCCGAGATCGGAAGAGCGGGATC
+
:5;5<<<6<<;;;5:;<;9:9:5:;=6<<=7=;;5;<>7:::0:::::;<;;;<<<;=<;;;6<6<6<7D<6=A06605??4>==D1>5;5:7;;;;>:<///7::5?<<=<?>?4<7;=;>7;;5:5;;<7<<;<.999;;;<<=7<<<<9918;;;;;:5;5995658/799
@QBLII:01332:11595
TAAGGCAACCACGATTGCAGTTGTACAGGGCCTTGGTGAGGCCACATCTGGAGTATTGTGTGCAGTTTTGGTCTCCTAATCTGAGGAAGGACATTCTTGCTACTGAGGGAGTGCAGCGAAGGTTCACCAGATTGATTCCCGGGGGCTACACAGGGCTACCCTGGGCTGGGAGCTGGGCTATCCTGGGCTACCTG
+
6;5;599181589997;<;>;79888:<;2;19194:?<<7<6;;>;;<<6<?==>7@=<;;;;;;??1=7<;::29:5::::;;4:4:4:76659:29::;9::667-6::;;<<;;;6<7<7;;;5:::;2;:;5;;2;:::,887764444-37644/00/).4461633378279:::69991965648:

The data come as single fastq.gz files. My GBS library was prepared with the PstI and MspI restriction enzymes, and each sample is barcoded.

Best,

arthurmelobio commented 5 years ago

Hi Sergio,

Assuming your reads are structured as barcode sequence + restriction site + nucleotide sequence, the only thing you need to do is append a header suffix (1:N:0:) to each read header. The following Perl script can do that for you.

#!/usr/bin/perl

use strict;
use warnings;

# Takes one gzipped FASTQ file and prints the modified reads to STDOUT.
my $input = $ARGV[0] or die "Usage: $0 <reads.fastq.gz>\n";

# Decompress through gzip and read the stream line by line,
# so the whole file is never held in memory.
open my $IN, '-|', 'gzip', '-dc', $input or die "Can't open file $input!\n";

my $i = 1;
while (my $line = <$IN>) {
    chomp $line;
    # The first line of every 4-line FASTQ record is the header:
    # append the Illumina-style suffix to it and leave the other lines as-is.
    $line .= " 1:N:0:" if $i % 4 == 1;
    print "$line\n";
    $i++;
}
close $IN;
exit;

So, the read structure required by GBS-SNP-CROP should look like this:

@Read header ending with 1:N:0:
barcode sequence + restriction site + nucleotide sequence
+
quality characters
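
To see whether reads really follow the barcode + restriction site + nucleotide sequence layout, one option is to look for the restriction-site remnant near the start of each read. The sketch below is only an illustration and is not part of GBS-SNP-CROP. It assumes the PstI remnant is TGCAG, the `$MAX_OFFSET` search window and the sub names are hypothetical, and the two sequences are the first bases of the reads posted in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a PstI-cut fragment starts with the remnant TGCAG, so a read
# laid out as barcode + restriction site + sequence should contain TGCAG
# a short distance from its 5' end.
my $SITE       = 'TGCAG';
my $MAX_OFFSET = 30;    # hypothetical search window; adjust to your barcode lengths

# Return the 0-based offset of the first site starting within the window, or -1.
sub site_offset {
    my ($seq) = @_;
    return index(substr($seq, 0, $MAX_OFFSET + length $SITE), $SITE);
}

# Count how many reads show the site at each offset (-1 means not found).
sub tally_offsets {
    my (@seqs) = @_;
    my %count;
    $count{ site_offset($_) }++ for @seqs;
    return %count;
}

# First bases of the two reads posted in this thread.
my @reads = (
    'TTAAGCGGTCGATTGCAGCACAAGAGGCAGCAGG',
    'TAAGGCAACCACGATTGCAGTTGTACAGGGCCTT',
);

my %count = tally_offsets(@reads);
for my $offset (sort { $a <=> $b } keys %count) {
    my $label = $offset < 0 ? 'site not found' : "site at offset $offset";
    print "$label: $count{$offset} read(s)\n";
}
```

The offsets show how many bases precede the cut site in each read, which should correspond to the barcode length if the assumed layout holds.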

Make sure all your reads have this structure, then give it a try. Please see the GBS-SNP-CROP user manual (https://github.com/halelab/GBS-SNP-CROP/wiki/GBS-SNP-CROP-User-Manual-(v.4.0)), and for any other questions, please use the GBS-SNP-CROP Google group page (https://groups.google.com/forum/#!forum/gbs-snp-crop).
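
A quick way to check the read structure before running the pipeline could look like the sketch below. It is a hedged example and not GBS-SNP-CROP's own validation. It assumes an uncompressed FASTQ file with 4-line records, the sub names `record_problems` and `count_bad_records` are hypothetical, and the two sample reads are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of problems found in one 4-line FASTQ record (empty list = OK).
sub record_problems {
    my ($header, $seq, $plus, $qual) = @_;
    my @problems;
    push @problems, "header does not start with '\@'"    unless $header =~ /^\@/;
    push @problems, "header does not end with ' 1:N:0:'" unless $header =~ / 1:N:0:$/;
    push @problems, "third line does not start with '+'" unless $plus =~ /^\+/;
    push @problems, "sequence and quality lengths differ"
        unless length($seq) == length($qual);
    return @problems;
}

# Read records from a filehandle, report problems, return the number of bad records.
sub count_bad_records {
    my ($fh) = @_;
    my ($n, $bad) = (0, 0);
    while (defined(my $header = <$fh>)) {
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        $n++;
        if (!defined $qual) {    # file ended in the middle of a record
            print "record $n: truncated record\n";
            $bad++;
            last;
        }
        chomp($header, $seq, $plus, $qual);
        my @problems = record_problems($header, $seq, $plus, $qual);
        next unless @problems;
        $bad++;
        print "record $n ($header): $_\n" for @problems;
    }
    return $bad;
}

# Made-up sample data: the second read lacks the header suffix.
my $sample = <<'END_FASTQ';
@read1 1:N:0:
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
IIIIIIII
END_FASTQ

# Check the file given on the command line, or the sample data if none is given.
my $in;
if (@ARGV) {
    open $in, '<', $ARGV[0] or die "Can't open file $ARGV[0]!\n";
} else {
    open $in, '<', \$sample or die "Can't open sample data!\n";
}
my $bad = count_bad_records($in);
close $in;
print "$bad record(s) with problems\n";
```

Run without arguments, it checks the two sample reads and reports the second one, whose header lacks the suffix. Given a file name, it checks that file instead; .gz files need to be decompressed first.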

Also, let me know about your progress...

Best, Arthur