jshleap closed this issue 5 years ago.
Hi Sergio,
Do you have Genotyping-By-Sequencing (GBS) data (fragments) sequenced by Ion Torrent? GBS-SNP-CROP was developed to parse Illumina GBS reads, so the first answer is probably no! However, I would need to look at the data set in detail to understand how to move forward with Ion Torrent data. Could you please send me a screenshot of the FASTQ reads you have?
Best, Arthur
Hi Arthur! Thank you very much for the quick reply.
Yes, I have GBS Ion Torrent data, and it looks like this:
```
@QBLII:01332:11592
TTAAGCGGTCGATTGCAGCACAAGAGGCAGCAGGCTGGACAATGGGCTGTACAGCGCTGTCATGTTGGAAGGCTTCTTCCCTCCCTAAAAGGAAGGTCTATCACTGTAGGCGCTACAAAGGATGAACTTAACTGGACCCCGACGACAGCCGCGATCCGAGATCGGAAGAGCGGGATC
+
:5;5<<<6<<;;;5:;<;9:9:5:;=6<<=7=;;5;<>7:::0:::::;<;;;<<<;=<;;;6<6<6<7D<6=A06605??4>==D1>5;5:7;;;;>:<///7::5?<<=<?>?4<7;=;>7;;5:5;;<7<<;<.999;;;<<=7<<<<9918;;;;;:5;5995658/799
@QBLII:01332:11595
TAAGGCAACCACGATTGCAGTTGTACAGGGCCTTGGTGAGGCCACATCTGGAGTATTGTGTGCAGTTTTGGTCTCCTAATCTGAGGAAGGACATTCTTGCTACTGAGGGAGTGCAGCGAAGGTTCACCAGATTGATTCCCGGGGGCTACACAGGGCTACCCTGGGCTGGGAGCTGGGCTATCCTGGGCTACCTG
+
6;5;599181589997;<;>;79888:<;2;19194:?<<7<6;;>;;<<6<?==>7@=<;;;;;;??1=7<;::29:5::::;;4:4:4:76659:29::;9::667-6::;;<<;;;6<7<7;;;5:::;2;:;5;;2;:::,887764444-37644/00/).4461633378279:::69991965648:
```
The data come in single fastq.gz files. My particular GBS was performed with the PstI and MspI restriction enzymes, and each sample is barcoded.
Best,
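Given the PstI/MspI design and per-sample barcodes described above, one quick way to check the "barcode + restriction site + sequence" layout is to look for the PstI remnant near the start of each read. The Perl sketch below assumes the remnant appears as `TGCAG` right after the barcode (as it does in both reads pasted above) and uses a hypothetical maximum barcode length of 20 bases. `split_barcode` is an illustrative helper, not part of GBS-SNP-CROP.

```perl
#!/usr/bin/perl
# Sketch: split a GBS read into (putative barcode, rest), assuming the layout
# barcode + PstI remnant + genomic sequence.
# Assumptions (not taken from GBS-SNP-CROP): the PstI remnant shows up as
# "TGCAG" in the read, and barcodes are at most $MAX_BARCODE_LEN bases long.
use strict;
use warnings;

my $PSTI_REMNANT    = 'TGCAG';   # assumed PstI remnant right after the barcode
my $MAX_BARCODE_LEN = 20;        # hypothetical upper bound on barcode length

# Returns (barcode, sequence starting at the remnant), or an empty list when
# the first remnant is missing or not preceded by a 1..$MAX_BARCODE_LEN prefix.
sub split_barcode {
    my ($seq) = @_;
    my $pos = index($seq, $PSTI_REMNANT);
    return () if $pos < 1 || $pos > $MAX_BARCODE_LEN;
    return (substr($seq, 0, $pos), substr($seq, $pos));
}

# First 30 bases of the first read pasted above.
my ($barcode, $rest) = split_barcode('TTAAGCGGTCGATTGCAGCACAAGAGGCAG');
print "barcode=$barcode rest=$rest\n" if defined $barcode;
```

For the first read, this reports the 13-base prefix `TTAAGCGGTCGAT` as the putative barcode; comparing such prefixes against your actual barcode list would confirm whether the assumed layout holds.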
Hi Sergio,
Assuming the structure of the nucleotide sequence of your reads is barcode sequence + restriction site + nucleotide sequence, the only thing you need to do with your data is add a header complement (1:N:0:). The following Perl script can do that for you.
```perl
#!/usr/bin/perl
# Append the header complement "1:N:0:" to every read of a gzipped FASTQ file.
# Usage: perl <this_script> reads.fastq.gz > reads_with_header.fastq
use strict;
use warnings;

my $input = $ARGV[0] or die "Usage: $0 <reads.fastq.gz>\n";

# Decompress on the fly by piping the file through gzip.
open my $IN, '-|', 'gzip', '-dc', $input or die "Can't open file $input!\n";

my @Read;
while (my $line = <$IN>) {
    chomp $line;
    push @Read, $line;
    next if @Read < 4;    # collect the 4 lines of one FASTQ record

    # @Read = (header, sequence, '+' line, quality characters)
    my $new_header = join(" ", $Read[0], "1:N:0:");
    print "$new_header\n$Read[1]\n$Read[2]\n$Read[3]\n";
    @Read = ();
}
close $IN;
exit;
```
So, the read structure required by GBS-SNP-CROP should look like this:
```
@read header ending with 1:N:0:
barcode sequence + restriction site + nucleotide sequence
+
quality characters
```
Make sure all your reads have this structure, and then give it a try. Please see the GBS-SNP-CROP user manual (https://github.com/halelab/GBS-SNP-CROP/wiki/GBS-SNP-CROP-User-Manual-(v.4.0)), and for any other questions, please use the GBS-SNP-CROP Google Group page (https://groups.google.com/forum/#!forum/gbs-snp-crop).
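To check that every record follows the structure above before running GBS-SNP-CROP, a small validator can help. This is a sketch under stated assumptions: the input is an uncompressed FASTQ (for example, the output of the script above) with exactly four lines per record and uppercase ACGTN bases. `check_record` is an illustrative name, and these checks are not GBS-SNP-CROP's own validation.

```perl
#!/usr/bin/perl
# Sketch: check that FASTQ records match the expected layout:
#   @header ending with "1:N:0:" / sequence / '+' line / quality characters
# Assumes uncompressed FASTQ with exactly 4 lines per record.
# Usage: perl <this_script> reads_with_header.fastq
use strict;
use warnings;

# Returns '' if the record looks valid, otherwise the first problem found.
sub check_record {
    my ($header, $seq, $plus, $qual) = @_;
    return 'header does not start with @'      unless $header =~ /^\@/;
    return 'header does not end with 1:N:0:'   unless $header =~ /\s1:N:0:$/;
    return 'sequence has non-ACGTN characters' unless $seq =~ /^[ACGTN]+$/;
    return 'third line does not start with +'  unless $plus =~ /^\+/;
    return 'quality length differs from sequence length'
        unless length($qual) == length($seq);
    return '';
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open file $file!\n";
    my @rec;
    my ($n, $bad) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        push @rec, $line;
        next if @rec < 4;    # wait until a full record is collected
        $n++;
        my $err = check_record(@rec);
        if ($err ne '') {
            $bad++;
            print STDERR "$file, record $n ($rec[0]): $err\n";
        }
        @rec = ();
    }
    close $fh;
    print STDERR "$file: incomplete final record\n" if @rec;
    print "$file: $n records checked, $bad with problems\n";
}
```

Problems are reported on STDERR, one line per bad record, followed by a per-file summary on STDOUT.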
Also, let me know about your progress...
Best, Arthur
Hi! I was wondering whether your pipeline is compatible with Ion Torrent reads. If not, what needs to be done to make them compatible?
Thanks!
Sergio