dridk opened 8 years ago
Hi, good question. You need a string in the FASTA header that includes ';barcodelabel=SAMPLEID;'. For example:
M01918:213:000000000-AFC1C:1:1101:15775:1331 1:N:0:0;barcode=TAAATATACCCT;barcodelabel=cp83;
TACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGTAAGACAGGTGTGAAATCCCCGGGCTTAACCTGGGAATTGCCTTTGGGACTGCATGGCTAGAGTGTGTCAGAGGGGGGTAGAATTCCAAGTGTAGCAGTGTAATGCGTAGATATGTGGGGGAATACCGATGGCGGAGGCAGCCCCCTGGGCAGATACTGACGCTCAGGCACGAAAGCCTGGGGAGCAAACA
where ‘cp83’ is the sample ID.
This formatting comes from the prep_fastq_for_uparse_paired.py script, FYI.
Jon
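To make the expected layout concrete, here is a minimal Python sketch of how a script could pull the sample ID out of such a header. It is an illustration only, not the parsing code used by the scripts in this repository; the function name sample_id_from_header is hypothetical, and it assumes the sample ID contains no spaces or semicolons, which is what the error message further down asks for.

```python
import re

# Hypothetical pattern: the sample ID sits between "barcodelabel=" and the
# next ";" and may not contain whitespace or semicolons.
BARCODELABEL_RE = re.compile(r";barcodelabel=([^;\s]+);")


def sample_id_from_header(header: str) -> str:
    """Return the sample ID stored as ';barcodelabel=SAMPLEID;' in a FASTA header."""
    match = BARCODELABEL_RE.search(header)
    if match is None:
        raise ValueError(f"no ';barcodelabel=...;' field in header: {header!r}")
    return match.group(1)


header = (
    "M01918:213:000000000-AFC1C:1:1101:15775:1331 1:N:0:0;"
    "barcode=TAAATATACCCT;barcodelabel=cp83;"
)
print(sample_id_from_header(header))  # prints "cp83"
```

A header without the field, such as a bare read name, raises ValueError instead of silently yielding an empty sample ID.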
On Aug 22, 2016, at 1:25 PM, sacha schutz <notifications@github.com> wrote:
I'm trying to do a simple test, but I don't understand how the FASTA headers are processed. For example, I have one sample, test.fa, with the following reads:
A_sample1
AGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACA
A_sample2
ATGGTCGTATATATATGGTCGTATATATATGGTCGTATATATATGGTCGTATATATATGGTCGTATATATATGGTCGTATATAT
A_sample3
ATGGTCGTGTCGTGTCGTGTCGTATATATATCGGTCGTGTCGTGTCGTGTCGTGTCGTATGTCGTGTCGTGTCGTGTCGTATAT
A_sample4
ATGGTCGTGTCGTGTCGTGTCGTATATATATCGGTCGTGTCGTGTCGTGTCGTGTCGTATGTCGTGTCGTGTCGTGTCGTATAT
A_sample5
ATACGTGTATGATATGCGGTGTAATACGTGTATGATATGCGGTGTAATACGTGTATGATATGCGGTGTAATACGTGTATGATAT
A_sample6
ATACGTGTATGATATGCGGTGTAATACGTGTATGATATGCGGTGTAATACGTGTATGATATGCGGTGTAATACGTGTATGATAT
A_sample7
AGAACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACA
A_sample8
AGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACA
A_sample9
AGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACAAGATACA
A_sample10
ATGGTCGTGTCGTGTCGTGTCGTATATATATCGGTCGTGTCGTGTCGTGTCGTGTCGTATGTCGTGTCGTGTCGTGTCGTATAT
A_sample11
ATGGTCGTGTCGTGTCGTGTCGTATATATATCGGTCGTGTCGTGTCGTGTCGTGTCGTATGTCGTGTCGTGTCGTGTCGTATAT
A_sample12
ATGGTCGTGTCGTGTCGTGTCGTATATATATCGGTCGTGTCGTGTCGTGTCGTGTCGTATGTCGTGTCGTGTCGT
I cluster them using:
vsearch --cluster_fast test.fa --id 0.97 --centroids centroids.fa --sizeout --uc test.uc --relabel_sha1 --relabel_keep
Now I want to convert the output to BIOM using your script:
uctobiom -i test.uc -o test.biom
I get the following error:
Error in uc file formating. Check for spaces in sample IDs and to make sure there is a semicolon after sample IDs. First line with issue: S 0 84 * * * * * A1 * 100.0%
Writing table...
I think the FASTA headers have to follow some rule, but I don't know which one. Could you give me a simple example to help me understand? Thanks
View it on GitHub: https://github.com/leffj/helper-code-for-uparse/issues/3
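For a single-sample test like test.fa above, one way to meet that requirement is to rewrite each header before clustering. This is a sketch under stated assumptions, not part of the repository: the functions add_barcodelabel and relabel_file and the sample ID "test" are hypothetical, every read is assumed to come from one sample, and it does not account for any relabeling that vsearch options such as --relabel_sha1 may apply afterwards.

```python
def add_barcodelabel(fasta_lines, sample_id):
    """Yield FASTA lines with ';barcodelabel=<sample_id>;' appended to each header.

    Assumes every record belongs to the same sample. The check on sample_id
    runs when the generator is first iterated.
    """
    if not sample_id or any(c.isspace() or c == ";" for c in sample_id):
        raise ValueError("sample ID must be non-empty, without whitespace or ';'")
    for line in fasta_lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Keep only the read ID (text before the first whitespace) so no
            # space ends up in the label that vsearch writes to the .uc file.
            fields = line[1:].split()
            read_id = fields[0] if fields else ""
            yield f">{read_id};barcodelabel={sample_id};"
        else:
            yield line


def relabel_file(in_path, out_path, sample_id):
    """Write a copy of in_path whose headers carry the barcodelabel field."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in add_barcodelabel(src, sample_id):
            dst.write(line + "\n")
```

With this, a header such as >A_sample1 becomes >A_sample1;barcodelabel=test;, and the relabeled copy (for example relabel_file("test.fa", "test_labeled.fa", "test"), with a made-up output name) would be clustered instead of test.fa.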
Hi,
I have a .uc file in the following format:
H 205 339 98.5 + 0 0 339M B3::M02542:85:000000000-BWJ73:1:1102:22965:2274 OTU_206
H 547 339 98.5 + 0 0 339M B13::M02542:85:000000000-BWJ73:1:2116:22473:4007 OTU_548
H 436 339 97.6 + 0 0 D338M B14::M02542:85:000000000-BWJ73:1:1116:19896:20825 OTU_437
H 127 339 98.8 + 0 0 339M B9::M02542:85:000000000-BWJ73:1:1118:22070:17406 OTU_128
H 200 337 99.1 + 0 0 I337M B3::M02542:85:000000000-BWJ73:1:1116:13763:3215 OTU_201
H 174 339 98.8 + 0 0 339M B15::M02542:85:000000000-BWJ73:1:1115:12758:8719 OTU_175
N * * * . * * * B6::M02542:85:000000000-BWJ73:1:1117:9645:18835 *
H 137 328 99.1 + 0 0 328M11I B12::M02542:85:000000000-BWJ73:1:2103:20919:8080 OTU_138
H 443 335 100.0 + 0 0 335M4I B12::M02542:85:000000000-BWJ73:1:1103:27262:12348 OTU_444
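The paste above appears to have lost its tab characters, but a .uc file has ten tab-separated columns per record: record type, cluster number, sequence length (cluster size for C records), percent identity, strand, two unused columns, compressed alignment, query label and target label. A small Python sketch that splits a record on tabs rather than on any whitespace, which matters as soon as a label itself contains a space; the names UcRecord and parse_uc_line are hypothetical.

```python
from typing import NamedTuple


class UcRecord(NamedTuple):
    record_type: str  # S (centroid), H (hit), C (cluster summary) or N (no hit)
    cluster: str      # cluster number, "*" for N records
    length: str       # sequence length, or cluster size for C records
    identity: str     # percent identity to the target (H records)
    strand: str       # "+" or "-" for H records
    cigar: str        # compressed alignment
    query: str        # label of the query sequence
    target: str       # label of the target sequence, "*" if none


def parse_uc_line(line: str) -> UcRecord:
    """Split one .uc line on tabs; columns 6 and 7 are unused and dropped."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}: {line!r}")
    return UcRecord(fields[0], fields[1], fields[2], fields[3], fields[4],
                    fields[7], fields[8], fields[9])
```

Splitting the first record above this way gives a query of B3::M02542:85:000000000-BWJ73:1:1102:22965:2274 and a target of OTU_206.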
I get the following error:
Error in uc file formating. Check for spaces in sample IDs and to make sure there is a semicolon after sample IDs. First line with issue: H 349 338 99.4 + 0 0 261MI77M B1::M02542:85:000000000-BWJ73:1:1OTU_35022:9749 1:N:0:TAGCTT
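The query label in that line appears to contain a space followed by 1:N:0:TAGCTT, which looks like the trailing comment of an Illumina read header, and the labels use a SAMPLE:: prefix rather than ';barcodelabel=SAMPLEID;'. Below is a hypothetical check in the spirit of the error message (not the script's actual logic) that reports the first offending record; first_bad_uc_line is an illustrative name.

```python
import re

# Hypothetical rule, mirroring the error message: the query label needs a
# ';barcodelabel=SAMPLEID;' field and must not contain whitespace.
BARCODELABEL_RE = re.compile(r";barcodelabel=[^;\s]+;")


def first_bad_uc_line(uc_lines):
    """Return (line_number, line) for the first malformed record, else None."""
    for number, line in enumerate(uc_lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 10:
            return number, line  # not a ten-column, tab-separated record
        query = fields[8]
        if any(c.isspace() for c in query) or not BARCODELABEL_RE.search(query):
            return number, line
    return None
```

With labels like B3::M02542:..., every record would be flagged, since none carries a barcodelabel field; that points to relabeling the reads (or adapting the parser) rather than to a single bad line.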
I'm finding it hard to convert the .uc file to an OTU table text file. Would you be able to modify the script create_otu_table_from_uc_file.py for this use case?
Any help will be much appreciated, thanks in advance.
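As a possible starting point, here is a minimal Python sketch that builds a tab-separated OTU table directly from .uc records like the ones above. It is not a modified version of create_otu_table_from_uc_file.py; the function names are illustrative, and it assumes that the sample ID is everything before '::' in the query label, that the OTU ID is the target label, and that only H records count towards the table.

```python
from collections import Counter, defaultdict


def otu_table_from_uc(uc_lines, separator="::"):
    """Count reads per (OTU, sample) from .uc records of reads mapped to OTUs.

    Assumptions (hypothetical, adjust to your data):
    - the sample ID is the part of the query label before `separator`,
      e.g. 'B3' in 'B3::M02542:85:...';
    - the OTU ID is the target label (column 10);
    - only H (hit) records count; N records have no OTU and are skipped.
    """
    counts = defaultdict(Counter)  # OTU ID -> Counter mapping sample ID -> reads
    samples = set()
    for number, line in enumerate(uc_lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 10:
            raise ValueError(f"line {number}: expected 10 tab-separated fields")
        if fields[0] != "H":
            continue
        sample = fields[8].split(separator, 1)[0]
        counts[fields[9]][sample] += 1
        samples.add(sample)
    return counts, sorted(samples)


def format_otu_table(counts, samples):
    """Return a tab-separated OTU table: one row per OTU, one column per sample."""
    rows = ["#OTU ID\t" + "\t".join(samples)]
    for otu in sorted(counts):
        rows.append(otu + "\t" + "\t".join(str(counts[otu][s]) for s in samples))
    return "\n".join(rows) + "\n"
```

Rows and columns are sorted as strings, so B13 comes before B3, and a sample whose reads all produced N records does not get a column. Writing the string returned by format_otu_table to a file gives the text table.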
I'm trying to do a simple test, but I don't understand how the FASTA headers are processed. For example, I have one sample, test.fa, with the following reads:
I cluster them using:
vsearch --cluster_fast test.fa --id 0.97 --centroids centroids.fa --sizeout --uc test.uc --relabel_sha1 --relabel_keep
Now I want to convert the output to BIOM using your script:
python create_otu_table_from_uc_file.py -i test.uc -o test.biom
I get the following error:
I think the FASTA headers have to follow some rule, but I don't know which one. Could you give me a simple example to help me understand? Thanks