statonlab / tripal_ssr


dealing with alternative files (confirmed, polymorphic) #51

Closed bradfordcondon closed 6 years ago

bradfordcondon commented 6 years ago

P. serotina, L. styraciflua, F. pennsylvanica, F. americana, A. saccharum; additional pubs:

Noakes AG, Best T, Staton ME, Koch J, Romero-Severson J. Cross amplification of 15 EST-SSR markers in the genus Fraxinus. Conservation Genetics Resources. 2014 Dec 1;6(4):969-70.

Khodwekar S, Staton M, Coggeshall MV, Carlson JE, Gailing O. Nuclear microsatellite markers for population genetic studies in sugar maple (Acer saccharum Marsh.). Annals of Forest Research. 2015 Apr 15;58(2):193-204.

P. serotina

Of the 96 tested SSRs, 48 successfully amplified and 26 were found to be polymorphic.

https://www.hardwoodgenomics.org/sites/default/files/gSSRs/CarlsonLab_BlackCherry_TestedSSRs.xlsx

L. styraciflua

Of the 96 tested SSRs, 44 successfully amplified and 28 were found to be polymorphic.

https://www.hardwoodgenomics.org/sites/default/files/gSSRs/CarlsonLab_Sweetgum_TestedSSRs.xlsx

F. pennsylvanica and F. americana

The supplemental file is a PDF table with the following columns:

EST-SSR marker (Fp12353), accession number (KJ626347), forward sequence, reverse sequence, size range, heterozygosity stats, and motif.

Noakes AG, Best T, Staton ME, Koch J, Romero-Severson J. Cross amplification of 15 EST-SSR markers in the genus Fraxinus. Conservation Genetics Resources. 2014 Dec 1;6(4):969-70.

Acer saccharum

A subset of the predicted SSRs have been screened for polymorphism and published in a population diversity study:

Khodwekar S, Staton M, Coggeshall MV, Carlson JE, Gailing O. Nuclear microsatellite markers for population genetic studies in sugar maple (Acer saccharum Marsh.). Annals of Forest Research. 2015 Apr 15;58(2):193-204.

bradfordcondon commented 6 years ago
SELECT count(*)
FROM chado.featureprop fp
INNER JOIN chado.cvterm cvt ON cvt.cvterm_id = fp.type_id
WHERE cvt.name = 'tripal_ssr_forward_primer';

8566 on live, 8367 on dev.

Assuming that all SSRs were loaded for all species (check), this means that there ARE confirmed SSRs that were loaded on live for which I just can't find the files and/or information.

bradfordcondon commented 6 years ago
SELECT o.common_name, count(f.feature_id)
FROM chado.featureprop fp
INNER JOIN chado.cvterm cvt ON cvt.cvterm_id = fp.type_id
INNER JOIN chado.feature f ON f.feature_id = fp.feature_id
INNER JOIN chado.organism o ON o.organism_id = f.organism_id
WHERE cvt.name = 'tripal_ssr_forward_primer'
GROUP BY o.common_name;
American Beech  28
American Chestnut   737
American Sweetgum   2070
Blackgum    844
Black Walnut    925
Green Ash   482
Honeylocust 327
Northern Red Oak    494
Red Alder   232
Sugar Maple 1477
Tulip Poplar    482
White Alder 188
White Oak   81

and on dev

American Beech  28
American Chestnut   773
American Sweetgum   2147
Blackgum    854
Black Walnut    939
Green Ash   484
Honeylocust 327
Northern Red Oak    497
Red Alder   239
Sugar Maple 1514
Tulip Poplar    489
White Alder 191
White Oak   84
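
To see which species account for the gap between the two result sets, the per-species counts can be diffed directly. This is only a sketch: the dictionaries are copied by hand from the query output above with the labels as posted, and `count_differences` is a made-up helper name, not part of tripal_ssr.

```python
# Per-species forward-primer counts, copied from the two query results above
# (labels "live" and "dev" as posted in this thread).
live = {
    "American Beech": 28, "American Chestnut": 737, "American Sweetgum": 2070,
    "Blackgum": 844, "Black Walnut": 925, "Green Ash": 482, "Honeylocust": 327,
    "Northern Red Oak": 494, "Red Alder": 232, "Sugar Maple": 1477,
    "Tulip Poplar": 482, "White Alder": 188, "White Oak": 81,
}
dev = {
    "American Beech": 28, "American Chestnut": 773, "American Sweetgum": 2147,
    "Blackgum": 854, "Black Walnut": 939, "Green Ash": 484, "Honeylocust": 327,
    "Northern Red Oak": 497, "Red Alder": 239, "Sugar Maple": 1514,
    "Tulip Poplar": 489, "White Alder": 191, "White Oak": 84,
}


def count_differences(a, b):
    """Return {species: b - a} for every species whose counts differ.

    A species missing from one side is treated as having a count of 0.
    """
    diffs = {}
    for name in sorted(set(a) | set(b)):
        delta = b.get(name, 0) - a.get(name, 0)
        if delta != 0:
            diffs[name] = delta
    return diffs


diffs = count_differences(live, dev)
for name, delta in diffs.items():
    print(f"{name}: {delta:+d}")
```

Only American Beech and Honeylocust have identical counts in both result sets; every other species differs.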
bradfordcondon commented 6 years ago

Using wc -l on the SSR files:

Blackgum: 854, Black Walnut: 939, White Alder: 191

So the DEV matches the actual files.

Either the original pipeline made mistakes loading the files, or the list of features was amended on live (i.e. some primers were removed).
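
The same file-versus-database check could be scripted rather than done by eye. A rough sketch, assuming one SSR per line in each input file; `count_lines` and `compare_counts` are hypothetical helpers, and the three counts are the ones quoted in this thread.

```python
def count_lines(path):
    """Count newline characters in a file, the same quantity `wc -l` reports."""
    total = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large SSR files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            total += chunk.count(b"\n")
    return total


def compare_counts(file_counts, db_counts):
    """Return {species: (file_count, db_count)} where the two disagree."""
    return {
        name: (n, db_counts.get(name))
        for name, n in file_counts.items()
        if db_counts.get(name) != n
    }


# Line counts reported by wc -l above. In practice these would come from
# count_lines() on the real SSR files, whose paths are not given in this thread.
file_counts = {"Blackgum": 854, "Black Walnut": 939, "White Alder": 191}

# Database counts for the same species, from the two query results above.
dev_counts = {"Blackgum": 854, "Black Walnut": 939, "White Alder": 191}
live_counts = {"Blackgum": 844, "Black Walnut": 925, "White Alder": 188}

print("dev mismatches:", compare_counts(file_counts, dev_counts))
print("live mismatches:", compare_counts(file_counts, live_counts))
```

With these numbers the dev comparison comes back empty, while all three species disagree on live.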

bradfordcondon commented 6 years ago

Here's the Carlson data for sweetgum:

| Seq_name | Lab Specific Marker Name | Motif | Forward Primer | Reverse Primer | Predicted Amplicon Size | Amplification? | Size on gel | Likely to be polymorphic? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HWI-ST609:156:C0NHEACXX:1:2209:7701:3126 | 3126 | AT | GGGGTAAAATAGAAAATTA | TCTAATGCGATTAAATCTA | 178 | NO | N/A | |
| HWI-ST609:156:C0NHEACXX:1:1305:12569:4331 | 4331 | tc | CACTACTCTTTCTTTAACCAGACG | TCCTCTGTTCCTGTAATTGGC | 150 | YES | 150-200 | Yes |
| HWI-ST609:156:C0NHEACXX:1:2309:21286:5769 | 5769 | ag | TTGCTCCAAGCTTTGTCTCC | CATCATCACAATCATTCTCCC | 199 | YES | 170-200 | Yes |
| HWI-ST609:156:C0NHEACXX:1:1313:2176:5884 | 5884 | ct | CAATGCATAAGATACAACTCCC | ATGAGAGGAGGGAAAGGAGG | 197 | NO | N/A | |
| HWI-ST609:156:C0NHEACXX:1:1208:13723:6315 | 6315 | ga | TGGTACTTGGTAGGTCTAG | CTCTCTTTTACAGAGTCGT | 155 | NO | N/A | |
| HWI-ST609:156:C0NHEACXX:1:1311:9814:6626 | 6626 | ta | TTATTGCAACAATGCTTCCC | AGGTATACGTCACCATGAAACG | 196 | YES | 170-200 | |
| HWI-ST609:156:C0NHEACXX:1:2306:2336:8086 | 8086 | AG | ATGATTTTAAATTACCCTC | TTTATTATTAGGTGCACAC | 130 | NO | N/A | |
| HWI-ST609:156:C0NHEACXX:1:2103:5097:8342 | 8342 | ct | TCTTACCAGCTGCTGTTTGC | GGGATGTAGTAAGGCCCAGC | 171 | YES | 170-200 | Yes |
| HWI-ST609:156:C0NHEACXX:1:1211:8437:8685 | 8685 | ct | TTTTCATAACAAGAAGTTG | AGTTGTTAGAGAATTGGAG | 155 | NO | N/A | |
| HWI-ST609:156:C0NHEACXX:1:1211:21123:8752 | 8752 | AT | AGATGGGTCTAGAAAATTA | AAAGGCTGAAGTTAGTAAT | 155 | NO | N/A | |
| HWI-ST609:156:C0NHEACXX:1:1113:3078:8808 | 8808 | at | GGAGATCCTTGGCTATGTGC | TAGCCACCCATTCATAACCG | 150 | YES | 150-200 | Yes |
| HWI-ST609:156:C0NHEACXX:1:1308:14461:9551 | 9551 | ct | TGCAATAGCTGTCAATAACTCC | GAGCGAGCATGACATCACC | 150 | YES | 150-200 | |
| HWI-ST609:156:C0NHEACXX:1:1206:11292:10654 | 10654 | ga | ATGGCTCAAGGGTTTCACG | GCATGCCCTAGTCAAAGTGG | 200 | NO | N/A | |
| HWI-ST609:156:C0NHEACXX:1:2209:2517:11530 | 11530 | at | GCTTTGATGTATTTGTTGGG | GGGTGTGTGTCTCTTATCAAGC | 171 | NO | N/A | |

Our SSR file looks like this:

Liquidambar_styraciflua_01052015_comp49593_c4_seq13_ssr327 tc 9 327 345 GAAGTTGCCAAAGTCCACGC TCTCAACCTCACATGTCAGTCC 60.318 59.700 20

So there is no way that the scaffold names would line up.

Furthermore, the primers do not exist in the database and are not in the SSR input files. There is therefore no way to "reverse engineer" which SSR the confirmed ones belong to. At this point I'm pretty sure they are a totally different pool of primers. This means our module cannot load them.
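
For reference, the primer lookup described above amounts to something like the following. It is a sketch under assumptions: the SSR file is taken to be whitespace-delimited with the forward and reverse primers in the 6th and 7th columns (inferred from the single example line above), and `parse_ssr_line` and `find_matches` are illustrative names rather than functions in the module. The confirmed pairs are the four sweetgum markers flagged "Yes" for polymorphism.

```python
def parse_ssr_line(line):
    """Split one SSR input line into (name, forward primer, reverse primer).

    Column positions are an assumption based on the example line shown above.
    """
    fields = line.split()
    return fields[0], fields[5].upper(), fields[6].upper()


def find_matches(confirmed, ssr_lines):
    """Map each confirmed marker to the SSR names that share its primer pair."""
    by_pair = {}
    for line in ssr_lines:
        if not line.strip():
            continue  # skip blank lines
        name, forward, reverse = parse_ssr_line(line)
        by_pair.setdefault((forward, reverse), []).append(name)

    matches = {}
    for marker, (forward, reverse) in confirmed.items():
        key = (forward.upper(), reverse.upper())
        if key in by_pair:
            matches[marker] = by_pair[key]
    return matches


# Confirmed polymorphic sweetgum markers: lab marker name -> (forward, reverse).
confirmed = {
    "4331": ("CACTACTCTTTCTTTAACCAGACG", "TCCTCTGTTCCTGTAATTGGC"),
    "5769": ("TTGCTCCAAGCTTTGTCTCC", "CATCATCACAATCATTCTCCC"),
    "8342": ("TCTTACCAGCTGCTGTTTGC", "GGGATGTAGTAAGGCCCAGC"),
    "8808": ("GGAGATCCTTGGCTATGTGC", "TAGCCACCCATTCATAACCG"),
}

# The single example line from our SSR file; a real run would read every line.
ssr_lines = [
    "Liquidambar_styraciflua_01052015_comp49593_c4_seq13_ssr327 tc 9 327 345 "
    "GAAGTTGCCAAAGTCCACGC TCTCAACCTCACATGTCAGTCC 60.318 59.700 20",
]

matches = find_matches(confirmed, ssr_lines)
print(matches or "no confirmed primer pair found in the SSR input")
```

Matching on the primer pair rather than the sequence name is deliberate, since the read-based names in the Carlson sheet and the transcript-based names in our files cannot be compared.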

bradfordcondon commented 6 years ago

So to summarize: there are no confirmed polymorphic files. They call SNPs from reads, which haven't been translated to features and won't be. Can close.