statonlab / tripal_ssr

input formats #12

Closed bradfordcondon closed 6 years ago

bradfordcondon commented 6 years ago

The Perl scripts generate:

A report.txt file, for example:

SSR ID  motif   number of repeats   start position  end position
5211846_ssr116  COMPOUND    COMPOUND    116 150
803337_ssr169   ata 8   169 193
1308500_ssr104  ga  14  104 132
1458295_ssr99   ac  6   99  111
5919725_ssr10   ct  12  10  34
3802953_ssr294  at  8   294 310
514915_ssr205   COMPOUND    COMPOUND    205 231
1519454_ssr86   aata    4   86  102
4567753_ssr39   atta    4   39  55

Di, tri, and tetra primer reports, with output like the following:

SSR ID  motif   number of repeats   start position  end position    forward primer  reverse primer  forward Tm  reverse Tm  product size
3935838_ssr85   taat    4   85  101 AGATTTAGGCCCGATTTAGGACC GGGTAAGTCAGTATTAATTCTGTGG   59.927  57.212  140
3830623_ssr266  atgt    4   266 282 CACCATATCAACGCAGCATGG   GGTTCTTATTTCTCATGGTAAGAGGG  60.002  59.232  227
1923834_ssr120  ataa    5   120 140 AGACGAAGCATTGTATTCACCG  TAGCCTCCTCGAGACTCTTCC   59.070  60.134  228
3623108_ssr119  taat    4   119 135 ATTTGGGCCAGATTTGGGACC   ATTGACGAATGTCCGGTAACC   60.901  58.375  218
1655230_ssr117  attc    4   117 133 ATGACAACAACCCTGGACTGG   AATTAACGAGTATCCCGTAACC  60.203  55.727  152
4200907_ssr98   atta    4   98  114 ATTCATGCAACCTGTTCCTCG   AATTCCCATGTCCATCAACCC   58.913  58.249  120
3894986_ssr27   taat    4   27  43  AATTCCCATGTCTACCAACCC   TCCTCATAACGTGGTAAATAAGGG    57.618  58.079  105
3027073_ssr60   aaat    4   60  76  CTTCCGTTGGGCTTCAATACC   AGCCTAAATCGATAAGGCTGGG  59.257  59.961  104
431021_ssr154   tatg    4   154 170 GCGTGAGTTCATGTTCTACCG   GCTTTCATATGTGCTGATCGGC  59.346  60.350  245

Finally, the bulk loader accepted the following:

feature id SSR ID  motif   number of repeats       start position  end position    forward primer  reverse primer  forward Tm      reverse Tm      product size

which is just the primer file, with the feature ID prepended! OK, that seems easy enough then. Looking at my input, I'd guess that 3935838_ssr85 is actually featureid_ssrid, i.e. 3935838 and ssr85?
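A minimal Python sketch of that conversion, assuming the guess above is right (the part before `_ssr` is the feature ID) and assuming the primer report is tab-delimited. The function names `split_ssr_id` and `prepend_feature_id` are hypothetical and not part of tripal_ssr:

```python
import csv
import io


def split_ssr_id(ssr_id):
    """Split e.g. '3935838_ssr85' into ('3935838', 'ssr85').

    Assumption: everything before the last '_ssr' is the feature ID.
    """
    feature_id, sep, suffix = ssr_id.rpartition("_ssr")
    if not sep or not feature_id:
        raise ValueError(f"unexpected SSR ID format: {ssr_id!r}")
    return feature_id, "ssr" + suffix


def prepend_feature_id(primer_tsv):
    """Return the primer report text with a leading 'feature id' column."""
    reader = csv.reader(io.StringIO(primer_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow(["feature id"] + header)
    for row in reader:
        if not row:  # skip blank lines
            continue
        feature_id, _ = split_ssr_id(row[0])
        # The original SSR ID column is kept unchanged.
        writer.writerow([feature_id] + row)
    return out.getvalue()
```

If the guess about the ID format turns out to be wrong, `split_ssr_id` raises rather than silently producing a bad feature ID.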

bradfordcondon commented 6 years ago

Spreadsheet for sample input at

https://docs.google.com/spreadsheets/d/17XxUdxV7-3hBmwnDr-dUjQ40sw2i6gQLC4sbyHf9r20/edit?usp=sharing

bradfordcondon commented 6 years ago

| header | chado destination |
| --- | --- |
| feature_id | Parent feature. Needs to be linked in feature_relationship. |
| ssr_id | New SSR feature. Needs to be UPDATED if it exists, CREATED if it doesn't. |
| motif | SSR featureprop |
| number of repeats | SSR featureprop |
| start position | This really should go in featureloc, not featureprop, BUT not as a map. Is that possible? |
| end position | This really should go in featureloc, not featureprop, BUT not as a map. Is that possible? |
| forward primer | SSR featureprop |
| reverse primer | SSR featureprop |
| forward tm | |
| reverse tm | |
| product size | |
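The mapping above could be sketched in Python roughly as follows. This is only an illustration of the proposed routing, not the loader's implementation: the `COLUMN_DESTINATIONS` dict and `group_by_destination` function are hypothetical names, the featureloc placement is still an open question in this thread, and the columns with no destination listed are left as "undecided" rather than guessed.

```python
# Hypothetical mapping of bulk-loader columns to Chado tables,
# following the list in this comment.
COLUMN_DESTINATIONS = {
    "feature id": "feature_relationship",  # link to the parent feature
    "SSR ID": "feature",                   # SSR feature: update or create
    "motif": "featureprop",
    "number of repeats": "featureprop",
    "start position": "featureloc",        # proposed, not settled
    "end position": "featureloc",          # proposed, not settled
    "forward primer": "featureprop",
    "reverse primer": "featureprop",
    "forward Tm": None,                    # no destination given yet
    "reverse Tm": None,
    "product size": None,
}


def group_by_destination(row):
    """Group one row (column name -> value) by its destination table."""
    grouped = {}
    for column, value in row.items():
        destination = COLUMN_DESTINATIONS.get(column) or "undecided"
        grouped.setdefault(destination, {})[column] = value
    return grouped
```

Grouping a row this way makes it easy to see which values would become featureprop records and which still need a decision.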