hyunhwan-jeong / SalmonTE

SalmonTE is an ultra-fast and scalable quantification pipeline for transposable element (TE) abundances
GNU General Public License v3.0
81 stars 23 forks

sequence for reference #73

Closed: wjyzidane closed this issue 1 year ago

wjyzidane commented 1 year ago

I am trying to pull the L1HS sequence out of the SalmonTE reference to check whether it contains the full-length ORFs. How should I approach this?

Thanks!

hyunhwan-jeong commented 1 year ago

This is the L1HS sequence I used for the index:

ctcgctgattgctagcacagcagtctgagatcaaactgcaaggc
ggcaacgaggctgggggaggggcgcccgccattgcccaggcttgcttaggtaaacaaagcagccgggaag
ctcgaactgggtggagcccaccacagctcaaggaggcctgcctgcctctgtaggctccacctctgggggc
agggcacagacaaacaaaaagacagcagtaacctctgcagacttaagtgtccctgtctgacagctttgaa
gagagcagtggttctcccagcacgcagctggagatctgagaacgggcagactgcctcctcaagtgggtcc
ctgacccctgacccccgagcagcctaactgggaggcaccccccagcaggggcacactgacacctcacacg
gcagggtattccaacagacctgcagctgagggtcctgtctgttagaaggaaaactaacaaccagaaagga
catctacaccgaaaacccatctgtacatcaccatcatcaaagaccaaaagtagataaaaccacaaagatg
gggaaaaaacagaacagaaaaactggaaactctaaaacgcagagcgcctctcctcctccaaaggaacgca
gttcctcaccagcaacagaacaaagctggatggagaatgattttgacgagctgagagaagaaggcttcag
acgatcaaattactctgagctacgggaggacattcaaaccaaaggcaaagaagttgaaaactttgaaaaa
aatttagaagaatgtataactagaataaccaatacagagaagtgcttaaaggagctgatggagctgaaaa
ccaaggctcgagaactacgtgaagaatgcagaagcctcaggagccgatgcgatcaactggaagaaagggt
atcagcaatggaagatgaaatgaatgaaatgaagcgagaagggaagtttagagaaaaaagaataaaaaga
aatgagcaaagcctccaagaaatatgggactatgtgaaaagaccaaatctacgtctgattggtgtacctg
aaagtgatgtggagaatggaaccaagttggaaaacactctgcaggatattatccaggagaacttccccaa
tctagcaaggcaggccaacgttcagattcaggaaatacagagaacgccacaaagatactcctcgagaaga
gcaactccaagacacataattgtcagattcaccaaagttgaaatgaaggaaaaaatgttaagggcagcca
gagagaaaggtcgggttaccctcaaaggaaagcccatcagactaacagcggatctctcggcagaaaccct
acaagccagaagagagtgggggccaatattcaacattcttaaagaaaagaattttcaacccagaatttca
tatccagccaaactaagcttcataagtgaaggagaaataaaatactttatagacaagcaaatgctgagag
attttgtcaccaccaggcctgccctaaaagagctcctgaaggaagcgctaaacatggaaaggaacaaccg
gtaccagccgctgcaaaatcatgccaaaatgtaaagaccatcgagactaggaagaaactgcatcaactaa
tgagcaaaatcaccagctaacatcataatgacaggatcaaattcacacataacaatattaactttaaata
taaatggactaaattctgcaattaaaagacacagactggcaagttggataaagagtcaagacccatcagt
gtgctgtattcaggaaacccatctcacgtgcagagacacacataggctcaaaataaaaggatggaggaag
atctaccaagccaatggaaaacaaaaaaaggcaggggttgcaatcctagtctctgataaaacagacttta
aaccaacaaagatcaaaagagacaaagaaggccattacataatggtaaagggatcaattcaacaagagga
gctaactatcctaaatatttatgcacccaatacaggagcacccagattcataaagcaagtcctcagtgac
ctacaaagagacttagactcccacacattaataatgggagactttaacaccccactgtcaacattagaca
gatcaacgagacagaaagtcaacaaggatacccaggaattgaactcagctctgcaccaagcagacctaat
agacatctacagaactctccaccccaaatcaacagaatatacatttttttcagcaccacaccacacctat
tccaaaattgaccacatagttggaagtaaagctctcctcagcaaatgtaaaagaacagaaattataacaa
actatctctcagaccacagtgcaatcaaactagaactcaggattaagaatctcactcaaagccgctcaac
tacatggaaactgaacaacctgctcctgaatgactactgggtacataacgaaatgaaggcagaaataaag
atgttctttgaaaccaacgagaacaaagacaccacataccagaatctctgggacgcattcaaagcagtgt
gtagagggaaatttatagcactaaatgcctacaagagaaagcaggaaagatccaaaattgacaccctaac
atcacaattaaaagaactagaaaagcaagagcaaacacattcaaaagctagcagaaggcaagaaataact
aaaatcagagcagaactgaaggaaatagagacacaaaaaacccttcaaaaaatcaatgaatccaggagct
ggttttttgaaaggatcaacaaaattgatagaccgctagcaagactaataaagaaaaaaagagagaagaa
tcaaatagacacaataaaaaatgataaaggggatatcaccaccgatcccacagaaatacaaactaccatc
agagaatactacaaacacctctacgcaaataaactagaaaatctagaagaaatggatacattcctcgaca
catacactctcccaagactaaaccaggaagaagttgaatctctgaatagaccaataacaggctctgaaat
tgtggcaataatcaatagtttaccaaccaaaaagagtccaggaccagatggattcacagccgaattctac
cagaggtacaaggaggaactggtaccattccttctgaaactattccaatcaatagaaaaagagggaatcc
tccctaactcattttatgaggccagcatcattctgataccaaagccgggcagagacacaaccaaaaaaga
gaattttagaccaatatccttgatgaacattgatgcaaaaatcctcaataaaatactggcaaaccgaatc
cagcagcacatcaaaaagcttatccaccatgatcaagtgggcttcatccctgggatgcaaggctggttca
atatacgcaaatcaataaatgtaatccagcatataaacagagccaaagacaaaaaccacatgattatctc
aatagatgcagaaaaagcctttgacaaaattcaacaacccttcatgctaaaaactctcaataaattaggt
attgatgggacgtatttcaaaataataagagctatctatgacaaacccacagccaatatcatactgaatg
ggcaaaaactggaagcattccctttgaaaactggcacaagacagggatgccctctctcaccgctcctatt
caacatagtgttggaagttctggccagggcaatcaggcaggagaaggaaataaagggtattcaattagga
aaagaggaagtcaaattgtccctgtttgcagacgacatgattgtttatctagaaaaccccatcgtctcag
cccaaaatctccttaagctgataagcaacttcagcaaagtctcaggatacaaaatcaatgtacaaaaatc
acaagcattcttatacaccaacaacagacaaacagagagccaaatcatgggtgaactcccattcacaatt
gcttcaaagagaataaaatacctaggaatccaacttacaagggatgtgaaggacctcttcaaggagaact
acaaaccactgctcaaggaaataaaagaggacacaaacaaatggaagaacattccatgctcatgggtagg
aagaatcaatatcgtgaaaatggccatactgcccaaggtaatttacagattcaatgccatccccatcaag
ctaccaatgactttcttcacagaattggaaaaaactactttaaagttcatatggaaccaaaaaagagccc
gcatcgccaagtcaatcctaagccaaaagaacaaagctggaggcatcacactacctgacttcaaactata
ctacaaggctacagtaaccaaaacagcatggtactggtaccaaaacagagatatagatcaatggaacaga
acagagccctcagaaataatgccgcatatctacaactatctgatctttgacaaacctgagaaaaacaagc
aatggggaaaggattccctatttaataaatggtgctgggaaaactggctagccatatgtagaaagctgaa
actggatcccttccttacaccttatacaaaaatcaattcaagatggattaaagatttaaacgttagacct
aaaaccataaaaaccctagaagaaaacctaggcattaccattcaggacataggcgtgggcaaggacttca
tgtccaaaacaccaaaagcaatggcaacaaaagccaaaattgacaaatgggatctaattaaactaaagag
cttctgcacagcaaaagaaactaccatcagagtgaacaggcaacctacaacatgggagaaaattttcgca
acctactcatctgacaaagggctaatatccagaatctacaatgaactcaaacaaatttacaagaaaaaaa
caaacaaccccatcaaaaagtgggcgaaggacatgaacagacacttctcaaaagaagacatttatgcagc
caaaaaacacatgaagaaatgctcatcatcactggccatcagagaaatgcaaatcaaaaccactatgaga
tatcatctcacaccagttagaatggcaatcattaaaaagtcaggaaacaacaggtgctggagaggatgtg
gagaaataggaacacttttacactgttggtgggactgtaaactagttcaaccattgtggaagtcagtgtg
gcgattcctcagggatctagaactagaaataccatttgacccagccatcccattactgggtatataccca
aaggactataaatcatgctgctataaagacacatgcacacgtatgtttattgcggcactattcacaatag
caaagacttggaaccaacccaaatgtccaacaatgatagactggattaagaaaatgtggcacatatacac
catggaatactatgcagccataaaaaatgatgagttcatatcctttgtagggacatggatgaaattggaa
accatcattctcagtaaactatcgcaagaacaaaaaaccaaacaccgcatattctcactcataggtggga
attgaacaatgagatcacatggacacaggaaggggaatatcacactctggggactgtggtggggtcgggg
gaggggggagggatagcattgggagatatacctaatgctagatgacacgttagtgggtgcagcgcaccag
catggcacatgtatacatatgtaactaacctgcacaatgtgcacatgtaccctaaaacttagagtataat
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaataaataaaaa

You can find further information at https://github.com/hyunhwan-jeong/SalmonTE/blob/main/scripts/hs_origin.fa.
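For anyone who wants to run the ORF check directly, here is a minimal Python sketch, not part of SalmonTE itself. It assumes that `hs_origin.fa` is a plain FASTA file and that the L1HS record's header begins with `L1HS`; I have not verified the header format, so inspect the record names if the lookup fails. The names `read_fasta`, `find_orfs`, and `report` are hypothetical helpers. The scan covers only the forward strand and treats an ORF as an ATG followed by the first in-frame stop codon.

```python
from typing import Dict, Iterable, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {first word of header: uppercase sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def find_orfs(seq: str, min_len: int = 300) -> List[Tuple[int, int]]:
    """Return forward-strand ORFs as 0-based, half-open (start, end) pairs.

    An ORF here runs from an ATG to the first in-frame stop codon
    (stop included) and must span at least min_len nucleotides.
    The result is sorted from longest to shortest.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


def report(path: str, name: str = "L1HS") -> None:
    """Print start, end, and length of each ORF found in one record.

    `name` is an assumed record name; check the keys returned by
    read_fasta if it is not present in your copy of the file.
    """
    with open(path) as handle:
        records = read_fasta(handle)
    for start, end in find_orfs(records[name]):
        print(start, end, end - start)
```

Calling `report("hs_origin.fa")` on a local copy of the file prints the candidate ORFs, longest first. A full-length L1 element is generally expected to carry two ORFs of roughly 1 kb (ORF1) and 3.8 kb (ORF2), so the two longest hits are the ones to compare against those lengths.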

Thank you,

Hyun-Hwan Jeong