oushujun / LTR_FINDER_parallel

A parallel wrapper for LTR_FINDER
MIT License

Use LTR_FINDER_parallel in conjunction with LTR_retriever #1

Closed slbai01 closed 5 years ago

slbai01 commented 5 years ago

Dear Dr. Ou,

I want to use the LTR_FINDER_parallel output file to calculate LAI with LTR_retriever.

Your modifications make it very fast compared to the previous single-threaded version, but the output format is not the same as before. How can I set parameters, or which script can I use, to convert the output to the previous format?

  • old format (maybe I need this format)

Program    : LTR_FINDER
Version    : 1.07

Predict protein Domains 4.669 second
>Sequence: Contig10_pilon Len:17381955
[1] Contig10_pilon Len:17381955
Location : 71911 - 85359 Len: 13449 Strand:+
Score    : 6 [LTR region similarity:0.952]
Status   : 11111010000
5'-LTR   : 71911 - 73491 Len: 1581
3'-LTR   : 83775 - 85359 Len: 1585
5'-TG    : TG , TG
3'-CA    : CA , CA
TSR      : 71906 - 71910 , 85360 - 85364 [GTGAA]
Sharpness: 0.5,0.486
Strand + :
PPT   : [13/15] 83744 - 83758

Details of exact match pairs:
83791-83876[86] (22) 83899-83935[37] (24) 83960-84012[53] (1) 84014-84033[20] (34) 84068-84097[30] (15) 84113-84145[33]
71927-72012[86] (22) 72035-72071[37] (25) 72097-72149[53] (1) 72151-72170[20] (35) 72206-72235[30] (16) 72252-72284[33]

Details of the LTR alignment(5'-end):
                                       |83775
CTCGAGGACGAGT-AGG----AATTAAGCTTGGGGATGCTGATACGTCTCCAACATATCTATAATTTATGAAGTATTCATG
|  | |||  ||| | |    || ||| | || | |   *|||||||||||||| ||||||||||||||||||||||||||
CATG-GGATAAGTCATGTTATAAGTAATCATGTGAA---TGATACGTCTCCAACGTATCTATAATTTATGAAGTATTCATG
                               *****---|71911
......
......
  • new format1

$ head -20 genome.fa.finder.combine.scn
#LTR_FINDER_parallel -seq genome.fa -size 5000000 -time 1500 -try1 1 -harvest_out -threads 28 -cut /share/home/programs/LTR_FINDER_parallel/bin/cut.pl -finder /share/home/programs/LTR_FINDER_parallel/bin/LTR_FINDER.x86_64-1.0.7/ltr_finder
# LTR_FINDER args=-w 2 -C -D 15000 -d 1000 -L 7000 -l 100 -p 20 -M 0.85
# predictions are reported in the following way
# s(ret) e(ret) l(ret) s(lLTR) e(lLTR) l(lLTR) s(rLTR) e(rLTR) l(rLTR) sim(LTRs) seq-nr chr
# where:
# s = starting position
# e = ending position
# l = length
# ret = LTR-retrotransposon
# lLTR = left LTR
# rLTR = right LTR
# sim = similarity
# seq-nr = sequence order
60774 69360 8587 60774 62502 1729 67625 69360 1736 96.1 0 Chr1
88505 98181 9677 88505 89925 1421 96758 98181 1424 97.7 0 Chr1
......
  • new format2

$ head -20 ../02.LTR_Finder_old/genome.fa.finder.combine.scn
index   SeqID   Location    LTR len Inserted element len    TSR PBS PPT RT  IN (core)   IN (c-term) RH  Strand  Score   Sharpness   Similarity
[NA]    Chr1    60774-69360 1729,1736   8587    CAAAG   N-N 62504-62518 N-N N-N N-N N-N -6  0.514,0.529 0.961
[NA]    Chr1    88505-98181 1421,1424   9677    ATGTT   N-N 96726-96740 N-N N-N N-N N-N +6  0.486,0.514 0.977
......

Shenglong

oushujun commented 5 years ago

Hi Shenglong,

You can use format 1 (-harvest_out) and give it to LTR_retriever with -inharvest. If you have more than one harvest-format output file (for example, another output from LTRharvest), just aggregate them into one file and specify that file with -inharvest. Hope it helps!
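
As a rough illustration of the aggregation step, the sketch below concatenates several harvest-format .scn files into one and builds the matching LTR_retriever command line. The helper names `merge_scn` and `retriever_command` and the file names `genome.fa.harvest.scn` and `genome.fa.rawLTR.scn` are assumptions made for this example; only `genome.fa.finder.combine.scn` and the `-inharvest` option come from this thread. It is plain concatenation, equivalent to using `cat`, and not part of either tool.

```python
from pathlib import Path


def merge_scn(inputs, output):
    """Concatenate harvest-format .scn files into one file, in the given order."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")


def retriever_command(genome, scn, threads=1):
    """Build (but do not run) an LTR_retriever command for the merged file."""
    return ["LTR_retriever", "-genome", genome,
            "-inharvest", scn, "-threads", str(threads)]
```

A possible use, assuming both input files exist: call `merge_scn(["genome.fa.finder.combine.scn", "genome.fa.harvest.scn"], "genome.fa.rawLTR.scn")`, then pass `retriever_command("genome.fa", "genome.fa.rawLTR.scn", threads=28)` to `subprocess.run`.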

Best, Shujun
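
For readers comparing the two new listings in the question: they appear to describe the same elements, and the harvest-style columns can be derived from the tabular row. The sketch below shows that arithmetic on the first row quoted above. It is only an illustration, not the wrapper's own conversion code. The function name `table_row_to_harvest` is hypothetical, the `seq_nr` value is assumed to be supplied by the caller (0 in the sample), and the row is assumed to be whitespace-separated in the column order of the header shown in the question.

```python
def table_row_to_harvest(row, seq_nr=0):
    """Convert one row of the tabular listing to a harvest-style line.

    Assumes whitespace-separated fields in this order: index, SeqID,
    Location, LTR len, Inserted element len, ..., Similarity (last).
    """
    fields = row.split()
    chrom = fields[1]
    start, end = (int(x) for x in fields[2].split("-"))
    l_len, r_len = (int(x) for x in fields[3].split(","))
    ret_len = int(fields[4])
    similarity = float(fields[-1]) * 100  # fraction -> percent

    l_end = start + l_len - 1    # left LTR starts at the element start
    r_start = end - r_len + 1    # right LTR ends at the element end
    return (f"{start} {end} {ret_len} {start} {l_end} {l_len} "
            f"{r_start} {end} {r_len} {similarity:.1f} {seq_nr} {chrom}")


row = ("[NA]    Chr1    60774-69360 1729,1736   8587    CAAAG   N-N "
       "62504-62518 N-N N-N N-N N-N -6  0.514,0.529 0.961")
print(table_row_to_harvest(row))
# prints: 60774 69360 8587 60774 62502 1729 67625 69360 1736 96.1 0 Chr1
```

The printed line matches the first record of the -harvest_out listing in the question, which suggests that rerunning with -harvest_out, as advised above, yields the same predictions in the layout LTR_retriever reads.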
