vlothec / TRASH

RepeatIdentifier
MIT License

HOR run details #4

Closed HaoWangLYL closed 1 year ago

HaoWangLYL commented 1 year ago

Hello,
Can you apply an example_run order? Thank you.

vlothec commented 1 year ago

Hi, what do you mean by applying an example_run order? If you'd like to re-create the example run, you can use the following command:
{TRASH_dir}/TRASH_run.sh {TRASH_dir}/example_run/CP068268_39050443_39150442.fa --o {output_dir}

HaoWangLYL commented 1 year ago

I want to know how to use the HOR module, because when I run this pipeline with the given command, it doesn't produce the HOR results or the HOR PNG.

vlothec commented 1 year ago

The HOR module requires repeat classification, so that only one repeat family is analysed. The first step is to add sequence template information that will be used for this classification. For example, a file called "sequence_template.csv" that looks like:

name,length,seq
CEN178,178,AGTATAAGAACTTAAACCGCAACCGATCTTAAAAGCCTAAGTAGTGTTTCCTTGTTAGAAGACACAAAGCCAAAGACTCATATGGACTTTGGCTACACCATGAAAGCTTTGAGAAGCAAGAAGAAGGTTGGTTAGTGTTTTGGAGTCGAATATGACTTGATGTCATGTGTATGATTG

This template can be used to classify Arabidopsis thaliana CEN178 repeats. In the output, these will have the "CEN178" class assigned. Add "--seqt {dir}/sequence_template.csv" to the command to use the template. With that, CEN178 repeats become available for the HOR analysis by adding "--horclass CEN178" to the command. A full command that includes a HOR run looks like the following, where NNN is the class name to analyse (CEN178 in this example):
{TRASH_dir}/TRASH_run.sh {fasta_dir}/xxx.fa --o {output_dir} --seqt {dir}/sequence_template.csv --horclass NNN
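As a rough illustration of the steps described in this comment, here is a minimal POSIX shell sketch that writes the template file and assembles the HOR command. The flags (--o, --seqt, --horclass) and the template contents are taken from this thread; every path is a hypothetical placeholder to replace with your own, and the script only prints the command rather than running TRASH.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder paths (assumptions); override them via the environment.
TRASH_dir="${TRASH_dir:-/path/to/TRASH}"
fasta="${fasta:-/path/to/xxx.fa}"
output_dir="${output_dir:-./trash_output}"
template="${template:-./sequence_template.csv}"

# Class to analyse with the HOR module; it must match a name in the template.
hor_class="CEN178"

# Write the template from the thread: a header line, then one row per repeat family.
cat > "$template" <<'EOF'
name,length,seq
CEN178,178,AGTATAAGAACTTAAACCGCAACCGATCTTAAAAGCCTAAGTAGTGTTTCCTTGTTAGAAGACACAAAGCCAAAGACTCATATGGACTTTGGCTACACCATGAAAGCTTTGAGAAGCAAGAAGAAGGTTGGTTAGTGTTTTGGAGTCGAATATGACTTGATGTCATGTGTATGATTG
EOF

# Sanity check: the requested class must appear in the first column of the template.
if ! cut -d, -f1 "$template" | grep -qx "$hor_class"; then
    echo "class $hor_class not found in $template" >&2
    exit 1
fi

# Assemble the command as positional parameters so that paths with spaces survive.
set -- "$TRASH_dir/TRASH_run.sh" "$fasta" --o "$output_dir" \
       --seqt "$template" --horclass "$hor_class"

# Print the command for inspection; replace this line with "$@" to actually run it.
echo "$@"
```

The class check is there because a --horclass value that matches no template name would leave nothing for the HOR module to analyse.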

HaoWangLYL commented 1 year ago

Why is the class NA for everything in my results?

vlothec commented 1 year ago

See the comment above. Without appropriate sequence templates, no class will be assigned (NA).

HaoWangLYL commented 1 year ago

Yes, I got it. Thank you.