Closed by ncnlll 3 years ago
Hi Lorena,
If you made the .csv file with LibreOffice Calc, it may have automatically wrapped each text field in quotation marks (e.g., "DNA/hAT-Ac","DNA/hAT-Ac","DNA transposon","Transposable Element"), and that could be preventing SalmonTE from reading it properly. Try removing the quotation marks and see if it works :)
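If editing by hand is tedious, something like the following might do it. This is only a rough sketch using Python's standard csv module, not anything SalmonTE provides: the function name strip_csv_quotes is made up for illustration, and it assumes none of your fields contain commas or embedded quotes (otherwise the writer would have to keep quotes around those fields).

```python
import csv
import io


def strip_csv_quotes(text: str) -> str:
    """Re-emit CSV text so that fields are quoted only when strictly needed.

    Hypothetical helper for illustration; it is not part of SalmonTE.
    """
    rows = csv.reader(io.StringIO(text))  # parses and drops the surrounding quotes
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in rows:
        if row:  # skip blank lines
            writer.writerow(row)
    return out.getvalue()


# One row taken from the clades_extended.csv shown in this thread.
sample = '"DNA/hAT-Ac","DNA/hAT-Ac","DNA transposon","Transposable Element"\n'
print(strip_csv_quotes(sample), end="")
```

To apply it to a real file, read the file's text, pass it through the function, and write the result back, keeping a backup copy of the original first.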
Best, Flavia
It worked. Thank you so much Flavia :)
Best, Lorena
Hi Hyun-Hwan Jeong, I've tried using SalmonTE on some penguin data from our lab, but I'm having issues with the index step. I created my FASTA file of repeat sequences (EmperorTranscrNew_families.txt) with RepeatModeler and modified the headers as you suggested in "How to build a customized index". After that, I made my own clades_extended.csv like this:
"DNA/hAT-Ac","DNA/hAT-Ac","DNA transposon","Transposable Element"
"DNA/Kolobok-H","DNA/Kolobok-H","DNA transposon","Transposable Element"
"DNA/PIF-Harbinger","DNA/PIF-Harbinger","DNA transposon","Transposable Element"
"LTR/Pao","LTR/Pao","LTR Retrotransposon","Transposable Element"
"LTR/Gypsy","LTR/Gypsy","LTR Retrotransposon","Transposable Element"
"LTR/ERVL","LTR/ERVL","Endogenous Retrovirus","Transposable Element"
"LTR/ERV1","LTR/ERV1","Endogenous Retrovirus","Transposable Element"
"LINE/CR1","LINE/CR1","Non-LTR Retrotransposon","Transposable Element"
"SINE/MIR","SINE/MIR","Non-LTR Retrotransposon","Transposable Element"
I ran the command:
python3.6 SalmonTE.py index --input_fasta=/cluster_data/home/genomic/penguins/repeats/OutRepeatModeler/EmperorTranscrNew_families.fa --ref_name=emp --te_only
but the output clades.csv file in reference/emp looks like this:
name,class,clade
rnd-1_family-19,other,other
rnd-1_family-4,other,other
rnd-1_family-0,other,other
rnd-1_family-1,other,other
rnd-1_family-10,other,other
rnd-1_family-16,other,other
rnd-1_family-8,other,other
rnd-1_family-3,other,other
rnd-1_family-5,other,other
rnd-1_family-7,other,other
rnd-1_family-18,other,other
rnd-1_family-6,other,other
rnd-1_family-17,other,other
rnd-1_family-15,other,other
rnd-1_family-13,other,other
rnd-1_family-12,other,other
...
I don't understand why I get just "other" for all my repeat sequences. Do you have any idea what the problem could be?
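For anyone hitting the same symptom, a quick way to see how many sequences ended up as "other" is to tally the class column of the generated clades.csv. The snippet below is just a sketch: the column names (name, class, clade) are taken from the output above, while the function name count_classes and the inline sample are illustrative assumptions.

```python
import csv
import io
from collections import Counter


def count_classes(clades_csv_text: str) -> Counter:
    """Count how often each value appears in the 'class' column.

    Hypothetical helper for illustration; it is not part of SalmonTE.
    """
    reader = csv.DictReader(io.StringIO(clades_csv_text))
    return Counter(row["class"] for row in reader)


# First rows of the clades.csv output shown above.
sample = (
    "name,class,clade\n"
    "rnd-1_family-19,other,other\n"
    "rnd-1_family-4,other,other\n"
)
print(count_classes(sample))
```

If every row is counted under "other", none of the FASTA header annotations matched an entry in clades_extended.csv, which points at the CSV (or the headers) rather than at the sequences themselves.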
Your help would be much appreciated. Thank you so much.
Best, Lorena