Closed rsharris closed 2 years ago
Running the web server on a short sequence produced several files, including a .out file that begins like this (first few lines only):
   SW  perc  perc  perc  query      position in query         matching  repeat           position in repeat
score  div.  del.  ins.  sequence   begin    end   (left)     repeat    class/family     begin   end  (left)  ID
  904  19.4  11.8   0.8  utig4-342    677    904  (39196)  +  L1MB8     LINE/L1           5926  6178     (0)   1
  242  26.9   8.8   4.2  utig4-342   1296   1431  (38669)  +  MIRb      SINE/MIR           121   262     (6)   2
 1915  20.1   8.2   2.4  utig4-342   1922   2399  (37701)  +  MLT1D     LTR/ERVL-MaLR        1   505     (0)   3
  469  30.1   5.1   0.5  utig4-342   2622   2818  (37282)  C  LTR33     LTR/ERVL         (292)   223      18   4
   36   0.0   0.0   0.0  utig4-342   4043   4081  (36019)  +  (A)n      Simple_repeat        1    39     (0)   5
  672  26.1   7.4   2.8  utig4-342   4251   4494  (35606)  +  MIR       SINE/MIR             8   262     (0)   6
  633  23.4   3.1   0.6  utig4-342   6874   7032  (33068)  +  MER5A     DNA/hAT-Charlie      1   163    (26)   7
  533  28.1  15.9   0.4  utig4-342   8939   9170  (30930)  +  LTR67B    LTR/ERVL           344   611     (9)   8
 1847  16.0   1.6   0.0  utig4-342  10159  10465  (29635)  +  AluSx     SINE/Alu             1   312     (0)   9
There were other output files, but this is the only one that looks convertible to BED format. I guess I would need columns 5, 6, and 7 as the BED interval (subtracting one from column 6), and maybe column 10 or 11 as BED column 4. Unclear what else I might need.
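For concreteness, the conversion guessed at above could look roughly like the sketch below. It assumes whitespace-separated .out columns as in the excerpt, 1-based inclusive coordinates in the .out file, and the repeat name (column 10) as BED column 4. The function names `out_line_to_bed` and `out_to_bed` are made up for illustration, and nothing here is taken from AmpliCoNE itself.

```python
def out_line_to_bed(line):
    """Turn one RepeatMasker .out data line into a (chrom, start, end, name) tuple.

    Returns None for header or blank lines. Hypothetical helper, not part of
    RepeatMasker or AmpliCoNE.
    """
    fields = line.split()
    # Data lines begin with an integer Smith-Waterman score; the two header
    # lines ("SW ..." and "score ...") and blank lines do not.
    if len(fields) < 11 or not fields[0].isdigit():
        return None
    chrom = fields[4]            # column 5: query sequence name
    start = int(fields[5]) - 1   # column 6: 1-based begin -> 0-based BED start
    end = int(fields[6])         # column 7: inclusive end == half-open BED end
    name = fields[9]             # column 10: repeat name (assumed choice for BED column 4)
    return (chrom, start, end, name)


def out_to_bed(lines):
    """Yield tab-separated BED4 lines for every data line in an .out file."""
    for line in lines:
        record = out_line_to_bed(line)
        if record is not None:
            yield "\t".join(str(value) for value in record)


sample = [
    "SW perc perc perc query position in query matching repeat position in repeat",
    "score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID",
    "",
    "904 19.4 11.8 0.8 utig4-342 677 904 (39196) + L1MB8 LINE/L1 5926 6178 (0) 1",
    "469 30.1 5.1 0.5 utig4-342 2622 2818 (37282) C LTR33 LTR/ERVL (292) 223 18 4",
]

for bed_line in out_to_bed(sample):
    print(bed_line)
```

The strand (column 9) is ignored here. If a six-column BED were wanted, `C` would presumably need mapping to `-`, but that is an assumption as well.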
I also looked for a RepeatMasker BED file on the UCSC data page for hg38, hoping it would show me what's needed, but didn't find one. (I did find a TRF BED file there, which resolved the similar questions I would have had for the TRF step.)
Sorry for the trouble. The description was wrong, and I have fixed it now. The file you need is the .out file from RepeatMasker's output, not a BED file. The amplicone-build step automatically reads columns 5, 6, and 7 of the .out file to identify the chromosome locations of repeat regions. Thanks for reporting the issue.
Thumbs up! Thanks!
I'm trying to perform the steps listed under "AmpliCoNE usage with other reference genomes / species". Under step 1, one of the files needed is BED format output from RepeatMasker.
I'm having some trouble installing and running RepeatMasker (that's a different issue). The only description I've found of its output is under "Output / return format" at https://www.repeatmasker.org/webrepeatmaskerhelp.html . (This describes the web-based RepeatMasker server, I guess, but is probably similar to what I would get running it on my own machine.) It says, in part, that "a table annotating the masked sequences as well as a table summarizing the repeat content of the query sequence will be" [produced]. Is one of those the BED file needed for AmpliCoNE? If so, which?