rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Generate BED12 files #223

Closed elcortegano closed 7 months ago

elcortegano commented 1 year ago

Can the script RM2Bed.py generate BED12 output files?

RepeatMasker was run on the genome of the green alga Chlamydomonas incerta. I seem to remember that in the past I used RM2Bed.py to generate BED12 files from the RepeatMasker .out files. However, what I get now is a BED10 file, and I'm not sure whether this is normal.

Is this the correct BED format? Is there a way to generate BED12 files from RepeatMasker output? Thank you.

rmhubley commented 11 months ago

Well, technically this is a "BED6" format, as we only conform to the BED specification up to field 6 (strand, i.e. orientation; see https://genome.ucsc.edu/FAQ/FAQformat.html#format1). Instead of "thickStart", "thickEnd", "itemRgb", and "blockCount", we provide the TE family class, subclass, divergence, and linkage ID (which links fragments of the same insertion). Converting this to a true BED12 would require some work to collapse the multiple fragment lines of a TE insertion into a single line using the blockCount/blockSizes/blockStarts fields. We have done a bit of this work in the util/rmToTrackHub.pl Perl script. With that script you can generate remote track hubs for the UCSC Genome Browser from a RepeatMasker result set.
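For anyone who wants to try the collapsing step by hand, here is a minimal Python sketch of the idea. It is not what RM2Bed.py or rmToTrackHub.pl do. It assumes tab-separated input with the linkage ID in column 10, as described above, and it assumes that linkage IDs are only meaningful within one sequence, so rows are grouped by (chrom, linkage ID). The function name `collapse_to_bed12` is hypothetical. Overlapping or touching fragments are merged because BED12 blocks may not overlap, the score is set to 0 because the BED score field must be in the range 0-1000, and name and strand are taken from the first fragment of each group.

```python
from collections import defaultdict


def collapse_to_bed12(lines):
    """Collapse 10-column RM2Bed-style rows into BED12 rows, one per linkage ID.

    Assumed input columns (tab-separated): chrom, start, end, name, score,
    strand, class, subclass, divergence, linkage ID.
    """
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        f = line.rstrip("\n").split("\t")
        # Group by (chrom, linkage ID); assumption: IDs are per-sequence.
        groups[(f[0], f[9])].append((int(f[1]), int(f[2]), f[3], f[5]))

    rows = []
    for (chrom, _link_id), frags in groups.items():
        frags.sort()  # order fragments by start coordinate
        name, strand = frags[0][2], frags[0][3]
        # Merge overlapping or touching fragments into disjoint blocks.
        blocks = []
        for start, end, *_ in frags:
            if blocks and start <= blocks[-1][1]:
                blocks[-1][1] = max(blocks[-1][1], end)
            else:
                blocks.append([start, end])
        chrom_start, chrom_end = blocks[0][0], blocks[-1][1]
        sizes = ",".join(str(e - s) for s, e in blocks)
        starts = ",".join(str(s - chrom_start) for s, _ in blocks)
        rows.append((chrom, chrom_start, chrom_end, name, "0", strand,
                     chrom_start, chrom_end, "0", len(blocks), sizes, starts))

    rows.sort(key=lambda r: (r[0], r[1]))  # sort by chrom, then start
    return ["\t".join(str(x) for x in r) for r in rows]


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    example = [
        "chr1\t100\t200\tL1\t500\t+\tLINE\tL1\t12.5\t7",
        "chr1\t300\t450\tL1\t480\t+\tLINE\tL1\t13.0\t7",
    ]
    print("\n".join(collapse_to_bed12(example)))
```

Note that class, subclass, and divergence are dropped in this sketch; if you need them in the browser, a bigBed with extra fields or the track hub script is probably the better route.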