Closed by sjteresi 3 years ago
Shujun has a system within EDTA for resolving duplicate base pairs; it is just a Perl script. It requires the TEs in BED format. After looking at the script, it is still unclear to me how the annotation naming scheme is kept.
I gave a short presentation to Pat and Adrian about the problem and we discussed potential fixes.
@teresi please take a look at this powerpoint just to see the current state of affairs and re-familiarize yourself with the problem.
I am getting suggestions from Pat and Adrian and will keep you updated.
@teresi I am ready to merge with master. However, given the ongoing changes in the genedata_cache branch, I would like to integrate that one first (possibly into this branch?). I need the new paths for the verify commands from genedata_cache, and I will need to update the True/False usage of a command-line argument in this branch (I will borrow some code from genedata_cache for that).
So, given that genedata_cache contains some newer features that I would like to incorporate into this branch (and I will probably have to add one more small commit on top of that), how should I best handle adding this body of code to master?
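For the True/False command-line argument mentioned above, here is a rough sketch of one common pattern, assuming the goal is a plain on/off switch. The flag name `--reset-cache` and the function `parse_args` are hypothetical and are not taken from either branch.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical boolean switch; the flag name is illustrative only."""
    parser = argparse.ArgumentParser(description="boolean flag sketch")
    # store_true: the value is False unless the flag is present, so the
    # user never has to type the strings "True" or "False".
    parser.add_argument(
        "--reset-cache",  # hypothetical name, not from genedata_cache
        action="store_true",
        help="rebuild cached files instead of reusing them",
    )
    return parser.parse_args(argv)
```

With this pattern, `parse_args([]).reset_cache` is `False` and `parse_args(["--reset-cache"]).reset_cache` is `True`, which avoids the pitfall of `type=bool` treating any non-empty string as true.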
Reminder that the ulimit needs to be set. Possibly discuss with Michael how to have users set that themselves; right now I have to set it in every terminal session.
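One possible way to avoid setting this by hand in every session is to raise the limit from inside the program at startup. This is only a sketch and rests on an assumption: that the limit in question is the open-file limit (`ulimit -n`), which the note above does not state. The helper name `raise_soft_limit` is hypothetical, and the `resource` module is Unix-only.

```python
import resource


def raise_soft_limit(which=resource.RLIMIT_NOFILE):
    """Try to raise the soft limit to the hard limit for this process.

    Assumes the relevant limit is the open-file limit; pass another
    resource.RLIMIT_* constant if a different limit is the problem.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(which)
    if soft != hard:
        try:
            # An unprivileged process may raise its soft limit up to the hard limit.
            resource.setrlimit(which, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject the request; keep the existing limit.
            pass
    return resource.getrlimit(which)
```

The change applies only to the running process and its children, so it would not affect the user's shell.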
Sara Anderson, in her maize methylome paper, modified a TE annotation using rtracklayer in R so that each base of the genome could only be assigned to a single TE.
Investigate rtracklayer and have a discussion with Pat. This may aid our interpretation steps, so that we are not double counting or needing to do any silly math.
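To make the "one base, one TE" idea concrete, here is a minimal Python sketch. It is not Anderson's method and not the EDTA Perl script; the tie-breaking rule (the TE that starts first keeps the overlapping bases, and fully contained TEs are dropped) is an arbitrary assumption chosen for illustration, and the name `assign_bases_once` is hypothetical. Coordinates are assumed to be BED-style, half-open.

```python
from typing import List, Tuple

# (chrom, start, end, name), BED-style half-open coordinates
Interval = Tuple[str, int, int, str]


def assign_bases_once(tes: List[Interval]) -> List[Interval]:
    """Trim overlapping TEs so that every base belongs to at most one TE.

    Illustrative rule only: the earlier-starting TE keeps shared bases.
    Original names are carried through unchanged.
    """
    resolved = []
    current_chrom = None
    covered_until = 0
    for chrom, start, end, name in sorted(tes, key=lambda t: (t[0], t[1], t[2])):
        if chrom != current_chrom:
            # New chromosome: nothing has been assigned yet.
            current_chrom, covered_until = chrom, 0
        new_start = max(start, covered_until)
        if new_start < end:
            # Keep only the part of this TE not already assigned.
            resolved.append((chrom, new_start, end, name))
            covered_until = end
        # Otherwise the TE lies entirely inside assigned bases and is dropped.
    return resolved
```

For example, `[("chr1", 0, 10, "A"), ("chr1", 5, 15, "B"), ("chr1", 7, 9, "C")]` becomes `[("chr1", 0, 10, "A"), ("chr1", 10, 15, "B")]`, so the total TE length is 15 bp with no base counted twice. A different priority rule (for example, giving nested bases to the inner element) would need a different implementation.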