Open weix-cshl opened 3 years ago
Calculate the repeat coverage and record it here.
Here is the repeat coverage for the 18 NAM lines already processed in the 1st step:
| NAM line | Repeat coverage |
|----------|:---------|
| zea_maysb73ab10 | 84.80% |
| zea_maysb97 | 84.60% |
| zea_mayscml103 | 84.36% |
| zea_mayscml228 | 83.70% |
| zea_mayscml277 | 84.26% |
| zea_mayscml322 | 84.75% |
| zea_mayscml52 | 83.91% |
| zea_mayscml69 | 84.50% |
| zea_mayshp301 | 84.25% |
| zea_maysil14h | 84.28% |
| zea_maysm37w | 84.61% |
| zea_maysmo18w | 84.56% |
| zea_maysnc358 | 84.63% |
| zea_maysoh43 | 84.39% |
| zea_maysoh7b | 84.10% |
| zea_maysp39 | 84.22% |
| zea_maystx303 | 84.42% |
| zea_maystzi8 | 84.23% |
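
The issue does not say how these percentages were computed. As a rough cross-check, and assuming a soft-masked FASTA dump of each genome is available (repeats in lowercase), the masked fraction could be estimated with a sketch like the one below. The function name `repeat_coverage` is hypothetical, and hard-masked `N`/`X` runs are not counted.

```shell
#!/bin/sh
# Sketch: percentage of soft-masked (lowercase) bases in a FASTA file.
# Assumption: repeats are soft-masked; this is not necessarily how the
# table above was produced.
repeat_coverage() {
  awk '
    /^>/ { next }                        # skip sequence headers
    {
      total += length($0)                # every base on this line
      line = $0
      masked += gsub(/[a-z]/, "", line)  # lowercase bases = masked
    }
    END {
      if (total > 0) printf "%.2f%%\n", 100 * masked / total
      else print "NA"
    }
  ' "$@"
}

# Toy input: 8 of 16 bases are lowercase, so this prints 50.00%
printf '>chr1\nACGTacgt\nacgtACGT\n' | repeat_coverage
```

With a real file the call would be `repeat_coverage genome.softmasked.fa`, where the file name is a placeholder.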
Run EG pipeline DNAFeatures_conf to repeatMask NAM genomes and load the repeat features to their core databases.
In 2019, we ran the DNAFeatures pipeline to load repeat features into the core dbs for 10 NAM lines. At the time we used the Wessler-Bennetzen library for customized repeats:
/mnt/grid/ware/hpc/home/data/weix/repeatMask_pipelines/Libraries/Maize/wessler-bennetzen-2015/TE_12-Feb-2015_15-35.fa
The databases were backed up at brie:/scratch/weix/NAMcoreDBs/2019RM/
Now we have a better, filtered repeat library, MTEC, generated by Shujun:
/mnt/grid/ware/hpc/home/data/weix/repeatMask_pipelines/Libraries/Maize/shujun-filtered/maizeTE10102014.RMname.nogene
It gave more consistent coverage across NAM lines. What I did was two steps:
Run the dna_feature pipelines over the 18 unrepeatMasked NAM lines; the pipeline performs the following analyses
Redo the customized repeatMasking for the other 9 NAM lines that were repeatMasked in 2019
To avoid repeating the same analysis, I created a new config file
Sharon/modules/Bio/EnsEMBL/EGPipeline/PipeConfig/DNAFeaturesCustom_conf.pm
that only performs the customized repeat library analysis, so we can reuse the old results for the other 3 analyses. Only the customized repeatMasker analysis is replaced:
init_pipeline.pl Bio::EnsEMBL::EGPipeline::PipeConfig::DNAFeaturesCustom_conf --host bhsqldw1 --port 3306 --user plensembl --pass AudreyII -registry /grid/ware/data/data/weix/data/NAM/registry/$species.reg -pipeline_dir $PIPELINE_DIR -species zea_mays -repeatmasker_library all=/mnt/grid/ware/hpc/home/data/weix/repeatMask_pipelines/Libraries/Maize/shujun-filtered/maizeTE10102014.RMname.nogene -always_use_repbase 0 -no_dust 1 -no_trf 1 -hive_force_init 1
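
The `init_pipeline.pl` invocation above refers to `$species` and `$PIPELINE_DIR`, which suggests it was run once per NAM line. A minimal sketch of such a loop follows, reusing the same flags. The function name `run_custom_repeatmask`, the `DRY_RUN` and `DB_PASS` variables, the default pipeline directory, and the two line names are all hypothetical placeholders; the real registry base names for the 9 lines are not given in this issue.

```shell
#!/bin/sh
# Sketch only: run the custom repeat-library pipeline for several NAM lines.
# Paths and flags are copied from the command above; everything else is assumed.
LIB=/mnt/grid/ware/hpc/home/data/weix/repeatMask_pipelines/Libraries/Maize/shujun-filtered/maizeTE10102014.RMname.nogene
REG_DIR=/grid/ware/data/data/weix/data/NAM/registry
: "${PIPELINE_DIR:=$PWD/pipeline_out}"  # hypothetical default output dir
: "${DB_PASS:=REDACTED}"                # supply the real password via env
: "${DRY_RUN:=1}"                       # 1 = only print the command

run_custom_repeatmask() {
  species=$1
  # Build the argument list in the positional parameters (POSIX sh, no arrays).
  set -- init_pipeline.pl \
    Bio::EnsEMBL::EGPipeline::PipeConfig::DNAFeaturesCustom_conf \
    --host bhsqldw1 --port 3306 --user plensembl --pass "$DB_PASS" \
    -registry "$REG_DIR/$species.reg" \
    -pipeline_dir "$PIPELINE_DIR" \
    -species zea_mays \
    -repeatmasker_library "all=$LIB" \
    -always_use_repbase 0 -no_dust 1 -no_trf 1 -hive_force_init 1
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"   # show what would be run
  else
    "$@"                 # actually initialise the pipeline
  fi
}

# Placeholder names; substitute the registry base names of the 9 lines.
for sp in nam_line_a nam_line_b; do
  run_custom_repeatmask "$sp"
done
```

Defaulting to a dry run makes it easy to check the registry path for each line before any hive database is created; setting `DRY_RUN=0` would execute the commands.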