jlab-code / MethylStar

A fast and robust pre-processing pipeline for bulk or single-cell whole-genome bisulfite sequencing (WGBS) data.
GNU General Public License v3.0

How can I make TEs.Rdata in mouse? #13

Closed: kimchuna closed this issue 3 years ago

kimchuna commented 3 years ago

Thanks for developing a good pipeline. I ran MethylStar on mouse WGBS data, but I got the following error in the methimpute step.

ERROR : undefined columns selected
[1] "Running...../cx-reports/301-100_2.CX_report.txt"
Reading file ../cx-reports/301-100_2.CX_report.txt ...
./src/bash/methimpute.sh: line 9: 11405 Killed Rscript ./src/bash/methimpute.R $result_pipeline $genome_ref $genome_name $tmp_rdata $intermediate $fit_output $enrichment_plot $full_report $context_report $intermediatemode --no-save --no-restore --verbose
sort: cannot read: /results/methimpute-out/file-processed.lst: No such file or directory

I suspect this problem is due to the TEs.Rdata file format. I replaced the Arabidopsis thaliana genes.RData with a mouse genes.RData as suggested in the manual, but I didn't find the code for TEs.RData. So I made TEs.RData from GRCm38_Ensembl_rmsk_TE.gtf (http://hammelllab.labsites.cshl.edu/software/#TEtranscripts) like this:

library(rtracklayer)
file2 <- "GRCm38_Ensembl_rmsk_TE.gtf"
mygtf <- import(file2)
names(mygtf) <- elementMetadata(mygtf)$family_id
save(mygtf, file="TEs.RData")

Could you let me know how to make TEs.RData?

shahryary commented 3 years ago

@kimchuna Thank you for using our pipeline.

The code you are using to make "TEs.RData" is fine, but the row names are not unique. It would help if you used the "gene_id" column instead of "family_id" to name the rows. Please see the attached image, which shows what the Arabidopsis "TEs.Rdata" looks like:

[Image: Screenshot 2021-01-07 at 11 19 05]

Please let me know if you have any questions.
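
For readers following along, here is a minimal sketch of the suggested change, reusing the rtracklayer code from the question. The helper unique_te_names() is hypothetical and not part of MethylStar. The make.unique() fallback is also an assumption rather than something prescribed in this thread: it only matters if the "gene_id" values in your GTF repeat, in which case it appends .1, .2, ... to the repeats. The object name mygtf and the output file name are kept from the question.

```r
# Hypothetical helper (not part of MethylStar): build unique row names
# from a vector of annotation IDs.
unique_te_names <- function(ids) {
  ids <- as.character(ids)
  # make.unique() keeps first occurrences unchanged and appends
  # .1, .2, ... to any repeated value.
  make.unique(ids)
}

te_gtf <- "GRCm38_Ensembl_rmsk_TE.gtf"  # GTF file named in the question

# Only run the import when the GTF is actually present.
if (file.exists(te_gtf)) {
  library(rtracklayer)
  mygtf <- import(te_gtf)

  # Name the ranges by gene_id instead of family_id, as suggested above.
  names(mygtf) <- unique_te_names(mcols(mygtf)$gene_id)

  # Fail early if any row name is still duplicated.
  stopifnot(anyDuplicated(names(mygtf)) == 0)

  save(mygtf, file = "TEs.RData")
}
```

Before saving, it may be worth running anyDuplicated(mcols(mygtf)$gene_id) on its own. A result of 0 means the "gene_id" column is already unique and the fallback changes nothing.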