rosedu1 / deconvSeq

deconvolution for RNAseq and bisulfite sequencing

bismark files #5

Open lc716 opened 4 years ago

lc716 commented 4 years ago

Hi,

Thanks for the great package. Unfortunately, I can't create a methmat with any of the Bismark-derived files. Could you please specify which of the methylation extraction output files from Bismark your package is compatible with? I have tried both the cov file and the cytosine report, but the matrix remains blank.

Thanks, Leo

lc716 commented 4 years ago

In your description you write that the input file from Bismark coverage for CG only has columns "chrBase","chr","base","strand","coverage","freqC","freqT".

However Bismark coverage files have as headings the following:

Could you please clarify whether there is a way to convert Bismark files to make them compatible with deconvSeq? Thanks again, Leo

rosedu1 commented 4 years ago

Leo, Can you give me the headings for your files? Rose

lc716 commented 4 years ago

Hi Rose,

Thanks for your prompt response. The script works well with the vignette files, but it doesn't seem to be able to read any of the files I get from the methylation extraction step of Bismark. I tried three different files derived from Bismark or methylKit.

The standard coverage file (.cov) looks like this:
chr17 64761 64761 100 1 0
chr17 64821 64821 100 1 0
chr17 64949 64949 100 1 0
chr17 92370 92370 0 0 1
chr17 92372 92372 0 0 1
chr17 92382 92382 0 0 1
chr17 92415 92415 0 0 1
chr17 92419 92419 0 0 1
chr17 92569 92569 100 1 0
chr17 131021 131021 100 4 0
chr17 131249 131249 87.5 7 1

The cytosine report file (from the coverage2cytosine function) looks like this:
chr2 288383 + 2 25 CG CGG
chr2 288384 - 1 14 CG CGG
chr2 288386 + 1 26 CG CGA
chr2 288387 - 0 16 CG CGC
chr2 288390 + 0 27 CG CGC
chr2 288391 - 0 16 CG CGC
chr2 288393 + 1 26 CG CGA
chr2 288394 - 0 16 CG CGG

And the tabix files from methylKit, which reads the Bismark files and then creates a matrix similar to the deconvSeq one, look like this:
chr1 10497 10497 34 33 1
chr1 10525 10525 34 34 0
chr1 10542 10542 34 33 1
chr1 10563 10563 34 31 3
chr1 10571 10571 34 31 3
chr1 10577 10577 34 26 8
chr1 10579 10579 34 24 10
chr1 136876 136876 18 18 0
chr1 136895 136895 18 14 4
chr1 136911 136911 18 16 2

Full details on the bismark type files are here:

https://rawgit.com/FelixKrueger/Bismark/master/Docs/Bismark_User_Guide.html

Thanks again, Leo

rosedu1 commented 4 years ago

Leo, Did you use the function getmethmat to get the methylation matrix? For Bismark coverage filetypes, you should use filtype="bismark".

methmat = getmethmat(filnames=c(file1,file2), sample.id=c("sample1","sample2"), filtype="bismark")

I will also update the package so that it is usable with either the coverage or cytosine report filetypes in the next version.

Rose

lc716 commented 4 years ago

Dear Rose,

Thanks for getting back to me. Yes, I use the filtype="bismark" option, but it still doesn't seem to be able to read the files. I also raised this with Felix Krueger, who wrote Bismark, to check whether any Bismark file type has these particular headings, and he confirmed that none does. So I suspect I will either have to find a quick way to modify my files or wait for the next version.
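For readers in the same position, one quick way to modify the files might look like the sketch below. It reshapes a Bismark coverage file into the seven columns quoted earlier in this thread. The function name `cov_to_seven_col`, the `chr.position` form of `chrBase`, and the default strand value `"F"` are assumptions made for illustration (coverage files carry no strand information), and nothing in this thread confirms that getmethmat accepts the result.

```r
# Hypothetical helper (not part of deconvSeq): reshape a Bismark coverage
# file into the seven-column layout quoted earlier in this thread.
cov_to_seven_col <- function(path, strand = "F") {
  # Bismark coverage files have no header line and six tab-separated fields:
  # chromosome, start, end, percent methylation, methylated count,
  # unmethylated count.
  cov <- read.table(
    path, header = FALSE, sep = "\t", stringsAsFactors = FALSE,
    col.names = c("chr", "start", "end", "pct_meth", "n_meth", "n_unmeth")
  )
  coverage <- cov$n_meth + cov$n_unmeth
  freqC <- 100 * cov$n_meth / coverage
  data.frame(
    chrBase  = paste(cov$chr, cov$start, sep = "."),  # assumed naming scheme
    chr      = cov$chr,
    base     = cov$start,
    strand   = strand,  # assumption: the coverage file has no strand column
    coverage = coverage,
    freqC    = freqC,
    freqT    = 100 - freqC,
    stringsAsFactors = FALSE
  )
}

# Usage with two rows taken from the coverage sample in this thread
tmp <- tempfile(fileext = ".cov")
writeLines(c("chr17\t64761\t64761\t100\t1\t0",
             "chr17\t131249\t131249\t87.5\t7\t1"), tmp)
converted <- cov_to_seven_col(tmp)
print(converted)
```

Recomputing freqC from the two count columns, rather than copying the percentage field, keeps coverage, freqC and freqT consistent with each other.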

Thanks again for all your help and look forward to the updated version!

Best wishes, Leo

rosedu1 commented 4 years ago

Leo, I have updated the package. You can use either "bismarkCoverage" or "bismarkCytosineReport" for file type. Let me know if you still have problems. Rose
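To choose between the two file type values, a small check on the first line of each file may help. The sketch below is only an illustration: `guess_bismark_filtype` is a hypothetical helper, not part of deconvSeq, and it assumes tab-separated files with six fields for a coverage file and seven fields (strand in the third) for a cytosine report, as in the samples posted above.

```r
# Hypothetical helper (not part of deconvSeq): guess which file type string
# fits a Bismark output file by inspecting its first line.
guess_bismark_filtype <- function(path) {
  first <- readLines(path, n = 1)
  fields <- strsplit(first, "\t", fixed = TRUE)[[1]]
  if (length(fields) == 6) {
    # chr, start, end, percent methylation, methylated, unmethylated
    "bismarkCoverage"
  } else if (length(fields) == 7 && fields[3] %in% c("+", "-")) {
    # chr, position, strand, methylated, unmethylated, context, trinucleotide
    "bismarkCytosineReport"
  } else {
    stop("Unrecognised layout in ", path, ": ", length(fields), " fields")
  }
}

# Usage with one line from each sample posted in this thread
cov_file <- tempfile(fileext = ".cov")
writeLines("chr17\t64761\t64761\t100\t1\t0", cov_file)
rep_file <- tempfile(fileext = ".txt")
writeLines("chr2\t288383\t+\t2\t25\tCG\tCGG", rep_file)

print(guess_bismark_filtype(cov_file))
print(guess_bismark_filtype(rep_file))
```

The returned string could then be passed as the filtype argument in a getmethmat call like the one shown earlier. Note that the methylKit tabix sample also has six fields, so this check alone cannot tell it apart from a Bismark coverage file.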

lc716 commented 4 years ago

Thanks so much Rose! I'll try it out and let you know! Cheers, Leo

lc716 commented 4 years ago

Thanks again for your help Rose, and one last question to clarify:

If using data from human whole blood and wanting to calculate the proportion of the same types of blood cells, can I use the methylation matrix for the individual cell types from the vignette to calculate b0 or do I need to download the full data set from the source and create a new one?

Thanks, Leo

rosedu1 commented 4 years ago

The matrix in the vignette is derived from the full matrix using the databases we mentioned in our paper. You could either use it or create a new one.