joshuagryphon / plastid

Position-wise analysis of sequencing and genomics data
https://plastid.readthedocs.io

Generated .bed file has "0" in the score column; a .bedgraph made from this .bed shows no Ribo-seq coverage #33

Closed: AlexFryd closed this issue 4 years ago

AlexFryd commented 5 years ago

Dear Plastid Team,

I have been using plastid for about a week now. Everything runs fine with no errors, and the metagene profile looks representative of typical Ribo-seq data.

However, the .bed file returned when running the `metagene generate` subcommand looks like this (after the first 31 lines):

```
6 41079039 41079164 ENSG00000001167 0 + 41079089 41079090 0,0,0 1 125, 0,
6 46130172 46139783 ENSG00000001561 0 + 46139583 46139584 0,0,0 2 17,233, 0,9378,
4 11399805 11400055 ENSG00000002587 0 - 11400004 11400005 0,0,0 1 250, 0,
17 28355837 28357463 ENSG00000004142 0 - 28357447 28357448 0,0,0 2 39,176, 0,1450,
12 21491532 21499615 ENSG00000004700 0 - 21499569 21499570 0,0,0 2 184,61, 0,8022,
19 35755892 35757009 ENSG00000004776 0 - 35757007 35757008 0,0,0 2 2,199, 0,918,
X 11112018 11114934 ENSG00000004961 0 + 11112060 11112061 0,0,0 2 142,100, 0,2816,
```

From what I have read about the BED format, the 5th column is the "score", which in this case I assumed would hold the ribosome-protected fragment counts in the selected windows. Is that right?

I then made a .bedgraph file with `cut -f1-3,5 my.bed > my.bedgraph`. It looks like this (again, after the first 31 lines):

```
6 41079039 41079164 0
6 46130172 46139783 0
4 11399805 11400055 0
17 28355837 28357463 0
12 21491532 21499615 0
19 35755892 35757009 0
X 11112018 11114934 0
```
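For reference, in `cut` numbering a BED12 record's fields are chrom (1), chromStart (2), chromEnd (3), name (4), score (5), then strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes and blockStarts. The sketch below mimics `cut -f1-3,5` in plain Python on two of the records above, to show that the bedGraph value is simply the BED score copied verbatim, so zeros in, zeros out. It also shows that each multi-block (e.g. multi-exon) window is flattened to one interval from chromStart to chromEnd. The helper name `bed_to_bedgraph_lines` is hypothetical and not part of plastid, and treating `#`, `track` and `browser` lines as headers is an assumption about what the skipped first lines contain.

```python
# Field order of a BED12 record; the list index is the 0-based Python index.
BED12_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score",  # cut -f1 .. -f5
    "strand", "thickStart", "thickEnd", "itemRgb",
    "blockCount", "blockSizes", "blockStarts",
]


def bed_to_bedgraph_lines(bed_lines):
    """Hypothetical helper mimicking `cut -f1-3,5` on tab-separated BED lines.

    Keeps chrom, chromStart, chromEnd and the score column unchanged.
    Block structure is discarded, so each output interval spans the whole
    window from chromStart to chromEnd, including any gaps between blocks.
    """
    idx = [BED12_FIELDS.index(f) for f in ("chrom", "chromStart", "chromEnd", "score")]
    out = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # assumed header/comment lines
        fields = line.rstrip("\n").split("\t")
        out.append("\t".join(fields[i] for i in idx))
    return out


bed = [
    "6\t41079039\t41079164\tENSG00000001167\t0\t+\t41079089\t41079090\t0,0,0\t1\t125,\t0,",
    "6\t46130172\t46139783\tENSG00000001561\t0\t+\t46139583\t46139584\t0,0,0\t2\t17,233,\t0,9378,",
]
for row in bed_to_bedgraph_lines(bed):
    print(row)  # the 4th column is 0 because the BED score was 0
```

Because the value column is only ever as informative as the BED score it was cut from, the zeros here trace straight back to the score column of the generated window file.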

I expected the 4th column (the value column) to contain non-zero numbers, so that the data could be visualized in genome browsers or other tools.

Is there something I am doing wrong?

Thank you in advance, Alexandros

joshuagryphon commented 4 years ago

Hi @AlexFryd ,

I'm sorry it has taken so long for me to get back to you; my current job has become much busier than it used to be, and I haven't been able to do much work here.

To answer your question, the scores in the BED files output by `metagene generate` are all set to zero because they are made in the generate step, which depends only on a genome annotation and does not use any alignment data. Computing the windows in the generate step is often expensive but only needs to be done once, because the windows can then be reused with various alignment files (e.g. to compare metagene profiles across conditions).
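The split described above (build windows once from the annotation, then score them against as many alignment files as needed) can be illustrated with a small pure-Python sketch. It does not use plastid's API: the `Window` tuple, the representation of reads as already-mapped positions (e.g. 5' ends), the function names and the read positions are all hypothetical simplifications. Also, `metagene count` builds position-wise profiles rather than the single total per window computed here.

```python
from collections import namedtuple

# Hypothetical, simplified window: name, chromosome, strand, and a list of
# half-open (start, end) genomic blocks, like the blocks of a BED12 record.
Window = namedtuple("Window", ["name", "chrom", "strand", "blocks"])


def make_windows(annotation):
    """'Generate'-like step: uses only the annotation, so no scores exist yet.

    `annotation` is a list of (name, chrom, strand, blocks) tuples.
    """
    return [Window(*rec) for rec in annotation]


def count_in_windows(windows, read_positions):
    """'Count'-like step: needs alignment data.

    `read_positions` is a list of (chrom, strand, position) tuples, one per
    read, where position is the already-chosen mapped site of that read.
    Returns {window name: number of reads whose site falls inside a block}.
    """
    counts = {w.name: 0 for w in windows}
    for chrom, strand, pos in read_positions:
        for w in windows:
            if w.chrom == chrom and w.strand == strand and any(
                start <= pos < end for start, end in w.blocks
            ):
                counts[w.name] += 1
    return counts


# Windows are built once (blocks taken from two of the BED records above)...
windows = make_windows([
    ("ENSG00000001561", "6", "+", [(46130172, 46130189), (46139550, 46139783)]),
    ("ENSG00000002587", "4", "-", [(11399805, 11400055)]),
])

# ...and reused with different toy read sets, e.g. two conditions.
condition_a = [("6", "+", 46130180), ("6", "+", 46135000), ("4", "-", 11400000)]
condition_b = [("6", "+", 46139600), ("6", "+", 46139700)]
print(count_in_windows(windows, condition_a))  # the read at 46135000 lies between blocks
print(count_in_windows(windows, condition_b))
```

Keeping `make_windows` independent of any reads is what makes the expensive step reusable: only the cheaper counting step is repeated per alignment file.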

The scores you want are emitted by `metagene count` (which does use information from alignments), specifically in the file `foo_metagene_profile.txt`, where `foo` is whatever output name you supplied during the run. Hopefully you were already able to find this for your research.

If it would be useful to set the scores in the window file from the generate step to something other than zero, I welcome your suggestions; the score in a BED file is often used to capture something about the quality of the annotations.
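As one illustration of what a non-zero score could look like, the sketch below rewrites the score column (5th field) of tab-separated BED lines from a per-window value computed elsewhere, such as an annotation-quality metric or a read count. The UCSC BED definition restricts the score to an integer from 0 to 1000, so values are rounded and clamped to that range. The function `set_bed_scores` and its `values` mapping are hypothetical; this is not an existing plastid option.

```python
def set_bed_scores(bed_lines, values):
    """Hypothetical helper: return BED lines with the score column replaced.

    bed_lines: tab-separated BED records (BED6 or wider).
    values: {name: number} mapping keyed on the BED name column (4th field).
    Scores are rounded and clamped to 0..1000, the range allowed by the
    UCSC BED definition; names missing from `values` get a score of 0.
    """
    out = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        raw = values.get(fields[3], 0)
        fields[4] = str(max(0, min(1000, int(round(raw)))))
        out.append("\t".join(fields))
    return out


bed = [
    "4\t11399805\t11400055\tENSG00000002587\t0\t-\t11400004\t11400005\t0,0,0\t1\t250,\t0,",
    "X\t11112018\t11114934\tENSG00000004961\t0\t+\t11112060\t11112061\t0,0,0\t2\t142,100,\t0,2816,",
]
for line in set_bed_scores(bed, {"ENSG00000002587": 37.6, "ENSG00000004961": 2500}):
    print(line.split("\t")[4])  # prints 38, then 1000 (clamped)
```

Clamping keeps the file valid for tools that enforce the 0 to 1000 range, at the cost of saturating large values; a log or rank transform before clamping would be one way to preserve more contrast.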

In case it helps, more details are in the metagene documentation:

https://plastid.readthedocs.io/en/latest/generated/plastid.bin.metagene.html#module-plastid.bin.metagene

Cheers, Josh

joshuagryphon commented 4 years ago

Hi @AlexFryd ,

Closing this issue. If you have further questions, feel free to re-open it.

Cheers, Josh