stjude / ROSE

ROSE: RANK ORDERING OF SUPER-ENHANCERS

Can I use this tool for plants? #31

Open hanshanmengqi opened 11 months ago

hanshanmengqi commented 11 months ago

Hi,

Thank you for your interesting tool.

I work on plants, so can I use it for plants?

Best, Han

Ziwei-Liu commented 7 months ago

Yes.

Here are my command lines:

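# Run ROSE once per sample; each BAM file is named <sample>.chr.bam
for bam in *.chr.bam; do
    i=${bam%.chr.bam}   # strip the suffix to get the sample name
    nohup ROSE_main.py --custom *_refseq.ucsc -i "${i}.cut.bed" -r "${i}.chr.bam" -o "./${i}/" 2> "${i}.log" &
done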

Here we only have to ensure that:

  1. The chromosome names in your BAM file and peak file start with 'chr'. If you have contigs, name them 'chrC1', 'chrC2', or 'chrContig1', 'chrContig2', or anything else that begins with 'chr'.
  2. Peak files can be the narrowPeak files produced by macs2, used directly; just remember to change the suffix to .bed.
  3. Build your own genome annotation file in the same format as the UCSC table track format. If your annotation is a gff3 file, use a program called gff3ToGenePred to convert it. But remember to add an index column (the bin column) and a header row to the converted file, because a plain conversion does not produce exactly the same format as the examples provided in the program's annotation folder. A typical gff3 file:
    ##gff-version 3
    chrC1   EVM     gene    118111  135837  .       +       .       ID=Contig1G000001;
    chrC1   EVM     mRNA    118111  135837  .       +       .       ID=Contig1G000001.mRNA1;Parent=Contig1G000001
    chrC1   EVM     exon    118111  118122  .       +       .       ID=Contig1G000001.exon1;Parent=Contig1G000001.mRNA1
    chrC1   EVM     CDS     118111  118122  .       +       0       ID=Contig1G000001.cds1;Parent=Contig1G000001.mRNA1
    chrC1   EVM     exon    120459  120548  .       +       .       ID=Contig1G000001.exon2;Parent=Contig1G000001.mRNA1
    chrC1   EVM     CDS     120459  120548  .       +       0       ID=Contig1G000001.cds2;Parent=Contig1G000001.mRNA1

    The genePred file produced by gff3ToGenePred:

    Contig1G000005.mRNA1    chrC1   +       185081  186707  185081  186707  2       185081,186044,  185093,186707,  0       Contig1G000005  cmpl    cmpl    0,0,
    Contig1G000004.mRNA1    chrC1   +       153060  171316  153060  171316  14      153060,153415,153606,153849,155852,156537,156725,160188,161266,161580,164263,166108,166471,171012,      153075,153490,153816,153899,155865,156612,157021,160276,161473,161622,164343,166186,167116,171316,   0       Contig1G000004  cmpl    cmpl    0,0,0,0,1,0,0,1,0,0,0,1,1,1,
    Contig1G000003.mRNA1    chrC1   +       148519  149466  148519  149466  4       148519,148766,149001,149196,    148607,148885,149076,149466,    0       Contig1G000003  cmpl    cmpl    0,2,0,0,
    Contig1G000002.mRNA1    chrC1   +       136234  137231  136234  137231  3       136234,136564,136919,   136246,136639,137231,   0       Contig1G000002  cmpl    cmpl    0,0,0,
    Contig1G000001.mRNA1    chrC1   +       118110  135837  118110  135837  7       118110,120458,121255,128550,128987,129666,135809,       118122,120548,121489,128646,129036,129703,135837,       0       Contig1G000001  cmpl    cmpl    0,0,0,0,0,2,1,

    An example from the repository's annotation folder:

    #bin    name    chrom   strand  txStart txEnd   cdsStart    cdsEnd  exonCount   exonStarts  exonEnds    score   name2   cdsStartStat    cdsEndStat  exonFrames
    0   NR_075077   chr1    -   67092175    67134971    67134971    67134971    10  67092175,67096251,67103237,67111576,67113613,67115351,67125751,67127165,67131141,67134929,  67093604,67096321,67103382,67111644,67113756,67115464,67125909,67127257,67131227,67134971,  0   C1orf141    unk unk -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
    0   NM_001276352    chr1    -   67092175    67134971    67093579    67127240    9   67092175,67096251,67103237,67111576,67115351,67125751,67127165,67131141,67134929,   67093604,67096321,67103382,67111644,67115464,67125909,67127257,67131227,67134971,   0   C1orf141    cmpl    cmpl    2,1,0,1,2,0,0,-1,-1,
    0   NM_001276351    chr1    -   67092175    67134971    67093004    67127240    8   67092175,67095234,67096251,67115351,67125751,67127165,67131141,67134929,    67093604,67095421,67096321,67115464,67125909,67127257,67131227,67134971,    0   C1orf141    cmpl    cmpl    0,2,1,2,0,0,-1,-1,
    0   NM_000299   chr1    +   201283451   201332993   201283702   201328836   15  201283451,201293941,201313165,201316552,201317571,201318617,201319815,201320266,201321977,201323012,201324427,201324940,201325753,201328761,201330073,  201283904,201294045,201313560,201316697,201317779,201318795,201319878,201320381,201322133,201323189,201324581,201325127,201325838,201328868,201332993,  0   PKP1    cmpl    cmpl    0,1,0,2,0,1,2,2,0,0,0,1,2,0,-1,

    So remember to add the bin column and the header line manually. The final custom annotation file provided to ROSE:

    #bin    name    chrom   strand  txStart txEnd   cdsStart        cdsEnd  exonCount       exonStarts      exonEnds        score   name2   cdsStartStat    cdsEndStat      exonFrames
    0       Contig1G000010.mRNA1    chrC1   +       249212  251406  249212  251406  4       249212,249358,249619,251001,    249227,249503,249801,251406,    0       Contig1G000010  cmpl    cmpl    0,0,2,0,
    0       Contig1G000009.mRNA1    chrC1   +       222452  247955  222452  247955  19      222452,225441,227650,227864,228265,235538,235788,236674,236869,239211,239470,241009,242202,242448,244280,244530,245983,246205,247562,   222461,225576,227722,228133,228343,235755,235841,236764,236951,239344,239728,241391,242308,242582,244497,244583,246078,246315,247955,        0       Contig1G000009  cmpl    cmpl    0,0,0,0,1,1,0,1,1,0,2,2,1,0,1,0,1,2,0,
    0       Contig1G000008.mRNA1    chrC1   +       220918  222208  220918  222208  4       220918,221411,221849,222117,    220975,221483,222079,222208,    0       Contig1G000008  cmpl    cmpl    0,0,0,1,
    0       Contig1G000007.mRNA1    chrC1   +       207537  210311  207537  210311  5       207537,208815,209732,209965,210233,     207558,208941,209804,210193,210311,     0       Contig1G000007  cmpl    cmpl    0,0,0,0,0,
    0       Contig1G000006.mRNA1    chrC1   +       198072  199140  198072  199140  4       198072,198399,198570,199094,    198084,198471,198896,199140,    0       Contig1G000006  cmpl    cmpl    0,0,0,1,
    0       Contig1G000005.mRNA1    chrC1   +       185081  186707  185081  186707  2       185081,186044,  185093,186707,  0       Contig1G000005  cmpl    cmpl    0,0,
    0       Contig1G000004.mRNA1    chrC1   +       153060  171316  153060  171316  14      153060,153415,153606,153849,155852,156537,156725,160188,161266,161580,164263,166108,166471,171012,      153075,153490,153816,153899,155865,156612,157021,160276,161473,161622,164343,166186,167116,171316,   0       Contig1G000004  cmpl    cmpl    0,0,0,0,1,0,0,1,0,0,0,1,1,1,
    0       Contig1G000003.mRNA1    chrC1   +       148519  149466  148519  149466  4       148519,148766,149001,149196,    148607,148885,149076,149466,    0       Contig1G000003  cmpl    cmpl    0,2,0,0,
    0       Contig1G000002.mRNA1    chrC1   +       136234  137231  136234  137231  3       136234,136564,136919,   136246,136639,137231,   0       Contig1G000002  cmpl    cmpl    0,0,0,
    0       Contig1G000001.mRNA1    chrC1   +       118110  135837  118110  135837  7       118110,120458,121255,128550,128987,129666,135809,       118122,120548,121489,128646,129036,129703,135837,       0       Contig1G000001  cmpl    cmpl    0,0,0,0,0,2,1,
  4. The annotation file must be named *_refseq.ucsc, so remember to rename your custom annotation file after converting it to the required format.
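
Points 1, 3 and 4 above can be sketched as two small shell helpers. This is only an illustration, not part of ROSE: the function names and the file names in the example calls (output.gp, mygenome_refseq.ucsc, sample.cut.bed) are hypothetical. It assumes the genePred file is tab-separated, as written by gff3ToGenePred, and that a constant bin value of 0 is acceptable, as in the example above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: write the UCSC-style header row, then every genePred
# line with a constant bin value (0) prepended as the first column.
add_bin_and_header() {
    local genepred="$1" out="$2"
    {
        printf '#bin\tname\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\texonCount\texonStarts\texonEnds\tscore\tname2\tcdsStartStat\tcdsEndStat\texonFrames\n'
        awk 'BEGIN { FS = OFS = "\t" } { print 0, $0 }' "$genepred"
    } > "$out"
}

# Hypothetical helper: list the sequence names in a BED/peak file that do
# not start with "chr" (prints nothing when every name is fine).
list_non_chr_names() {
    awk -F'\t' 'NF && !/^#/ && $1 !~ /^chr/ { print $1 }' "$1" | sort -u
}

# Example calls (file names are placeholders):
#   add_bin_and_header output.gp mygenome_refseq.ucsc
#   list_non_chr_names sample.cut.bed
```

Writing the result straight to a file whose name ends in _refseq.ucsc covers the renaming step in point 4 at the same time.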

This tool is easy to use and powerful. I like it too.

hanshanmengqi commented 2 months ago

Dear Liu,

Thank you so much for your detailed reply.

However, I encountered an error with gff3ToGenePred. The error message is: 'CDS feature must have phase.' My command is:

    gff3ToGenePred input.gff3 output.Gp

Do you have any suggestions on how to resolve this?

Best, Han

Ziwei-Liu commented 2 months ago

Maybe you should check whether your gff file has the phase of each CDS correctly annotated. To do so, look at the 8th column of the lines marked as CDS in the 3rd column and make sure it is one of 0, 1, or 2, not another symbol such as '.'. For detailed information and examples about the gff format and what 'phase' means for a CDS, please see https://www.ncbi.nlm.nih.gov/datasets/docs/v1/reference-docs/file-formats/about-ncbi-gff3/
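
A quick way to run that check, as a rough sketch: the helper below prints every CDS line whose 8th column is not 0, 1, or 2. The function name and the input.gff3 file name are placeholders, and it assumes a standard tab-separated gff3 file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report CDS features whose phase (column 8) is not
# 0, 1, or 2. Comment lines starting with '#' are skipped.
find_bad_cds_phase() {
    awk -F'\t' '!/^#/ && $3 == "CDS" && $8 !~ /^[012]$/ {
        print "line " NR ": phase=\"" $8 "\" " $9
    }' "$1"
}

# Example call (file name is a placeholder):
#   find_bad_cds_phase input.gff3
```

If it prints nothing, every CDS line already carries a valid phase and the cause likely lies elsewhere.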