lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

biostar103303 tool gives same value for "exon.count_prev_and_curr" and "exon.count_curr_and_next" #7

Closed vikasbansal closed 10 years ago

vikasbansal commented 10 years ago

Dear Pierre,

I have noticed that these 2 columns give the same values, shifted by one row. For example, "exon.count_prev_and_curr" has the values 0,12,3,4,3,11,22 and "exon.count_curr_and_next" has the values 12,3,4,3,11,22,0.

Best wishes, Vikas

lindenb commented 10 years ago

Can you give me the ID of the exon, please?

vikasbansal commented 10 years ago

I think I gave you the BAM file for the myh7 gene. In the output file it should be "NM_080728". I used grep "NM_080728" output.tsv.

P.S.: These 2 columns have the same column sum.

lindenb commented 10 years ago

I don't get it. Can you show me the problem, please?

exon.start  exon.end    exon.index5_3   exon.count_prev_and_next    exon.count_prev_and_curr    exon.count_curr_and_next    exon.count_curr_only    exon.count_others
55589525    55589647    1/41    0   0   994 553 17
55590214    55590348    2/41    0   994 17056   25526   5
55590462    55590557    3/41    2018    15037   139 129 6
55591525    55591800    4/41    2   2156    872 44690   0
55591908    55592033    5/41    721 153 6353    14070   0
55592139    55592342    6/41    260 6814    117 15962   0
55592595    55592903    7/41    258 119 229 83111   0
55593479    55593603    8/41    258 229 925 467 0
55593749    55593914    9/41    260 923 16625   9936    0
55594071    55594254    10/41   258 16627   2024    21436   0
55594716    55594912    11/41   258 2024    45  1653    0
55597420    55597538    12/41   259 44  15  322 0
55597704    55597830    13/41   259 15  1343    291 0
55598010    55598399    14/41   259 1343    67  7318    0
55599158    55599248    15/41   259 67  63  117 63
55601003    55601148    16/41   259 63  696 2111    0
55602079    55602255    17/41   279 676 125 524 65
55602380    55602622    18/41   279 125 51  7243    65
55603388    55603643    19/41   279 51  25  713 65
55603891    55604027    20/41   279 25  25  106 65
55604298    55604421    21/41   279 25  2032    5827    65
55604562    55604679    22/41   279 2032    92  1478    65
55605297    55605384    23/41   279 92  17  154 65
55605706    55605773    24/41   279 17  16  102 65
55606058    55606367    25/41   21  274 9   168 65
55607142    55607312    26/41   21  9   3191    10416   65
55607612    55607761    27/41   21  3191    3   10441   65
55607881    55607999    28/41   21  3   11  72  65
55608092    55608230    29/41   21  11  7   1412    65
55608815    55608918    30/41   22  6   45  507 77
55609142    55609240    31/41   22  45  67  603 71
55609579    55609642    32/41   0   89  35  708 0
55609753    55609845    33/41   0   35  46  313 0
55609929    55610037    34/41   0   46  154 623 0
55610553    55610580    35/41   0   8   448 392 146
55610675    55610831    36/41   0   594 92  5519    0
55611077    55611220    37/41   0   92  31  2618    0
55611510    55611718    38/41   0   31  43  293 0
55612141    55612196    39/41   0   43  51  535 0
55612425    55612462    40/41   0   48  19  524 3
55613348    55613386    41/41   0   22  0   137 0
vikasbansal commented 10 years ago

I ran it on the whole BAM file. Here is an example:

chrom exon.start exon.end exon.exon_id exon.index5_3 transcript_id gene_name gene_id exon.count_prev_and_next exon.count_prev_and_curr exon.count_curr_and_next exon.count_curr_only exon.count_others
chr10 3134304 3134909 . 1/13 NM_172546 . NM_172546 0 0 12 20 0
chr10 3185734 3185897 . 2/13 NM_172546 . NM_172546 0 12 3 33 0
chr10 3192056 3192258 . 3/13 NM_172546 . NM_172546 0 3 4 33 0
chr10 3193590 3193677 . 4/13 NM_172546 . NM_172546 0 4 13 8 0
chr10 3203539 3203580 . 5/13 NM_172546 . NM_172546 0 13 11 0 0
chr10 3206103 3206222 . 6/13 NM_172546 . NM_172546 0 11 22 50 0
chr10 3208127 3208186 . 7/13 NM_172546 . NM_172546 0 22 18 2 0
chr10 3211425 3211493 . 8/13 NM_172546 . NM_172546 0 18 42 14 0
chr10 3211873 3212019 . 9/13 NM_172546 . NM_172546 0 42 22 17 0
chr10 3217394 3217518 . 10/13 NM_172546 . NM_172546 0 22 45 24 0
chr10 3219698 3219906 . 11/13 NM_172546 . NM_172546 0 45 3 105 0
chr10 3220357 3220446 . 12/13 NM_172546 . NM_172546 0 3 15 22 0
chr10 3225955 3227479 . 13/13 NM_172546 . NM_172546 0 15 0 376 0
chr10 3308332 3309383 . 1/6 NM_001039652 . NM_001039652 0 0 0 2 0
chr10 3323356 3323399 . 1/8 NM_001033391 . NM_001033391 0 0 0 0 0
chr10 3332435 3332500 . 2/6 NM_001039652 . NM_001039652 0 0 0 0 0
chr10 3349965 3350011 . 2/8 NM_001033391 . NM_001033391 0 0 0 0 0
chr10 3366076 3366294 . 1/9 NM_001170800 . NM_001170800 0 0 0 0 0
chr10 3366076 3366294 . 1/7 NM_001170802 . NM_001170802 0 0 0 0 0
chr10 3366076 3366294 . 1/7 NM_001170801 . NM_001170801 0 0 0 0 0
chr10 3366837 3366925 . 3/6 NM_001039652 . NM_001039652 0 0 0 0 0
chr10 3366858 3367027 . 2/9 NM_001170800 . NM_001170800 0 0 0 0 0
chr10 3366858 3367027 . 2/7 NM_001170802 . NM_001170802 0 0 0 0 0
chr10 3366858 3367027 . 2/7 NM_001170801 . NM_001170801 0 0 0 0 0
chr10 3390409 3390482 . 3/8 NM_001033391 . NM_001033391 0 0 0 0 0
chr10 3390409 3390482 . 3/9 NM_001170800 . NM_001170800 0 0 0 0 0
chr10 3390409 3390482 . 3/7 NM_001170802 . NM_001170802 0 0 0 0 0
chr10 3390409 3390482 . 3/7 NM_001170801 . NM_001170801 0 0 0 0 0
chr10 3391800 3391871 . 4/8 NM_001033391 . NM_001033391 0 0 0 0 0
chr10 3391800 3391871 . 4/9 NM_001170800 . NM_001170800 0 0 0 0 0
chr10 3391800 3391871 . 4/7 NM_001170802 . NM_001170802 0 0 0 0 0
chr10 3391800 3391871 . 4/7 NM_001170801 . NM_001170801 0 0 0 0 0
chr10 3409382 3409440 . 5/8 NM_001033391 . NM_001033391 0 0 2 0 0
chr10 3409382 3409440 . 5/9 NM_001170800 . NM_001170800 0 0 2 0 0
chr10 3409382 3409440 . 5/7 NM_001170802 . NM_001170802 0 0 2 0 0
chr10 3409382 3409440 . 5/7 NM_001170801 . NM_001170801 0 0 2 0 0
chr10 3411266 3411354 . 6/8 NM_001033391 . NM_001033391 0 2 0 0 0
chr10 3411266 3411354 . 6/9 NM_001170800 . NM_001170800 0 2 0 0 0
chr10 3411266 3411354 . 6/7 NM_001170802 . NM_001170802 0 2 0 0 0
chr10 3411266 3411366 . 6/7 NM_001170801 . NM_001170801 0 2 0 0 0

Also, in the example you showed:

55594071 55594254 10/41 258 16627 2024 21436 0
55594716 55594912 11/41 258 2024 45 1653 0
55597420 55597538 12/41 259 44 15 322 0
55597704 55597830 13/41 259 15 1343 291 0
55598010 55598399 14/41 259 1343 67 7318 0
55599158 55599248 15/41 259 67 63 117 63
55601003 55601148 16/41 259 63 696 2111 0
55602079 55602255 17/41 279 676 125 524 65
55602380 55602622 18/41 279 125 51 7243 65
55603388 55603643 19/41 279 51 25 713 65
55603891 55604027 20/41 279 25 25 106 65
55604298 55604421 21/41 279 25 2032 5827 65
55604562 55604679 22/41 279 2032 92 1478 65
55605297 55605384 23/41 279 92 17 154 65
55605706 55605773 24/41 279 17 16 102 65

Notice, for example, the values 1343, 67, 63: each appears in both columns.

_EDIT_ It seems that "exon.count_prev_and_curr" has the same value as "exon.count_curr_and_next" in the previous row.

lindenb commented 10 years ago

Isn't it normal that, for exon+1, "exon.count_prev_and_curr" has roughly the same value as "exon.count_curr_and_next" on the previous line?

vikasbansal commented 10 years ago

Oh yes, you are right. But then the question is: why are they different in some cases?

_EDIT_ Do you count a junction read if part of it lies completely within an annotated exon?
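
A quick way to list exactly where the two columns disagree is to compare each row's "exon.count_prev_and_curr" with the previous row's "exon.count_curr_and_next". The following is only a minimal Python sketch using exons 10/41 to 24/41 from the table quoted above; the name `shift_mismatches` is made up for illustration and is not part of jvarkit.

```python
# (exon index, count_prev_and_curr, count_curr_and_next), copied from the
# rows 10/41 .. 24/41 quoted earlier in this thread.
ROWS = [
    (10, 16627, 2024), (11, 2024, 45), (12, 44, 15), (13, 15, 1343),
    (14, 1343, 67), (15, 67, 63), (16, 63, 696), (17, 676, 125),
    (18, 125, 51), (19, 51, 25), (20, 25, 25), (21, 25, 2032),
    (22, 2032, 92), (23, 92, 17), (24, 17, 16),
]


def shift_mismatches(rows):
    """Return (exon, previous curr_and_next, this prev_and_curr) where they differ.

    Hypothetical helper: it only compares consecutive rows of the table.
    """
    out = []
    for (_, _, prev_cn), (exon, pc, _) in zip(rows, rows[1:]):
        if prev_cn != pc:
            out.append((exon, prev_cn, pc))
    return out


if __name__ == "__main__":
    for exon, prev_cn, pc in shift_mismatches(ROWS):
        print(f"exon {exon}: previous curr_and_next={prev_cn}, prev_and_curr={pc}")
```

On these rows it flags only exons 12 and 17 (45 vs 44, and 696 vs 676), so those are the junctions worth inspecting first.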

lindenb commented 10 years ago

No time to check this now. I suggest you convert a given region to BED using bamToBed -bed12 and open it as a custom track in the UCSC Genome Browser to view the reads.
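
If it helps to inspect the BED12 records outside the browser, the aligned blocks of each read can be recovered from the standard BED12 columns (blockSizes in column 11, blockStarts in column 12, both relative to chromStart). This is a minimal Python sketch; the function name `bed12_blocks` and the example read are invented for illustration.

```python
def bed12_blocks(line):
    """Return (chrom, [(start, end), ...]) for one BED12 line.

    Coordinates are 0-based, half-open, as in BED. blockStarts are
    offsets from chromStart, so they are shifted to absolute positions.
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected 12 tab-separated BED fields")
    chrom, chrom_start = f[0], int(f[1])
    # BED allows a trailing comma in the block lists; skip empty items.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    return chrom, [(chrom_start + s, chrom_start + s + n)
                   for s, n in zip(starts, sizes)]


# Made-up read: two aligned blocks of 50 bp separated by a 900 bp gap.
EXAMPLE = "chr10\t1000\t2000\tread1\t255\t+\t1000\t2000\t0,0,0\t2\t50,50,\t0,950,"

if __name__ == "__main__":
    print(bed12_blocks(EXAMPLE))
```

A read with more than one block is a spliced (junction) read; comparing its block boundaries with the exon coordinates shows which exons it actually connects.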

vikasbansal commented 10 years ago

OK, thanks a lot. Just one question: for "exon.count_prev_and_next" (which supports skipping of the exon), do you consider only the immediately previous (upstream) and next (downstream) exons?

lindenb commented 10 years ago

All downstream and upstream exons are considered; see https://github.com/lindenb/jvarkit/blob/6294613b8eff3419427ddff13a37a84fcffaba21/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar103303.java#L349 and https://github.com/lindenb/jvarkit/blob/6294613b8eff3419427ddff13a37a84fcffaba21/src/main/java/com/github/lindenb/jvarkit/tools/biostar/Biostar103303.java#L356
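
To make the difference between the two readings concrete, here is a small Python sketch. It is not taken from Biostar103303.java: it assumes a read is summarised as the set of exon indices its aligned blocks overlap, and that a read overlapping the current exon does not count as skipping it. The names `skips_exon` and `immediate_only` are hypothetical.

```python
def skips_exon(read_exons, i, immediate_only=False):
    """True if the read has a block on each side of exon i but none on exon i.

    read_exons: set of exon indices overlapped by the read (assumed input).
    immediate_only=False considers any upstream and any downstream exon;
    immediate_only=True considers only exons i-1 and i+1.
    """
    if i in read_exons:
        return False
    if immediate_only:
        return (i - 1) in read_exons and (i + 1) in read_exons
    return any(j < i for j in read_exons) and any(j > i for j in read_exons)


# Hypothetical spliced read joining exon 2 directly to exon 5.
READ = {2, 5}

if __name__ == "__main__":
    for exon in (3, 4):
        print(exon, skips_exon(READ, exon), skips_exon(READ, exon, immediate_only=True))
```

Under the "all upstream and downstream" reading, this read supports skipping of both exon 3 and exon 4; under the "immediate neighbours only" reading it would support neither.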