sr320 / paper-pano-go

Draft manuscript describing Panopea gonad transcriptome

rpkm values: first attempt #26

Closed JaviRobles closed 7 years ago

JaviRobles commented 8 years ago

Hi! Today I tried to calculate the RPKM values with the database that @sr320 provided to us. I followed one procedure to calculate them, but I don't know if I did it right, so I created an MD file to explain how I got those values. Please, if I am too lost, I would be very happy if you could point me in the right direction.

Here are the files and the Excel table:

https://github.com/JaviRobles/xpreszion/blob/master/rpkm/RPKM.md

Database Excel: https://github.com/JaviRobles/xpreszion/blob/master/Male-v-Female-ExpressionJR.xlsx
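For reference, RPKM is usually defined as the reads mapped to a contig, divided by the contig length in kilobases and by the total mapped reads in millions. A minimal sketch assuming that definition; the function name and the numbers are illustrative and are not taken from RPKM.md or the Excel table:

```python
def rpkm(reads, length_bp, total_mapped_reads):
    """Reads Per Kilobase of contig per Million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return reads * 1e9 / (length_bp * total_mapped_reads)


# Illustrative numbers only: 500 reads on a 2,000 bp contig,
# in a library with 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example)
```

Whether "total mapped reads" should be counted before or after dropping empty rows changes every value, so it is worth stating which was used.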

sr320 commented 8 years ago

This would seem appropriate to me....

mdelrio1 commented 8 years ago

@sr320 @JaviRobles I was checking the data file and found several (~17266) reads with either no experiment value, no male value, or no female value. I think we should remove them from the calculations (with an "if" statement) in order to reduce the number of points in the graph. Furthermore, I found several entries with strange values in the original data file, for instance:

comp122474_c0_seq1,6407,6407,âˆ,0,0,6407,6407
comp100083_c0_seq1,1,-1,#NAME?,1,1,0,0

What do "âˆ_" and "#NAME?" mean? When imported into Excel, the values change to:

#NAME? with a total count of 21137
‰ö_ with a total count of 13984

So far, because of the way the calculations are done, it seems to me that these data are changing the values for the RPM and thus the RPKM. Should we reduce the amount of data in this case?
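The "if" filter suggested above could look like the following sketch. The column names `Male-Unique` and `Female-Unique`, the helper name, and the sample rows are assumptions for illustration; the real file may use different headers.

```python
def has_both_sexes(row, male_key="Male-Unique", female_key="Female-Unique"):
    """Return True only when both counts are present and greater than zero."""
    try:
        male = float(row[male_key])
        female = float(row[female_key])
    except (KeyError, TypeError, ValueError):
        return False  # missing or non-numeric value
    return male > 0 and female > 0


# Hypothetical rows standing in for the real table.
rows = [
    {"contig": "contig_a", "Male-Unique": "12", "Female-Unique": "30"},
    {"contig": "contig_b", "Male-Unique": "0", "Female-Unique": "6407"},
    {"contig": "contig_c", "Male-Unique": "1", "Female-Unique": ""},
]
kept = [r["contig"] for r in rows if has_both_sexes(r)]
print(kept)
```

Note that a filter like this also drops the contigs expressed in only one sex, so it suits the scatter plot but not a search for sex-specific contigs.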

Should we also calculate the TPM (transcripts per million, Wagner et al. 2012)?

(http://link.springer.com/article/10.1007%2Fs12064-012-0162-3)
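For comparison with RPKM, TPM as commonly defined first divides each count by the contig length and then rescales so that the values sum to one million. A minimal sketch under that definition; the function name and toy numbers are illustrative only:

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million for parallel lists of read counts and lengths."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / l for c, l in zip(counts, lengths_bp)]  # reads per base
    total = sum(rates)
    if total == 0:
        raise ValueError("no reads to normalise")
    return [r / total * 1e6 for r in rates]


# Toy example: three contigs with made-up counts and lengths.
values = tpm([100, 200, 300], [1000, 1000, 3000])
print(values)
```

Because the values always sum to one million within a sample, TPM is easier to compare between the male and female libraries than RPKM.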

sr320 commented 8 years ago

Those symbols are in the fold difference columns, and where there are no total reads there is no number. ∞ is when female only, and #NAME? is male only (zero female reads).
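To illustrate why those cells are not numbers: a fold difference has no finite value when one sex has zero reads. The sketch below assumes a female/male ratio, which is a guess at the spreadsheet's formula and not confirmed in this thread. It also shows that the UTF-8 bytes of an infinity sign, read as Windows-1252, produce text like the "âˆ" seen in the CSV, assuming the original character was an infinity sign.

```python
import math


def fold_difference(female, male):
    """Female/male ratio; infinite when only female reads exist."""
    if male == 0:
        return math.inf if female > 0 else math.nan
    return female / male


print(fold_difference(6407, 0))  # female-only contig: no finite ratio
print(fold_difference(0, 1))     # male-only contig under this assumed formula

# Encoding check: U+221E (infinity) as UTF-8 bytes, decoded as Windows-1252.
garbled = "\u221e".encode("utf-8").decode("cp1252")
print(ascii(garbled))
```

If that is what happened, opening the file with an explicit UTF-8 encoding should show the intended symbol.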

Personally, I would not get too hung up on RPKM etc., as we have no replicates, but just make broad classifications,

for example:


Male versus Female Expression

Male "only": contigs where female unique reads = 0 and male unique reads > 200 (148 contigs)

comp140210_c0_seq1
comp142770_c0_seq1
comp49714_c0_seq1
comp138780_c0_seq1
comp141568_c0_seq1
comp144633_c0_seq1
comp49453_c0_seq1
comp126598_c0_seq1
comp133894_c0_seq1
comp137528_c0_seq1
comp89797_c0_seq1
comp134377_c0_seq1
comp119642_c0_seq1
comp127773_c0_seq1
comp138496_c0_seq1
comp127835_c0_seq1
comp143612_c0_seq1
comp134152_c0_seq1
comp129252_c0_seq1
comp143000_c0_seq1
comp136544_c1_seq1
comp127948_c0_seq1
comp137363_c0_seq1
comp138623_c1_seq3
comp131683_c0_seq1
comp131710_c0_seq1
comp125269_c0_seq1
comp123205_c0_seq1
comp142331_c0_seq14
comp115970_c0_seq1
comp130931_c0_seq1
comp88358_c0_seq1
comp141641_c0_seq2
comp139826_c1_seq1
comp93867_c0_seq1
comp133628_c2_seq2
comp120259_c0_seq1
comp137935_c1_seq1
comp130600_c0_seq1
comp144434_c0_seq1
comp130239_c2_seq1
comp129942_c0_seq2
comp138915_c0_seq4
comp126398_c0_seq1
comp141856_c3_seq1
comp39309_c0_seq1
comp141202_c1_seq1
comp140223_c0_seq1
comp138913_c4_seq3
comp141802_c3_seq2
comp137300_c0_seq1
comp133576_c0_seq3
comp139440_c1_seq2
comp125323_c0_seq1
comp132965_c0_seq1
comp143899_c0_seq17
comp139882_c0_seq2
comp116840_c0_seq1
comp92975_c0_seq1
comp139252_c0_seq1
comp142162_c0_seq2
comp125636_c0_seq1
comp109283_c0_seq1
comp144391_c0_seq1
comp140151_c0_seq4
comp138711_c1_seq1
comp131714_c0_seq1
comp144391_c0_seq2
comp144499_c0_seq1
comp120846_c0_seq1
comp132952_c0_seq1
comp128121_c0_seq1
comp94838_c0_seq2
comp140806_c0_seq3
comp137854_c0_seq2
comp139828_c0_seq1
comp142331_c0_seq1
comp144570_c0_seq7
comp140059_c0_seq2
comp120381_c0_seq1
comp137567_c0_seq4
comp138900_c0_seq1
comp139240_c0_seq1
comp140110_c0_seq1
comp138757_c1_seq1
comp144645_c0_seq1
comp135845_c0_seq6
comp115368_c0_seq2
comp143240_c0_seq2
comp55182_c0_seq1
comp143578_c0_seq3
comp139526_c0_seq14
comp133628_c0_seq1
comp140112_c5_seq1
comp137031_c0_seq1
comp130843_c0_seq1
comp144472_c1_seq3
comp143319_c0_seq1
comp138824_c0_seq1
comp141051_c0_seq1
comp140198_c0_seq1
comp139316_c1_seq2
comp110984_c0_seq1
comp143796_c0_seq2
comp141751_c0_seq1
comp125846_c0_seq2
comp141051_c2_seq1
comp142752_c1_seq1
comp120530_c0_seq1
comp131268_c0_seq1
comp143001_c0_seq1
comp139826_c7_seq9
comp134699_c1_seq1
comp130239_c2_seq3
comp136803_c0_seq1
comp135820_c0_seq2
comp135586_c1_seq2
comp144174_c1_seq6
comp134384_c0_seq1
comp143944_c0_seq3
comp141679_c1_seq1
comp135418_c0_seq2
comp121778_c0_seq1
comp140059_c0_seq1
comp49083_c0_seq3
comp133725_c0_seq3
comp134076_c0_seq3
comp143939_c0_seq3
comp137944_c0_seq1
comp137388_c0_seq2
comp106595_c0_seq1
comp127545_c1_seq1
comp137905_c0_seq6
comp141394_c0_seq1
comp141856_c2_seq1
comp137905_c0_seq4
comp51572_c0_seq1
comp141137_c0_seq1
comp138410_c2_seq3
comp135384_c0_seq1
comp143958_c0_seq1
comp134469_c0_seq1
comp142862_c1_seq1
comp138587_c0_seq2
comp143094_c0_seq2
comp143578_c0_seq1
comp127698_c2_seq2
comp137567_c0_seq1

Female "only": contigs where male unique reads = 0 and female unique reads > 200 (198 contigs)

comp122474_c0_seq1
comp134910_c1_seq2
comp139102_c0_seq1
comp144634_c0_seq1
comp136323_c0_seq1
comp137592_c0_seq4
comp133647_c0_seq2
comp137746_c0_seq1
comp143800_c0_seq11
comp123451_c1_seq1
comp144244_c0_seq2
comp139370_c0_seq1
comp143800_c0_seq3
comp124582_c0_seq2
comp114551_c0_seq1
comp127704_c0_seq1
comp127291_c0_seq1
comp132897_c0_seq1
comp122946_c0_seq1
comp134759_c0_seq2
comp142585_c0_seq1
comp144170_c0_seq1
comp127694_c0_seq1
comp140811_c0_seq2
comp92370_c0_seq1
comp131236_c0_seq1
comp141850_c0_seq1
comp130998_c0_seq2
comp133188_c0_seq1
comp121827_c0_seq1
comp134973_c0_seq2
comp139258_c0_seq4
comp133662_c0_seq2
comp139557_c0_seq1
comp138352_c0_seq1
comp144290_c0_seq3
comp144081_c0_seq1
comp143727_c0_seq2
comp126642_c0_seq1
comp134860_c0_seq1
comp139866_c1_seq1
comp130732_c0_seq2
comp128471_c1_seq1
comp143305_c1_seq2
comp138268_c1_seq2
comp134759_c0_seq1
comp136800_c0_seq1
comp143593_c0_seq1
comp116346_c0_seq2
comp130998_c0_seq1
comp142896_c0_seq5
comp132331_c0_seq1
comp140012_c2_seq2
comp123739_c0_seq1
comp119550_c0_seq1
comp112612_c0_seq1
comp117751_c0_seq1
comp143172_c1_seq1
comp137098_c1_seq1
comp142829_c0_seq1
comp138855_c0_seq1
comp142821_c0_seq3
comp139176_c1_seq1
comp111098_c0_seq1
comp137006_c1_seq2
comp134542_c0_seq2
comp139129_c0_seq2
comp133995_c0_seq2
comp123319_c0_seq2
comp137663_c0_seq1
comp139964_c0_seq4
comp131552_c1_seq1
comp135608_c0_seq11
comp133647_c0_seq1
comp54365_c0_seq1
comp127117_c0_seq1
comp142821_c0_seq1
comp120346_c0_seq2
comp140211_c0_seq5
comp138901_c0_seq1
comp140211_c0_seq3
comp143394_c0_seq3
comp136894_c0_seq1
comp127117_c1_seq1
comp136020_c0_seq1
comp143332_c0_seq6
comp141065_c1_seq1
comp131925_c1_seq1
comp135883_c0_seq1
comp117443_c0_seq1
comp137144_c3_seq2
comp141065_c1_seq7
comp140793_c0_seq3
comp112034_c0_seq1
comp131426_c0_seq1
comp142230_c1_seq2
comp125521_c0_seq1
comp140144_c0_seq2
comp126227_c0_seq1
comp134973_c0_seq1
comp130680_c1_seq1
comp139108_c0_seq4
comp88489_c0_seq1
comp134161_c0_seq1
comp142580_c0_seq4
comp139138_c0_seq1
comp143380_c0_seq2
comp139964_c0_seq6
comp125520_c0_seq3
comp110900_c0_seq2
comp136075_c2_seq1
comp125036_c0_seq1
comp129936_c0_seq1
comp142735_c0_seq2
comp140012_c1_seq4
comp143275_c0_seq1
comp139150_c0_seq1
comp126495_c0_seq1
comp127672_c0_seq1
comp139905_c0_seq2
comp129315_c0_seq1
comp135411_c0_seq4
comp138627_c0_seq4
comp138744_c0_seq2
comp142808_c0_seq1
comp136927_c4_seq1
comp139501_c0_seq3
comp126364_c0_seq1
comp144653_c0_seq1
comp142821_c0_seq2
comp133738_c0_seq3
comp133327_c0_seq2
comp131848_c0_seq1
comp138043_c2_seq2
comp136569_c1_seq6
comp116046_c0_seq1
comp135624_c0_seq4
comp139357_c3_seq7
comp142091_c2_seq1
comp135695_c0_seq4
comp125595_c0_seq1
comp141893_c0_seq5
comp143994_c0_seq1
comp134213_c0_seq1
comp126513_c0_seq1
comp108107_c0_seq1
comp141158_c0_seq4
comp131396_c0_seq1
comp140606_c1_seq6
comp135791_c4_seq1
comp136022_c0_seq1
comp144690_c0_seq1
comp131591_c0_seq1
comp124173_c1_seq1
comp139805_c1_seq1
comp128524_c0_seq2
comp94934_c0_seq1
comp108829_c0_seq2
comp136984_c0_seq3
comp134223_c0_seq2
comp135695_c0_seq1
comp144568_c1_seq1
comp141212_c0_seq4
comp130509_c1_seq1
comp133110_c0_seq1
comp92890_c0_seq1
comp143863_c0_seq3
comp127292_c0_seq1
comp133262_c1_seq1
comp142716_c0_seq9
comp139673_c3_seq3
comp126633_c0_seq1
comp139964_c0_seq2
comp142733_c2_seq1
comp127158_c0_seq1
comp131900_c0_seq2
comp138691_c0_seq3
comp122581_c0_seq1
comp131532_c2_seq2
comp135695_c0_seq3
comp131552_c0_seq2
comp136321_c1_seq2
comp140069_c2_seq2
comp120358_c0_seq1
comp143194_c1_seq1
comp134545_c0_seq3
comp137184_c0_seq4
comp139969_c1_seq1
comp124907_c0_seq1
comp122204_c0_seq1
comp139366_c0_seq15
comp135608_c0_seq13
comp127455_c2_seq1
comp134454_c0_seq1
comp144321_c0_seq12
comp135368_c0_seq2
comp136531_c0_seq1
comp125975_c0_seq1

and just annotate them. They are already annotated, so it is just a matter of identifying them in a table.
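The broad classification described above can be sketched as a small function. The 200-read cutoff comes from the criteria in this comment; the function name, labels, and example counts are hypothetical.

```python
def classify(male_unique, female_unique, min_reads=200):
    """Label a contig as male-only, female-only, or other."""
    if female_unique == 0 and male_unique > min_reads:
        return "male_only"
    if male_unique == 0 and female_unique > min_reads:
        return "female_only"
    return "other"


# Made-up (male, female) unique read counts for illustration.
examples = [(250, 0), (0, 6407), (150, 0), (10, 20)]
print([classify(m, f) for m, f in examples])
```

A contig with zero reads in one sex but 200 or fewer in the other falls into "other", which keeps low-coverage contigs out of the sex-specific lists.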

mdelrio1 commented 8 years ago

@sr320 @JaviRobles @lacroix54 @lafarga13 Steven, I ran the script in the notebook paper-pano-go/jupyter-nbs/07-Gene-expressionGeo-Final.ipynb to compare its output with the results that you gave us here, and they are the same.

Also, I was thinking of obtaining the Male over Female genes (higher expression in males than in females) and the Female over Male genes (Females > Males). To do this, in the notebook paper-pano-go/jupyter-nbs/07-Gene-expressionGeo-Final.ipynb I used:

a) 'Male-Unique'/'Female-Unique' expression ratio > 100: found 150 contigs

b) 'Male-Unique'/'Female-Unique' expression ratio < 0.01: found 477 contigs
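A minimal sketch of the two ratio cutoffs above, not the notebook's actual code. How zero counts are handled is an assumption here: contigs with zero female reads are treated as infinitely male-biased, which may or may not match what the notebook does. Function names and example counts are hypothetical.

```python
import math


def male_female_ratio(male_unique, female_unique):
    """Male/female ratio; None when both are zero, inf when only male reads exist."""
    if female_unique == 0:
        return math.inf if male_unique > 0 else None
    return male_unique / female_unique


def bias(male_unique, female_unique, high=100, low=0.01):
    """Apply the > 100 and < 0.01 cutoffs to the male/female ratio."""
    ratio = male_female_ratio(male_unique, female_unique)
    if ratio is None:
        return "no_reads"
    if ratio > high:
        return "male_over_female"
    if ratio < low:
        return "female_over_male"
    return "similar"


# Made-up (male, female) counts for illustration.
labels = [bias(m, f) for m, f in [(500, 2), (3, 900), (40, 35), (0, 0)]]
print(labels)
```

Under this assumption, a contig with zero male reads has a ratio of 0 and so always counts as female over male, regardless of how few female reads it has.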

Should we explore these further as genes? I also tried using the total number of reads (per male and female) and found more genes (results are further down in the notebook). Please let me know if we should explore these results further.

I also ran another script to merge the expression contigs with protein names, in the notebook paper-pano-go/jupyter-nbs/08-Gene-expressionGeoannotation.ipynb. There are only 36 and 44 contigs annotated for "Male Unique" and "Female Unique", respectively. I have uploaded the files paper-pano-go/jupyter-nbs/analyses/female_protein_names.csv and paper-pano-go/jupyter-nbs/analyses/male_protein_names.csv. I haven't yet done the annotation with C. gigas and the others (Dheilly, RuphiBase) to increase the number of annotated contigs. I hope to do it this week.
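The merge step can be sketched as a lookup from contig ID to protein name. This is only an illustration of the idea, not the code in the notebook; the column names `contig` and `protein_name`, the contig IDs, and the protein names are all made up.

```python
import csv
import io

# Hypothetical stand-ins for the annotation table and a contig list.
annotation_csv = io.StringIO(
    "contig,protein_name\n"
    "contig_a,example protein 1\n"
    "contig_c,example protein 2\n"
)
male_only = ["contig_a", "contig_b", "contig_c"]

# Build a contig -> protein name lookup, then split the list on it.
names = {row["contig"]: row["protein_name"] for row in csv.DictReader(annotation_csv)}
annotated = [(c, names[c]) for c in male_only if c in names]
unannotated = [c for c in male_only if c not in names]
print(len(annotated), "annotated;", len(unannotated), "without a match")
```

Keeping the unmatched contigs in a separate list makes it easy to rerun only those against the additional databases later.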

Take care, Miguel