sr320 / paper-pano-go

Draft manuscript describing Panopea gonad transcriptome

Big Table Questions #30

Open sr320 opened 7 years ago

sr320 commented 7 years ago

Hi @mdelrio1

Thanks for updating the "Big Table"! I have a couple of questions and suggestions.

  1. It looks like the annotation is duplicated? The pink columns on the far right seem redundant with the columns marked with a thin blue line. Could the blue-line columns be removed?
  2. I suggest removing the columns marked with black lines.
  3. For expression, I would only include the columns you used; I believe that is unique, not total?

[Image: "bt", a snapshot of the Big Table]
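Duplicated annotation columns like the ones flagged above can be confirmed before deletion by comparing column values directly. A minimal standard-library sketch, assuming the table is a CSV with a header row; the miniature table, its column names, and its values are hypothetical, not taken from the real Big Table:

```python
import csv
import io

def find_duplicate_columns(header, rows):
    """Return (kept, redundant) column-name pairs whose values match in every row."""
    first_seen = {}  # tuple of column values -> first column name holding those values
    duplicates = []
    for i, name in enumerate(header):
        values = tuple(row[i] for row in rows)
        if values in first_seen:
            duplicates.append((first_seen[values], name))
        else:
            first_seen[values] = name
    return duplicates

# Hypothetical miniature Big Table: SPID repeats UniProt_Acc; total differs from unique.
sample = """contig,UniProt_Acc,SPID,unique,total
comp1_c0_seq1,P12345,P12345,10,12
comp2_c0_seq1,Q67890,Q67890,3,3
"""
reader = csv.reader(io.StringIO(sample))
header = next(reader)
rows = list(reader)
print(find_duplicate_columns(header, rows))  # [('UniProt_Acc', 'SPID')]
```

Only exact duplicates are reported; columns that differ in formatting (for example "sp|P12345|..." versus "P12345") would need normalising first.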

mdelrio1 commented 7 years ago

Hello Steven @sr320

1, 2. Yes, they are redundant columns; I was checking this earlier today. I'll remove the blue- and black-marked columns.

  3. Yes, I only used unique; I'll delete the total column too.

Also, I was wondering whether we need to add the BLAST result columns to the table (Column5 = pident, Column6 = length, Column7 = mismatch, Column8 = gapopen, Column9 = start, Column10 = qend, Column11 = sstart, Column12 = send) so that we can reverse-complement the sequences that need it and place them in the FASTA file. I have not been able to find these results in the repository, only in paper-pano-go/data-results/Geoduck-transcriptome-v2-GO-Slim.csv. However, that file does not agree with Geoduck-transcriptome_v3.fa, since some contigs are not in both files (for instance, comp100065_c0_seq1 is in v3 but not in v2-GO-Slim). Do you have them? Please let me know. Thank you, Miguel
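If the BLAST columns were added, each contig's orientation could be read from the query coordinates. A minimal sketch, assuming BLAST+ blastx tabular output (`-outfmt 6`), where a hit on a negative query frame is reported with the query start greater than the query end (Column9 and Column10 above); for blastx, sstart and send are protein coordinates and do not indicate strand. The function names and example values are hypothetical:

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (N stays N, case is preserved)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]

def orient_by_blastx(seq, qstart, qend):
    """Return the contig in the orientation of its blastx hit.

    Assumes BLAST+ tabular output, where a minus-frame blastx hit has
    qstart > qend; such contigs are reverse-complemented.
    """
    if int(qstart) > int(qend):
        return reverse_complement(seq)
    return seq

# Hypothetical contig and coordinates, for illustration only.
print(orient_by_blastx("ATGCCG", qstart=6, qend=1))  # CGGCAT (minus frame)
print(orient_by_blastx("ATGCCG", qstart=1, qend=6))  # ATGCCG (plus frame)
```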

mdelrio1 commented 7 years ago

Hi @sr320 Do you want me to delete the old big table and replace it with the new one, without the repeated columns and the other (black-marked) columns?

sr320 commented 7 years ago

Sure - you could just overwrite it. GitHub will keep the older versions.

I do not think the sequences need to be (or should be) flipped, though we can discuss.

I will be at the Hatchery all day tomorrow so cannot make the call. Should we try to reschedule? Maybe 1pm Thursday?

mdelrio1 commented 7 years ago

Hi @sr320 I have a meeting on Thursday at 11:00. I hope it finishes before 1:00 pm; in case it doesn't, could you please wait for me? Thanks, I'll update the files.

mdelrio1 commented 7 years ago

The files are up to date: paper-pano-go/jupyter-nbs/10panopeadataresults.ipynb [https://github.com/sr320/paper-pano-go/blob/master/jupyter-nbs/10panopeadataresults.ipynb] and paper-pano-go/data-results/Geoduck-transcriptome_v3_bigtable.csv.zip [https://github.com/sr320/paper-pano-go/blob/master/data-results/Geoduck-transcriptome_v3_bigtable.csv.zip]

mdelrio1 commented 7 years ago

Hi Steven @sr320 I'm trying to add the matching files in order to calculate GC and CpG content according to whether each sequence is in the matching orientation (5'-3') or the reverse complement (3'-5'), but the file paper-pano-go/jupyter-nbs/analyses/Geoduck-tranv3-blastx_sprot.sorted (which holds all the BLAST results) does not match paper-pano-go/data-results/Geoduck-transcriptome-v3.fa.zip. Please let me know if you have the file (sorry, I didn't understand what happened when we talked about it).
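GC content and the CpG observed/expected ratio (CpG count * length / (C count * G count)) can be computed per sequence as below. A minimal sketch; the function names and the test sequence are hypothetical, and this is not the notebook's code. Both measures come out the same for a sequence and its reverse complement, because reverse-complementing swaps the C and G counts and the reverse complement of CG is CG, which the last line checks:

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]

def gc_content(seq):
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def cpg_obs_exp(seq):
    """CpG observed/expected ratio: CG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # str.count finds non-overlapping matches; "CG" cannot overlap itself, so this is exact.
    return seq.count("CG") * len(seq) / (c * g)

seq = "ACGTTCGA"  # hypothetical sequence
print(gc_content(seq), cpg_obs_exp(seq))  # 0.5 4.0
rc = reverse_complement(seq)
print(gc_content(rc), cpg_obs_exp(rc))  # 0.5 4.0
```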

sr320 commented 7 years ago

@mdelrio1 I will take a look. I am at a conference today (so I cannot make the Skype call today; sorry, I think I forgot to tell you before now).

I'll be in touch soon. Thanks

mdelrio1 commented 7 years ago

@sr320 Don't worry, let me know when you come back. Take care

sr320 commented 7 years ago

@mdelrio1 Can you clarify what you mean when you say:

paper-pano-go/jupyter-nbs/analyses/Geoduck-tranv3-blastx_sprot.sorted (where all the blast results are) does not match the paper-pano-go/data-results/Geoduck-transcriptome-v3.fa.zip
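One way to make "does not match" concrete is to compare the contig identifiers in the two files. A minimal standard-library sketch, assuming the FASTA headers start with the contig ID and the BLAST file is tabular output with the query ID in column 1; the function names and the file contents below are hypothetical stand-ins:

```python
import io

def fasta_ids(handle):
    """Collect sequence IDs (first word after '>') from a FASTA file handle."""
    return {line[1:].split()[0] for line in handle if line.startswith(">") and line[1:].strip()}

def blast_query_ids(handle):
    """Collect query IDs (column 1, qseqid) from BLAST tabular output."""
    return {line.split()[0] for line in handle if line.strip() and not line.startswith("#")}

def compare_ids(fasta, blast):
    """Return (IDs only in the FASTA, IDs only in the BLAST results)."""
    return fasta - blast, blast - fasta

# Hypothetical contents standing in for the real .fa and .sorted files.
fasta_text = ">comp100065_c0_seq1 len=300\nACGT\n>comp1_c0_seq1\nACGT\n"
blast_text = "comp1_c0_seq1\tsp|P12345|EXAMPLE\t90.0\n"
only_fasta, only_blast = compare_ids(fasta_ids(io.StringIO(fasta_text)),
                                     blast_query_ids(io.StringIO(blast_text)))
print(sorted(only_fasta), sorted(only_blast))  # ['comp100065_c0_seq1'] []
```

BLAST tabular output only lists queries that had a hit, so contigs found only in the FASTA are expected; IDs found only in the BLAST file are the ones that would indicate the two files come from different assemblies.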

mdelrio1 commented 7 years ago

Hi @sr320 Please check paper-pano-go/jupyter-nbs/10Panopea_databases.ipynb https://github.com/sr320/paper-pano-go/blob/master/jupyter-nbs/10Panopea_databases.ipynb I think I have optimised some cells and finally got the merging working properly! There are still some issues. I think we could reduce the number of columns and probably split the data file into two: a) all 153,982 contigs, with general information and expression levels (sex included); b) only the 22,974 contigs with blastx, gigaton, ruphi, Dh, and sex expression values. This would reduce the number of empty cells. Check In[54], which shows all the column names. I tried to reduce redundant columns, but found some missing data when I compared the "UniProt_Acc", "sseqid2", and "SPID" columns (in the last In cell, but I cleared it before uploading the file, sorry). I hope I can run it tomorrow morning and insert this information. Let's talk about it tomorrow.
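The proposed two-table split can be sketched with the standard library's csv module. A minimal sketch, assuming the merged table is a CSV with one row per contig; the helper names and the column names (contig, length, expr_unique, blastx_hit, gigaton_hit) are hypothetical placeholders, not the notebook's real columns:

```python
import csv
import io

def split_table(rows, id_col, general_cols, annotation_cols):
    """Split merged rows into (a) every contig with the general columns and
    (b) only contigs with at least one non-empty annotation column."""
    table_a = [{c: r.get(c, "") for c in [id_col] + general_cols} for r in rows]
    table_b = [
        {c: r.get(c, "") for c in [id_col] + annotation_cols}
        for r in rows
        if any(r.get(c, "") for c in annotation_cols)
    ]
    return table_a, table_b

def empty_cells(table):
    """Count empty-string cells across a list of row dicts."""
    return sum(1 for row in table for value in row.values() if value == "")

# Hypothetical merged table: the second contig has no annotation hits.
merged_csv = """contig,length,expr_unique,blastx_hit,gigaton_hit
comp1_c0_seq1,500,10,P12345,hitA
comp2_c0_seq1,300,4,,
"""
rows = list(csv.DictReader(io.StringIO(merged_csv)))
table_a, table_b = split_table(rows, "contig", ["length", "expr_unique"],
                               ["blastx_hit", "gigaton_hit"])
print(len(table_a), len(table_b), empty_cells(rows), empty_cells(table_a) + empty_cells(table_b))
# 2 1 2 0
```

The split trades empty cells for a join: table (b) can be matched back to table (a) on the contig ID whenever both are needed.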

sr320 commented 7 years ago

Sounds good, but I guess I do not see an issue with empty cells.

mdelrio1 commented 7 years ago

@sr320 The Jupyter notebooks for the big and the small table are ready: https://github.com/sr320/paper-pano-go/blob/master/jupyter-nbs/10Panopea_databases.ipynb The tables themselves are in the data-results folder: https://github.com/sr320/paper-pano-go/tree/master/data-results I couldn't upload files bigger than 25 MB, so I had to compress the bigtable file.
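Compressing the big table before upload can be scripted. A minimal sketch with Python's zipfile module; the 25 MB threshold comes from the comment above and is treated here as 25 MiB (an assumption), and the helper name and demo file contents are hypothetical:

```python
import tempfile
import zipfile
from pathlib import Path

UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB web-upload limit, read as MiB

def zip_if_too_large(path, limit=UPLOAD_LIMIT_BYTES):
    """Return the file to upload: `path` itself if small enough, else a new
    DEFLATE-compressed `<name>.zip` written next to it."""
    path = Path(path)
    if path.stat().st_size <= limit:
        return path
    zip_path = path.with_name(path.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(path, arcname=path.name)
    return zip_path

# Usage sketch on a throwaway file with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "bigtable.csv"
    demo.write_text("contig,expr_unique\ncomp1_c0_seq1,10\n")
    print(zip_if_too_large(demo).name)            # bigtable.csv (under the limit)
    print(zip_if_too_large(demo, limit=10).name)  # bigtable.csv.zip (over the limit)
```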

The small table notebook processes the small-table data for the Venn diagrams in R and Python: https://github.com/sr320/paper-pano-go/blob/master/jupyter-nbs/10Panopea_databases_smalltable.ipynb
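The counts a Venn diagram displays can be computed directly from per-category sets of contig IDs. A minimal sketch, not the notebook's actual code; the helper name, the category labels (borrowed from database names mentioned earlier in the thread), and the IDs are hypothetical:

```python
from itertools import combinations

def venn_regions(sets):
    """Count elements in each exclusive Venn region.

    `sets` maps a label to a set of IDs. Each result key is a tuple of labels;
    its value is the number of IDs found in exactly those sets and no others.
    """
    labels = list(sets)
    regions = {}
    for k in range(1, len(labels) + 1):
        for combo in combinations(labels, k):
            inside = set.intersection(*(sets[label] for label in combo))
            outside = set().union(*(sets[label] for label in labels if label not in combo))
            regions[combo] = len(inside - outside)
    return regions

# Hypothetical contig IDs per category.
sets = {
    "blastx": {"c1", "c2", "c3"},
    "gigaton": {"c2", "c3", "c4"},
    "ruphi": {"c3", "c5"},
}
regions = venn_regions(sets)
print(regions[("blastx", "gigaton", "ruphi")], sum(regions.values()))  # 1 5
```

The exclusive region counts always add up to the number of distinct IDs across all sets, which is a quick sanity check against the totals shown in the figures.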

The Venn diagrams are in the manuscript figures folder: https://github.com/sr320/paper-pano-go/tree/master/manuscript/figures

I hope I'm not missing anything we talked about; let me know.