mdelrio1 closed this issue 9 years ago
Hi Miguel, you are correct: the GO information is not included in those files yet. The following code should get GO and GO slim information in SQLShare (sqlshare.escience.washington.edu).
SELECT *
FROM [sr320@washington.edu].[Geoduck-tranv2-blastx_sprot] blast
LEFT JOIN [sr320@washington.edu].[SPID and GO Numbers] go
  ON blast.Column3 = go.SPID
LEFT JOIN [sr320@washington.edu].[GO_to_GOslim] slim
  ON go.GOID = slim.GO_id
WHERE aspect LIKE 'P'
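For anyone who wants to try the join logic outside SQLShare, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and all values (including the placeholder GO IDs) are made up for illustration and are not the real SQLShare datasets. It shows that filtering on `aspect` in the WHERE clause also drops contigs with no GO match, because their `aspect` is NULL.

```python
import sqlite3

# Toy stand-ins for the three SQLShare tables (all names and values are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blast (contig TEXT, spid TEXT);
CREATE TABLE spid_go (SPID TEXT, GOID TEXT);
CREATE TABLE go_slim (GO_id TEXT, term TEXT, aspect TEXT);
INSERT INTO blast VALUES ('contig_a', 'SP1'), ('contig_b', 'SP2'), ('contig_c', 'SP3');
INSERT INTO spid_go VALUES ('SP1', 'GO:1'), ('SP1', 'GO:2'), ('SP2', 'GO:3');
INSERT INTO go_slim VALUES
    ('GO:1', 'slim term x', 'P'),
    ('GO:2', 'slim term y', 'F'),
    ('GO:3', 'slim term z', 'P');
""")

# Same shape as the query above: two chained left joins, optional aspect filter.
QUERY = """
SELECT blast.contig, spid_go.GOID, go_slim.term, go_slim.aspect
FROM blast
LEFT JOIN spid_go ON blast.spid = spid_go.SPID
LEFT JOIN go_slim ON spid_go.GOID = go_slim.GO_id
{where}
ORDER BY blast.contig, spid_go.GOID
"""

# Without the filter, every contig is kept; unmatched ones carry NULLs.
all_rows = conn.execute(QUERY.format(where="")).fetchall()

# With the filter, only biological process ('P') rows survive.
bp_rows = conn.execute(
    QUERY.format(where="WHERE go_slim.aspect LIKE 'P'")
).fetchall()

print(len(all_rows), "rows without the filter")
print(len(bp_rows), "rows with aspect LIKE 'P'")
```

In this toy data, contig_c has no Swiss-Prot match in the GO table, so it appears in the unfiltered result with empty GO fields and disappears once the aspect filter is applied.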
I am running it on SQLShare now to get the results, but it is currently still :runner:
Thanks Steven. Is it best to wait, or should I run the code?
If you want to, go ahead and see if you can get results from your SQLShare account.
OK, I'll run it. It's running.
Hi Steven, it finished, but I couldn't download the database (Geoduck-tranv2-GO). I shared it with you; however, it seems something is wrong: it has 100 rows and 20 columns! Could you please tell me what I did wrong? I'm attaching a snapshot of the run. Thanks
From my end it looks like you did it correctly (see screenshot): the preview shows 100 rows, but there are 100k+ records.
I think there is a problem on the server side. I am trying to download but it is just "waiting".
I will let it keep going and let you know if I can get a successful download.
Thanks Steven, I agree with you. In the screenshot I took there were 100 sequences, but now I just logged in and there are "Rows 1 - 100 of 102358", as in the screenshot you sent me. But I can't download the files. I'll wait too. Thanks
Ok I think I finally got it... I ended up doing the two joins separately, then creating a "snapshot", before downloading...
The CSV file with GO and GO slim (BP only) information is now in the repo
Thanks Steven, I managed to download the file this morning too, and I got the same results with both files. This is the image for the annotation. Please let me know whether you prefer a different setting. https://github.com/mdelrio1/mdelrio-panopea1/blob/master/img/Panopea_annotation.png
I am confused, are you running the annotation again? I ask because I am currently using the female and male data to generate the CpGs of the genes shared between them... Is that OK? As I understand it, you are using all the "panopea" data together for this annotation, right?
Steven, that image used all the annotated data; here is the graph without duplicates. https://github.com/mdelrio1/mdelrio-panopea1/blob/master/img/Panopea_annotationNoduplicates.png
@lafarga13 - go ahead and do the female and male data. That would be great. Once the analysis workflow is set (or we see something interesting with your analysis) we could do full transcriptome easily.
@mdelrio1 Looks good! "GO" ahead and add text, figures to main paper repo. You have write access so you can edit directly just as if it was one of your repos.
@sr320 I'll add the fig and write something in the paper repo. Thanks
@mdelrio1 do you have a table with unique GOslim information for each contig?
@sr320 Yes, I'll add it to data-results as an Excel file with two sheets, one with all the information and the second with unique GOslim, unless you say otherwise.
@sr320 The file is in my repository: https://github.com/mdelrio1/mdelrio-panopea1/blob/master/data/Geoduck-transcriptome-v2-GO-Slim.xlsx I couldn't upload it to the data folder.
@mdelrio1 Can you save as a CSV, then add? If not I can try.
NOTE you should probably rename with 'unique' as there is already a file with this name which has all GOslim info.
@sr320 I have uploaded the .csv file and renamed it:
https://github.com/mdelrio1/mdelrio-panopea1/blob/master/data/Geoduck-transcriptome-v2-GO-SlimUnique.csv
it only has the unique GO results
Instead of adding it, I tried to count the rows with wc:
!wc ../panopea_data/data-results/Geoduck-transcriptome-v2-GO-SlimUnique.csv
but I've got
0 81687 3320985 ../panopea_data/data-results/Geoduck-transcriptome-v2-GO-SlimUnique.csv
Zero rows? So to get the count I also tried
!grep -c "comp" ../panopea_data/data-results/Geoduck-transcriptome-v2-GO-SlimUnique.csv
thinking that all rows have "comp" as part of the name, but it gave me 1.
How do you count rows in .csv files?
I have added that file to this repo.
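What you have experienced is just one of the side effects of using Excel :smile: . Saving as CSV in Excel can use non-Unix line breaks. wc counts newline characters, so a file that has only carriage returns reports 0 lines, and grep sees the whole file as a single line.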
I opened the CSV in TextWrangler and re-saved it with the line breaks changed to Unix, and wc now indicates
wc -l /Users/sr320/git-repos/paper-pano-go/data-results/Geoduck-transcriptome-v2-GO-SlimUnique.csv
19652 /Users/sr320/git-repos/paper-pano-go/data-results/Geoduck-transcriptome-v2-GO-SlimUnique.csv
Again, the file is now located in data-results.
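If you would rather check this from the notebook than from a text editor, here is a small Python sketch of the same idea. The function names and the temporary demo file are made up for illustration; it assumes the export uses carriage-return-only line breaks, which is what a newline count of 0 suggests.

```python
import os
import tempfile


def count_rows(path):
    """Count rows regardless of line-ending style (\\r, \\r\\n, or \\n)."""
    # newline=None enables universal newlines, so \r alone also ends a line.
    with open(path, "r", newline=None, encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def count_newline_bytes(path):
    """Count \\n bytes, which is what `wc -l` reports."""
    with open(path, "rb") as handle:
        return handle.read().count(b"\n")


def to_unix_newlines(src, dst):
    """Write a copy of src with every line break converted to \\n."""
    with open(src, "rb") as handle:
        data = handle.read()
    # Convert \r\n first so a Windows break does not become two newlines.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    with open(dst, "wb") as handle:
        handle.write(data)


# Demo on a tiny made-up file with carriage-return-only line breaks.
workdir = tempfile.mkdtemp()
cr_path = os.path.join(workdir, "excel_export.csv")
unix_path = os.path.join(workdir, "unix_export.csv")
with open(cr_path, "wb") as handle:
    handle.write(b"contig,term\rcontig_a,x\rcontig_b,y\r")

to_unix_newlines(cr_path, unix_path)
print(count_newline_bytes(cr_path))    # what wc -l would see before the fix
print(count_rows(cr_path))             # rows seen with universal newlines
print(count_newline_bytes(unix_path))  # what wc -l would see after the fix
```

Note that both counts include the header row, just as `wc -l` does.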
@sr320 Thanks, I was going to say that there seemed to be only one line!! Thanks again, I'll work with TextWrangler.
Hi Steven, I'm trying to describe the annotation data, but I have some problems. I have checked the files a) Geoduck-transcriptome-v2.fasta and b) Geoduck-tranv2-blastx_sprot.tab, and I didn't find the GO data in the latter. Could you please tell me the column headers (I couldn't find a file with them, sorry) and how to obtain the GO information? Thanks
These are the first five rows of the Geoduck-tranv2-blastx_sprot.tab file.
comp95_c0_seq1 sp Q8K358 PIGU_MOUSE 67.53 77 25 0 231 1 258 334 7.00E-32 119
comp146_c0_seq1 sp P37137 LHX5_XENLA 79.31 58 12 0 175 2 4 61 4.00E-31 116
comp195_c0_seq1 sp Q8HXQ0 SODC_MACMU 54.84 62 28 0 188 3 80 141 1.00E-14 68.9
comp296_c0_seq1 sp P59966 DNAB_MYCBO 67.16 67 22 0 201 1 72 138 1.00E-11 63.2
comp434_c0_seq1 sp Q07954 LRP1_HUMAN 38.46 78 44 3 34 267 4094 4167 8.00E-10 59.3
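These rows look like standard BLAST tabular output (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), with the subject ID apparently split into three fields (sp, accession, entry name). Under that assumption, a minimal Python sketch for labeling the columns follows. The column names are my own labels and not taken from a header file in the repo, and the file is assumed to be tab-delimited. The third field would then be the Swiss-Prot accession, which matches the `Column3 = SPID` join used earlier in this thread.

```python
import csv
import io

# Assumed labels for the 14 fields; not an official header from the repo.
COLUMNS = [
    "query_id", "db", "sp_accession", "sp_entry_name",
    "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Two of the rows shown above, assumed to be tab-delimited in the real file.
SAMPLE = (
    "comp95_c0_seq1\tsp\tQ8K358\tPIGU_MOUSE\t67.53\t77\t25\t0\t231\t1\t258\t334\t7.00E-32\t119\n"
    "comp146_c0_seq1\tsp\tP37137\tLHX5_XENLA\t79.31\t58\t12\t0\t175\t2\t4\t61\t4.00E-31\t116\n"
)


def read_blast(handle):
    """Return one dict per hit, keyed by the assumed column labels."""
    reader = csv.reader(handle, delimiter="\t")
    return [dict(zip(COLUMNS, row)) for row in reader if row]


rows = read_blast(io.StringIO(SAMPLE))
for row in rows:
    # The accession is the key needed to look up GO terms.
    print(row["query_id"], row["sp_accession"], float(row["evalue"]))
```

To read the real file, pass an open file handle for Geoduck-tranv2-blastx_sprot.tab to `read_blast` instead of the sample string.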