lgatto / pRoloc

A unifying bioinformatics framework for organelle proteomics
http://lgatto.github.io/pRoloc/

Biomart repository addition for Chinese Hamster #132

Closed OwenVennard closed 4 years ago

OwenVennard commented 5 years ago

Hi all,

I'm interested in utilising the transfer learning capabilities in pRoloc to perform some GO analysis etc. (i.e. using setAnnotationParams) for Chinese hamster (CHO-K1) cells. From what I understand, the biomaRt package queries the relevant database and retrieves the annotation data. However, this species is not currently available. Could it please be added to the list?

Below are species entries from biomart.

Organism: Chinese hamster CHOK1GS (Cricetulus griseus) (Cell line) Genome assembly: CHOK1GS_HDv1 GCA_900186095.1

There is also a separate Chinese hamster database (rather than the cell line).

Organism: Chinese hamster CriGri (CriGri_1.0) Genome assembly: CriGri_1.0 (GCA_000223135.1)

Apologies if in my naivety I'm missing something very obvious as to why this is not currently possible!

Thanks

lgatto commented 5 years ago

Could you point me to a URL for the CHO-K1 data?

I will have a look at the underlying code to see if I can update the list of available biomaRt databases. If this fails, you can always construct the auxiliary data by hand. It's simply an MSnSet with binary assay data, with proteins as features (rows) and GO terms as samples (columns). For instance, if you have a csv file with the GO/protein associations that you pulled from the database, you could create the MSnSet with readMSnSet2, as illustrated last week at the course.
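
As a rough sketch of the by-hand route described above (not the pRoloc maintainers' own code), the binary protein-by-GO-term matrix can be built in base R from a long-format association table. The protein accessions `P1` to `P3` and the `assoc` table are hypothetical placeholders; in practice they would come from the csv file exported from the database. The final step assumes the MSnbase constructor `MSnSet(exprs, fData, pData)` and is only run if MSnbase is installed.

```r
# Hypothetical GO/protein associations, one row per (protein, GO term) pair.
# In practice this would be read from the exported csv, e.g. with read.csv().
assoc <- data.frame(
  protein = c("P1", "P1", "P2", "P3"),
  go      = c("GO:0005739", "GO:0005634", "GO:0005739", "GO:0005783"),
  stringsAsFactors = FALSE
)

proteins <- sort(unique(assoc$protein))
terms    <- sort(unique(assoc$go))

# Proteins as rows (features), GO terms as columns (samples), all zeros.
go_mat <- matrix(0, nrow = length(proteins), ncol = length(terms),
                 dimnames = list(proteins, terms))

# Set a 1 wherever a protein is annotated with a GO term
# (indexing by a two-column character matrix of row/column names).
go_mat[cbind(assoc$protein, assoc$go)] <- 1

# Assumption: wrap the matrix in an MSnSet using the MSnbase constructor.
if (requireNamespace("MSnbase", quietly = TRUE)) {
  fd <- data.frame(row.names = rownames(go_mat))  # empty feature metadata
  pd <- data.frame(row.names = colnames(go_mat))  # empty sample metadata
  aux <- MSnbase::MSnSet(exprs = go_mat, fData = fd, pData = pd)
}
```

The feature names of this auxiliary set would need to match the protein identifiers used in the primary (experimental) MSnSet for the two to be combined in the transfer learning step.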

Do ping me again here if you haven't heard from me by the middle of next week.

OwenVennard commented 5 years ago

@lgatto Thanks for getting back in touch, and for the advice on constructing the data manually.

Here is the URL to the CHO-K1 entry; hopefully this is the right thing:

https://www.ensembl.org/Cricetulus_griseus_chok1gshd/Info/Index