lgatto / pRolocdata

Data accompanying the pRoloc package

Add data from Mann et al 2018 #36

Closed lmsimp closed 5 years ago

lmsimp commented 5 years ago

Add data from Mann's recent publication on PCP phosphoproteomics.

Data is in table S3. A quick look at the protein-level data:

[Figures: mann-pca, mann-tsne]

NB: The shape of the PCA plot is attributed to the many missing values and the heavy use of zero imputation.

lgatto commented 5 years ago

The data and files should be called krahmer2018, to fit with what has been done so far.

lmsimp commented 5 years ago

> The data and files should be called krahmer2018, to fit with what has been done so far.

I hadn't got as far as picking a name but yes. I called them Borner and Mann above to draw attention to the group.

lmsimp commented 5 years ago

Feel free to add if you have time, otherwise I will do it when I get a chance.

ococrook commented 5 years ago

I'll try and add the data this week

lgatto commented 5 years ago

Thanks @ococrook

ococrook commented 5 years ago

Pull request made for half the data (no phospho just yet). I could not find a marker list. @lmsimp, did you come across one?

lmsimp commented 5 years ago

@ococrook There must be one, as the markers I used above are not ours (we don't have a sub-compartment called LD). I'll have a quick look and see where I found them.

lmsimp commented 5 years ago

Sorry @ococrook I'm not at work so can't access Cell. I can check Monday?

ococrook commented 5 years ago

Looking at the phospho data from this paper, peptide data is provided with quite a lot of information, such as which residue was phosphorylated, multiplicity, and protein group. I'm wondering what data we would provide: should we provide a dataset with peptide identifiers as rownames, or protein groups as rownames? In the latter case, how do we collapse the quantitative data? How do we handle the fact that different residues might be phosphorylated?

lmsimp commented 5 years ago

Is there quantitation data at both the peptide and the protein level?

lmsimp commented 5 years ago

Or are they the same in this case? And what you have is peptide and protein meta data? Can you provide a link to the .csv file so we can take a look?

ococrook commented 5 years ago

> Is there quantitation data at both the peptide and the protein level?

Just the peptide data. We often have the case that protein A appears twice, once with peptide A1 and once with peptide A2, but peptide A1 is phosphorylated differently to peptide A2, so it can't be the same instance of the same protein (it is a different phospho form of the same protein).

on the krahmer2018 branch: https://github.com/lgatto/pRolocdata/blob/krahmer2018/inst/extdata/krahmer2018PhosphoPcp.csv

lgatto commented 5 years ago

Phospho data is better analysed at the peptide level anyway, so I would provide that. The aggregation into proteins will be a matter of choice depending on application. As for feature names, I would recommend the peptide sequence, either passed through make.unique or prefixed with the master protein name.
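
To make the two feature-naming options concrete, here is a minimal sketch in Python. It is only an illustration, not the code used in pRolocdata: `make_unique` is a simplified stand-in for R's `make.unique` (it appends `.1`, `.2`, ... to repeats but, unlike the R function, does not check for clashes with names already present), `prefix_protein` is a hypothetical helper, and the peptide sequences and protein names are invented.

```python
from collections import Counter


def make_unique(names, sep="."):
    """Simplified stand-in for R's make.unique: the first occurrence of a
    name is kept as-is, later repeats get a numeric suffix (.1, .2, ...)."""
    seen = Counter()
    out = []
    for name in names:
        n = seen[name]
        out.append(name if n == 0 else f"{name}{sep}{n}")
        seen[name] += 1
    return out


def prefix_protein(proteins, peptides, sep="_"):
    """Hypothetical helper: prefix each peptide sequence with its master
    protein name, then de-duplicate any names that still collide."""
    combined = [f"{prot}{sep}{pep}" for prot, pep in zip(proteins, peptides)]
    return make_unique(combined)


# Invented example: the same peptide sequence observed twice for protein P1
# (e.g. with different phosphorylated residues), plus one peptide for P2.
peptides = ["AAGSPK", "AAGSPK", "LLTSPR"]
proteins = ["P1", "P1", "P2"]

by_sequence = make_unique(peptides)
by_protein = prefix_protein(proteins, peptides)

print(by_sequence)  # ['AAGSPK', 'AAGSPK.1', 'LLTSPR']
print(by_protein)   # ['P1_AAGSPK', 'P1_AAGSPK.1', 'P2_LLTSPR']
```

Either way, each row keeps a unique name at the peptide level, and the protein name stays available (in the name itself or in the feature metadata) for anyone who wants to aggregate later.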

ococrook commented 5 years ago

@lgatto I think we can close this one?