Closed lmsimp closed 5 years ago
The data and files should be called krahmer2018, to fit with what has been done so far.
I hadn't got as far as picking a name but yes. I called them Borner and Mann above to draw attention to the group.
Feel free to add if you have time, otherwise I will do it when I get a chance.
I'll try and add the data this week
Thanks @ococrook
pull request made for half the data (no phospho just yet) ... I could not find a marker list @lmsimp did you come across one?
@ococrook There must be one, as the markers I used above are not ours (we don't have a sub-compartment called LD). I'll have a quick look and see where I found this.
Sorry @ococrook I'm not at work so can't access Cell. I can check Monday?
Looking at the phospho data from this paper, peptide data is provided with quite a lot of information, such as which residue was phosphorylated, multiplicity, and protein group. I'm wondering what data we would provide: should we provide a dataset with peptide identifiers as rownames or protein groups as rownames? In the latter case, how do we collapse the quantitative data? How do we handle that different residues might be phosphorylated?
Is there quantitation data at both the peptide and the protein level?
Or are they the same in this case? And what you have is peptide and protein meta data? Can you provide a link to the .csv file so we can take a look?
Is there quantitation data at both the peptide and the protein level?
Just the peptide data. We often have the case that protein A appears twice, once with peptide A1 and once with peptide A2, but peptide A1 is phosphorylated differently to peptide A2, so it can't be the same instance of the same protein (it is a different phospho form of the same protein).
on the krahmer2018 branch: https://github.com/lgatto/pRolocdata/blob/krahmer2018/inst/extdata/krahmer2018PhosphoPcp.csv
Phospho data is better analysed at the peptide level anyway, so I would provide that. The aggregation into proteins will be a matter of choice depending on application. As for feature names, I would recommend the peptide sequence, either passed through make.unique or prefixed with the master protein name.
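To make the two naming options concrete, here is a rough sketch in plain Python rather than R. The `make_unique` helper only loosely mimics what R's `make.unique` does for simple cases (appending `.1`, `.2`, ... to repeats) and is not a faithful port. The protein IDs and peptide sequences are made up for illustration and are not taken from the krahmer2018 files.

```python
def make_unique(names, sep="."):
    """Append sep + counter to repeated names (loosely like R's make.unique)."""
    taken = set()   # names already emitted
    counts = {}     # last suffix tried for each base name
    out = []
    for name in names:
        candidate = name
        if candidate in taken:
            n = counts.get(name, 0)
            # Bump the counter until the suffixed name is free.
            while candidate in taken:
                n += 1
                candidate = f"{name}{sep}{n}"
            counts[name] = n
        taken.add(candidate)
        out.append(candidate)
    return out


def prefix_with_protein(proteins, peptides, sep="_"):
    """Build feature names as <master protein><sep><peptide sequence>."""
    return [f"{prot}{sep}{pep}" for prot, pep in zip(proteins, peptides)]


# Hypothetical rows: the same sequence can occur more than once, e.g. when
# it carries a different phosphorylated residue or multiplicity.
proteins = ["P1", "P1", "P2", "P1"]
peptides = ["AAGSK", "LLTPR", "AAGSK", "AAGSK"]

# Option 1: peptide sequence made unique.
by_sequence = make_unique(peptides)

# Option 2: master protein prefix; repeats within a protein still need
# de-duplicating, so the result goes through make_unique as well.
by_protein = make_unique(prefix_with_protein(proteins, peptides))

print(by_sequence)
print(by_protein)
```

Either way the rownames stay at the peptide level, and the prefix variant keeps the protein grouping visible for anyone who later wants to aggregate.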
@lgatto I think we can close this one?
Add data from Mann's recent publication on PCP phosphoproteomics.
Data is in table S3. A quick look at the protein level data -
NB: Shape of PCA attributed to lots of missing values and heavy use of zero imputation.