lgatto / pRolocdata

Data accompanying the pRoloc package

Krahmer 2018 #39

Closed lgatto closed 5 years ago

lgatto commented 5 years ago

There are two issues with the krahmer2018pcp data, but I wanted to check before making the changes.

1) The data isn't normalised, and hence the PCA plot isn't readable. On the left is the figure as produced by the example.

> plot2D(krahmer2018pcp, fcol = "Organelle")
> plot2D(normalise(krahmer2018pcp, method = "sum"), fcol = "Organelle")

[Image rplot001: the plot2D output of the two commands above]

We should either normalise it in the example code or, and that would be my preference, provide the normalised data. If the current non-normalised intensities are needed, we could consider providing two data sets, krahmer2018pcp and krahmer2018pcpRaw.
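For reference, a minimal base R sketch of what sum normalisation does to an intensity matrix, assuming (as I understand `normalise(..., method = "sum")`) that each protein's intensities are divided by that protein's total across fractions. The matrix `x` and its values are made up for illustration and are not taken from krahmer2018pcp.

```r
# Toy intensity matrix: 3 proteins (rows) x 4 fractions (columns).
# Values are invented for illustration only.
x <- matrix(c(1e6, 2e6, 1e6, 0,
              10,  20,  30,  40,
              5e3, 5e3, 5e3, 5e3),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("prot", 1:3), paste0("FR", 1:4)))

# Divide every row by its own total so that each profile sums to 1.
x_norm <- x / rowSums(x)

# Profiles are now on a common scale regardless of absolute abundance.
rowSums(x_norm)
```

Without this step, a few very abundant proteins can dominate the variance, which is probably why the unnormalised PCA plot is hard to read.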

2) Currently, the annotations aren't in the markers variable, which means fcol has to be typed out every time. In addition, the absence of subcellular annotation is coded as an empty string rather than "unknown", the convention used throughout the package, which leads to these proteins being coloured in red.

> krahmer2018pcp <- fDataToUnknown(krahmer2018pcp, fcol = "Organelle")
> fd <- fData(krahmer2018pcp)
> fd$markers <- fd$Organelle
> fData(krahmer2018pcp) <- fd
> krahmer2018pcp <- normalise(krahmer2018pcp, method = "sum")
> plot2D(krahmer2018pcp)

[Image rplot001: the plot2D output of the commands above]
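The recoding and the copy into markers amount to the following, shown here on a plain data frame so that it runs without pRoloc. The toy `fd` and its organelle labels are hypothetical; only the column names `Organelle` and `markers` come from this thread.

```r
# Hypothetical feature annotations; "" marks proteins without annotation.
fd <- data.frame(Organelle = c("ER", "", "Mitochondria", ""),
                 row.names = paste0("prot", 1:4),
                 stringsAsFactors = FALSE)

# Recode missing annotation to "unknown", the convention used in the package.
fd$Organelle[fd$Organelle == ""] <- "unknown"

# Copy into the 'markers' column so that fcol does not have to be given.
fd$markers <- fd$Organelle

table(fd$markers)
```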

@ococrook - what do you think?

ococrook commented 5 years ago

I was thinking about both these things. I had convinced myself both ways about normalisation, but now think the default should be normalised. If, for whatever reason, we need the unnormalised data, we can regenerate it in the future.

The reason for not including it in the markers variable is that I wasn't sure whether those were the markers or not ... in fact I'm still unsure whether the data they provide as Organelle are what we refer to as markers and whether the other column corresponds to allocations.

Happy to change to your recommendation and I'll do the same for the phosphopcp data when I upload it.

lgatto commented 5 years ago

Yes, I would suggest implementing the changes.

Also, for the pheno data, it would be useful to extract additional information, for example

> pData(krahmer2018pcp)$fraction <- as.numeric(sub("^.+FR", "", sampleNames(krahmer2018pcp)))
> pData(krahmer2018pcp)$replicate <- as.numeric(sub("^.+_", "", sub("_FR.+$", "", sampleNames(krahmer2018pcp))))
> head(pData(krahmer2018pcp))
                                           toName fraction replicate
LFQ.intensity.LFD_1_FR22 LFQ.intensity.LFD_1_FR22       22         1
LFQ.intensity.LFD_1_FR21 LFQ.intensity.LFD_1_FR21       21         1
LFQ.intensity.LFD_1_FR20 LFQ.intensity.LFD_1_FR20       20         1
LFQ.intensity.LFD_1_FR19 LFQ.intensity.LFD_1_FR19       19         1
LFQ.intensity.LFD_1_FR18 LFQ.intensity.LFD_1_FR18       18         1
LFQ.intensity.LFD_1_FR17 LFQ.intensity.LFD_1_FR17       17         1

There's also

> ifelse(grepl("LFD", sampleNames(krahmer2018pcp)), "LFD", "HFD3")
  [1] "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD" 
 [11] "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD" 
 [21] "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD" 
 [31] "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD" 
 [41] "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD" 
 [51] "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD" 
 [61] "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "LFD"  "HFD3" "HFD3" "HFD3" "HFD3"
 [71] "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3"
 [81] "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3"
 [91] "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3" "HFD3"
 [ reached getOption("max.print") -- omitted 98 entries ]

All these changes need to be added to inst/scripts/krahmer2018pcp.R; running that script will then update the object in data.
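The three pieces of sample metadata above can be derived together from the sample names. This is a minimal base R sketch rather than the actual script: the column name `diet` is an assumption (the thread does not name it), and the third sample name is hypothetical.

```r
# The first two names are taken from the pData output above; the third is a
# hypothetical non-LFD name, invented only to exercise the else branch.
sn <- c("LFQ.intensity.LFD_1_FR22",
        "LFQ.intensity.LFD_1_FR21",
        "LFQ.intensity.HFD_2_FR05")

pd <- data.frame(
  # Everything after the last "FR" is the fraction number.
  fraction  = as.numeric(sub("^.+FR", "", sn)),
  # Drop the "_FR.." suffix, then keep what follows the last underscore.
  replicate = as.numeric(sub("^.+_", "", sub("_FR.+$", "", sn))),
  # Any name not containing "LFD" is labelled "HFD3", as in the ifelse above.
  diet      = ifelse(grepl("LFD", sn), "LFD", "HFD3"),
  row.names = sn,
  stringsAsFactors = FALSE
)
pd
```

In the script, these columns would be assigned to pData(krahmer2018pcp) with sampleNames(krahmer2018pcp) in place of `sn`.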

lgatto commented 5 years ago

Thank you for the PR. Another question about the files in inst/extdata:

krahmer2018pcp.csv
krahmer2018pcpFeature.csv
krahmer2018pcp.txt
krahmer2018pcp.xlsx
krahmer2018PhosphoPcp.xlsx

Could you document them in inst/extdata/README? The syntax is org-mode: use * for a first-level section, ** for a second-level one, and so on. I assume some files are from the paper's SI and some might have been converted (xlsx to a text-based spreadsheet, for example); documenting this lets us trace the final data we have in the package all the way back to what the authors provided.