lgatto / Pbase

Manipulating and exploring protein and proteomics data

Document non-unique peptide hits #16

Open lgatto opened 9 years ago

lgatto commented 9 years ago

We need a description of how non-unique peptides are handled.

@sgibb for now, as far as I can see in the p example data, non-unique matches are not recorded at all (see P04075 and P04075-2 for example).

What would we want? Maybe something like:

Prot1                    isUnique                   otherProt     otherPep
  + pep 1                 TRUE                         NA            NA
  + pep 2                FALSE                          2             1

Prot1                    isUnique                   otherProt     otherPep
  + pep 2                FALSE                          1             1
  + pep 1                 TRUE                         NA            NA

sgibb commented 9 years ago

At least the isUnique column is already there. Calling proteotypic marks all unique peptides as Proteotypic = TRUE and adds a column to the pmetadata slot:

lapply(pmetadata(proteotypic(p))[c("P04075", "P04075-2")], "[", "Proteotypic")
$P04075
DataFrame with 21 rows and 1 column
    Proteotypic
          <Rle>
1         FALSE
2         FALSE
3         FALSE
4         FALSE
5         FALSE
...         ...
17        FALSE
18        FALSE
19         TRUE
20        FALSE
21        FALSE

$`P04075-2`
DataFrame with 20 rows and 1 column
    Proteotypic
          <Rle>
1         FALSE
2         FALSE
3         FALSE
4         FALSE
5         FALSE
...         ...
16        FALSE
17        FALSE
18        FALSE
19        FALSE
20        FALSE
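
To make the idea behind the flag concrete, here is a minimal base R sketch of marking a peptide as unique when its sequence occurs in exactly one protein. It is not Pbase's implementation of proteotypic; the `peptides` list and its sequences are hypothetical toy data.

```r
# Toy data: hypothetical peptide sequences per protein (not taken from Pbase).
peptides <- list(
  Prot1 = c("AAK", "LLGR"),
  Prot2 = c("LLGR", "MMPK"),
  Prot3 = c("LLGR")
)

# Count in how many proteins each distinct sequence occurs.
# unique() guards against a sequence that is repeated within one protein.
occurrences <- table(unlist(lapply(peptides, unique)))

# Flag a peptide as unique when its sequence occurs in exactly one protein.
isUnique <- lapply(peptides, function(p) as.vector(occurrences[p]) == 1L)
```

Here only "AAK" and "MMPK" are flagged, because "LLGR" is shared by all three proteins.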

The columns otherProt and otherPep would be difficult, because multiple assignments could happen (e.g. peptide 1 of protein 1 is also present in proteins 2, 3, 10, ...). Maybe we could use the MSnbase:::utils.vec2ssv function to collapse them into a single value.
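
A rough sketch of the collapsing idea, using base R `paste` as a stand-in rather than calling the internal MSnbase function. The `otherProt` list and the `collapse` helper are hypothetical names used only for illustration.

```r
# Hypothetical example: peptide 1 also occurs in proteins 2, 3 and 10;
# peptide 2 occurs nowhere else.
otherProt <- list(pep1 = c(2L, 3L, 10L), pep2 = integer(0))

# Collapse each vector into one semicolon-separated string,
# using NA when there is no other protein.
collapse <- function(x, sep = ";") {
  if (length(x) == 0L) NA_character_ else paste(x, collapse = sep)
}
otherProtCol <- vapply(otherProt, collapse, character(1))

# Splitting the string recovers the original values.
recovered <- as.integer(strsplit(otherProtCol[["pep1"]], ";", fixed = TRUE)[[1]])
```

This keeps one value per row, at the cost of having to split the string again whenever the individual protein indices are needed.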

lgatto commented 9 years ago

I had forgotten about proteotypic, thanks. I think I will add a call to proteotypic inside the constructor.

Yes, otherPep and otherProt require something a bit more elaborate (a dedicated class?) rather than another column.
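
One possible shape for such a structure, sketched in base R under the assumption that a long-format table with one row per peptide hit would be acceptable. This is not an existing Pbase class; `hits`, `otherHits` and the toy sequences are hypothetical.

```r
# Toy data: hypothetical sequences, not taken from the Pbase example data.
peptides <- list(
  Prot1 = c("AAK", "LLGR"),
  Prot2 = c("LLGR", "MMPK"),
  Prot3 = c("LLGR")
)

# One row per hit: protein, peptide index within that protein, and sequence.
hits <- data.frame(
  protein  = rep(names(peptides), lengths(peptides)),
  peptide  = unlist(lapply(peptides, seq_along), use.names = FALSE),
  sequence = unlist(peptides, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every other location of the given peptide.
otherHits <- function(hits, protein, peptide) {
  s <- hits$sequence[hits$protein == protein & hits$peptide == peptide]
  res <- hits[hits$sequence %in% s & hits$protein != protein,
              c("protein", "peptide")]
  rownames(res) <- NULL
  res
}

# Peptide 2 of Prot1 is shared with two other proteins.
shared <- otherHits(hits, "Prot1", 2)
```

Because each hit is its own row, a peptide shared by any number of proteins needs no multi-valued cell; the otherProt and otherPep values are looked up on demand.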