mildpiggy / DEP2

An R package for proteomics data analysis, developed from DEP.

merging `SummarizedExperiment` objects #17

Open cathalgking opened 2 months ago

cathalgking commented 2 months ago

I used the DEP2 package to create 6 SE objects containing proteomic data, and I have analysed each one individually with DEP2. I now want to merge them for further analysis, but I am getting an error related to the features in my data:

> merged_se <- cbind(plate1,plate2)
Error in FUN(X[[i]], ...) : 
  column(s) 'PG.ProteinGroups' in ‘mcols’ are duplicated and the data do not match

My data contains PG.ProteinGroups in the rowData slot of the object, which seems to be part of the problem.

> plate1
class: SummarizedExperiment 
dim: 5504 78 
metadata(0):
assays(1): ''
rownames(5504): TMA7B IGKV3-7 ... MORC2 SEC23IP
rowData names(19): PG.ProteinGroups PG.Genes ... name ID
colnames(78): patient_1 patient_2 ... patient_77 patient_78
colData names(11): label ID ... sex age_sampling
> plate2
class: SummarizedExperiment 
dim: 6194 77 
metadata(0):
assays(1): ''
rownames(6194): TMA7B NUDT4B ... SEC23IP COLEC10
rowData names(19): PG.ProteinGroups PG.Genes ... name ID
colnames(77): patient_1 patient_2 ... patient_76 patient_77
colData names(11): label ID ... age_sampling ordn
> str(rowData(plate1))
Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  ..@ rownames       : chr [1:5504] "TMA7B" "IGKV3-7" "IGLV4-69" "IGLV8-61" ...
  ..@ nrows          : int 5504
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()
  ..@ listData       :List of 19
  .. ..$ PG.ProteinGroups      : chr [1:5504] "A0A024R1R8;Q9Y2S6" "A0A075B6H7" "A0A075B6H9" "A0A075B6I0" ...
  .. ..$ PG.Genes              : chr [1:5504] "TMA7B;TMA7" "IGKV3-7" "IGLV4-69" "IGLV8-61" ...
  .. ..$ PG.ProteinDescriptions: chr [1:5504] "Translation machinery-associated protein 7B;Translation machinery-associated protein 7" "Probable non-functional immunoglobulin kappa variable 3-7" "Immunoglobulin lambda variable 4-69" "Immunoglobulin lambda variable 8-61" ...
  .. ..$ PG.ProteinNames       : chr [1:5504] "TMA7B_HUMAN;TMA7_HUMAN" "KV37_HUMAN" "LV469_HUMAN" "LV861_HUMAN" ...
  .. ..$ QC101                 : num [1:5504] 0 0 5.8 54.52 9.06 ...
  .. ..$ QC201                 : num [1:5504] 0 0 0 45.7 0 ...
  .. ..$ RC101                 : num [1:5504] 0 0 7.42 34.54 0 ...
  .. ..$ RC201                 : num [1:5504] 0 0 4.19 41.29 0 ...
  .. ..$ RC301                 : num [1:5504] 0 0 9.4 0 0 ...
  .. ..$ RCV01                 : num [1:5504] 0 0 13 43.3 0 ...
  .. ..$ RCV02                 : num [1:5504] 0 0 11.16 51.34 8.53 ...
  .. ..$ RCV03                 : num [1:5504] 0 0 13.6 50.9 0 ...
  .. ..$ RCV04                 : num [1:5504] 0 0 9.81 43.36 0 ...
  .. ..$ RCV05                 : num [1:5504] 0 0 7.5 45.8 0 ...
  .. ..$ RCV06                 : num [1:5504] 0 0 8.65 49.85 0 ...
  .. ..$ RCV07                 : num [1:5504] 0 0 5.91 46.89 0 ...
  .. ..$ RCV08                 : num [1:5504] 0 0 8.42 55.11 0 ...
  .. ..$ name                  : chr [1:5504] "TMA7B" "IGKV3-7" "IGLV4-69" "IGLV8-61" ...
  .. ..$ ID                    : chr [1:5504] "A0A024R1R8" "A0A075B6H7" "A0A075B6H9" "A0A075B6I0" ...
> str(rowData(plate2))
Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  ..@ rownames       : chr [1:6194] "TMA7B" "NUDT4B" "IGKV3-7" "IGLV4-69" ...
  ..@ nrows          : int 6194
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()
  ..@ listData       :List of 19
  .. ..$ PG.ProteinGroups      : chr [1:6194] "A0A024R1R8;Q9Y2S6" "A0A024RBG1;Q9NZJ9" "A0A075B6H7" "A0A075B6H9" ...
  .. ..$ PG.Genes              : chr [1:6194] "TMA7B;TMA7" "NUDT4B;NUDT4" "IGKV3-7" "IGLV4-69" ...
  .. ..$ PG.ProteinDescriptions: chr [1:6194] "Translation machinery-associated protein 7B;Translation machinery-associated protein 7" "Diphosphoinositol polyphosphate phosphohydrolase NUDT4B;Diphosphoinositol polyphosphate phosphohydrolase 2" "Probable non-functional immunoglobulin kappa variable 3-7" "Immunoglobulin lambda variable 4-69" ...
  .. ..$ PG.ProteinNames       : chr [1:6194] "TMA7B_HUMAN;TMA7_HUMAN" "NUD4B_HUMAN;NUDT4_HUMAN" "KV37_HUMAN" "LV469_HUMAN" ...
  .. ..$ QC101                 : num [1:6194] 0 0 2763.8 17.6 37.5 ...
  .. ..$ QC201                 : num [1:6194] 0 0 2460.78 4.12 74.75 ...
  .. ..$ RC101                 : num [1:6194] 0 0 2571.3 11 45.2 ...
  .. ..$ RC201                 : num [1:6194] 0 0 2992.9 9.4 54.8 ...
  .. ..$ RC301                 : num [1:6194] 0 0 3027.37 7.72 62.6 ...
  .. ..$ RCV01                 : num [1:6194] 0 0 3640.2 0 16.3 ...
  .. ..$ RCV02                 : num [1:6194] 0 0 3643.6 0 12.8 ...
  .. ..$ RCV03                 : num [1:6194] 0 0 3095 0 0 ...
  .. ..$ RCV04                 : num [1:6194] 0 0 3517 0 15.7 ...
  .. ..$ RCV05                 : num [1:6194] 0 0 3874.7 0 33.6 ...
  .. ..$ RCV06                 : num [1:6194] 0 0 4023.8 0 29.4 ...
  .. ..$ RCV07                 : num [1:6194] 0 0 3116 0 18.1 ...
  .. ..$ RCV08                 : num [1:6194] 0 0 4011.4 0 26.8 ...
  .. ..$ name                  : chr [1:6194] "TMA7B" "NUDT4B" "IGKV3-7" "IGLV4-69" ...
  .. ..$ ID                    : chr [1:6194] "A0A024R1R8" "A0A024RBG1" "A0A075B6H7" "A0A075B6H9" ...

Do all SEs need to have the same rownames before merging? Ideally I want to retain as much data as possible. Or is there another way to combine plates for analysis with DEP2? Thanks @mildpiggy

cathalgking commented 1 month ago

Hi @mildpiggy, have you had a chance to look at this issue? Thanks

mildpiggy commented 4 weeks ago

Hey @cathalgking, I'm on it. Merging SE objects directly through cbind is not feasible. There are some existing functions to merge or combine two SEs. Maybe you can try mergeSEs in SEtools. If you are unable to achieve your desired merging with the existing functions and you don't mind, you can send me a demo subset of your data. I would try to write a merge function based on your data.
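For reference, here is a minimal sketch of one manual route, using only the SummarizedExperiment package rather than mergeSEs. It assumes that keeping only the features shared by both plates is acceptable. The helper `merge_shared_features()`, its `suffixes` argument, and the toy `make_se()` constructor are hypothetical names made up for illustration; they are not part of DEP2 or SEtools. The idea is to subset both objects to their common rownames, keep only the rowData columns whose values agree (same-named columns with different contents are what trigger the `mcols` error above), keep only the colData columns present in both, and then call cbind.

```r
library(SummarizedExperiment)

# Hypothetical helper (not part of DEP2 or SEtools): column-bind two SEs on
# the features they share, dropping annotation columns that would clash.
merge_shared_features <- function(se1, se2, suffixes = c("_p1", "_p2")) {
  # Restrict both objects to the shared features, in the same order
  shared <- intersect(rownames(se1), rownames(se2))
  se1 <- se1[shared, ]
  se2 <- se2[shared, ]

  # Keep only rowData columns that exist in both objects with identical
  # values; cbind() stops on same-named columns whose contents differ
  common_cols <- intersect(colnames(rowData(se1)), colnames(rowData(se2)))
  same <- vapply(common_cols, function(cn) {
    identical(rowData(se1)[[cn]], rowData(se2)[[cn]])
  }, logical(1))
  keep <- common_cols[same]
  rowData(se1) <- rowData(se1)[, keep, drop = FALSE]
  rowData(se2) <- rowData(se2)[, keep, drop = FALSE]

  # Both plates reuse names such as patient_1, so make them unique.
  # Note: colData columns that store sample names are NOT updated here.
  colnames(se1) <- paste0(colnames(se1), suffixes[1])
  colnames(se2) <- paste0(colnames(se2), suffixes[2])

  # colData is row-bound by cbind(), so keep only the columns found in both
  cd_cols <- intersect(colnames(colData(se1)), colnames(colData(se2)))
  colData(se1) <- colData(se1)[, cd_cols, drop = FALSE]
  colData(se2) <- colData(se2)[, cd_cols, drop = FALSE]

  cbind(se1, se2)
}

# Toy data standing in for two plates (values are made up)
make_se <- function(genes, n, offset) {
  m <- matrix(seq_len(length(genes) * n) + offset, nrow = length(genes),
              dimnames = list(genes, paste0("patient_", seq_len(n))))
  SummarizedExperiment(
    assays  = list(intensity = m),
    rowData = DataFrame(name = genes, plate_qc = m[, 1], row.names = genes),
    colData = DataFrame(label = colnames(m), row.names = colnames(m))
  )
}

plate1 <- make_se(c("TMA7B", "IGKV3-7", "MORC2"), n = 2, offset = 0)
plate2 <- make_se(c("TMA7B", "NUDT4B", "IGKV3-7"), n = 3, offset = 100)

merged <- merge_shared_features(plate1, plate2)
merged
```

This trades data for simplicity: features seen on only one plate are dropped, and so are plate-specific rowData columns (in the objects above, that would likely include the per-plate QC columns, and PG.ProteinGroups wherever it differs between plates). Keeping the union of features instead would require padding the missing rows with NA before binding, which this sketch does not attempt.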

cathalgking commented 3 weeks ago

Hi @mildpiggy, I can send you a subset of the data by email. Could you send me your email address, please?

mildpiggy commented 3 weeks ago

@cathalgking My email is feng.zhenhuan@foxmail.com.