Open zshuyinggg opened 1 year ago
Need to know which structure/sequence they used to predict the complexes
In their predicted complexes folder ENSG00000072682-ENSG00000183386, we can only find three files: ENSG00000072682-ENSG00000183386.overlap, ENSG00000072682-ENSG00000183386.pdb, and ENSG00000072682-ENSG00000183386.pLDDT.
Can we tell from these files which original structure was used for each protein? I also pulled out the description from their paper, but I can't tell whether it says anything about the isoform.
I found an ENSG00000072682.pdb in the folder Huri-single and converted it to FASTA. It turns out they used isoform 1 instead of isoform 2.
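For anyone who wants to repeat the PDB-to-FASTA check, here is a minimal sketch using only the standard library. It is not necessarily the conversion used above; it assumes a standard fixed-column PDB file, reads one residue per CA atom, and the function name pdb_to_sequences is made up for this example.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} read from ATOM records.

    Assumption: standard fixed-column PDB layout. Only CA atoms are
    counted, so each residue contributes one letter. Unknown residue
    names become "X".
    """
    chains = {}
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        # Residue number plus insertion code identifies a residue in a chain;
        # this also skips duplicate CA atoms from alternate locations.
        key = (chain, line[22:27])
        if key in seen:
            continue
        seen.add(key)
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20], "X"))
    return {chain: "".join(residues) for chain, residues in chains.items()}
```

Reading the file with open("ENSG00000072682.pdb").read() and passing the text to pdb_to_sequences gives the modelled sequence per chain, which can then be compared against the isoform 1 and isoform 2 sequences from UniProt.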
In the file huintaf2-main/data/HuRI-uniprot-mapping.csv, many entries map an Ensembl gene ID to isoform 2 of a protein, e.g.
ENSG00000072682,O15460-2
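To list every such entry rather than spotting them by eye, something like the sketch below could work. It assumes each row has the Ensembl gene ID in the first column and the UniProt accession in the second, as in the example row; the real file may have a header or extra columns, and isoform_mappings is a hypothetical helper name.

```python
import csv
import re

# A UniProt accession followed by an isoform suffix, e.g. "O15460-2".
ISOFORM = re.compile(r"^([A-Z0-9]+)-(\d+)$")


def isoform_mappings(lines):
    """Return {ensembl_id: (accession, isoform_number)} for isoform-specific rows.

    Assumption: column 0 is the Ensembl gene ID and column 1 is the UniProt
    accession. Rows that do not start with "ENSG" (e.g. a header) are skipped.
    """
    found = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        gene, accession = row[0].strip(), row[1].strip()
        if not gene.startswith("ENSG"):
            continue
        match = ISOFORM.match(accession)
        if match:
            found[gene] = (match.group(1), int(match.group(2)))
    return found
```

Called as isoform_mappings(open("huintaf2-main/data/HuRI-uniprot-mapping.csv")), it returns only the genes whose mapping names a specific isoform.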
However, in the file HuRI-merged-allnames.csv, no isoform is specified for any protein-protein interaction, e.g.
542509,O15460-Q13643,ENSG00000072682,ENSG00000183386,6EVL;6EVM;6EVN;6EVO;6EVP;,1WYH;2CUQ;2EHE;,O15460,Q13643,ENSG00000072682-ENSG00000183386,O15460-Q13643
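The discrepancy between the two files could be checked across all interactions with a sketch like this. The column positions are an assumption taken from the single example row (Ensembl IDs in columns 3 and 4, UniProt accessions in columns 7 and 8, counting from 1), and isoform_mismatches is a made-up name.

```python
import csv


def isoform_mismatches(merged_lines, mapping):
    """Return (ensembl_id, merged_accession, mapped_accession) tuples.

    mapping: {ensembl_id: UniProt accession, possibly with a "-N" suffix},
    e.g. built from HuRI-uniprot-mapping.csv.
    A tuple is reported when the mapping names a specific isoform but the
    merged file lists only the base accession for the same gene.
    Assumption: column layout as in the example row from the merged file.
    """
    mismatches = []
    for row in csv.reader(merged_lines):
        if len(row) < 8:
            continue
        for gene, accession in ((row[2], row[6]), (row[3], row[7])):
            mapped = mapping.get(gene)
            if mapped and mapped != accession and mapped.split("-")[0] == accession:
                mismatches.append((gene, accession, mapped))
    return mismatches
```

For the row above with a mapping of {"ENSG00000072682": "O15460-2"}, this reports ENSG00000072682 as listed under O15460 in the merged file but mapped to O15460-2.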