Open apcamargo opened 3 years ago
Thank you for the remark. Answer from @soeding:
"Many Pfam domain families were founded when no structure of any member was yet available. Oftentimes, the domain boundaries defined by sequence-based methods have been quite inaccurate, comprising fractions of a domain, a domain and a half, etc. Pfam has historically been very slow in updating its family definitions to harmonize with the domain boundaries elucidated by protein structure determination. Therefore, Pfam is less suited to determining the boundaries of structural/functional domains than CATH / SCOP / ECOD, which are based on the PDB."
Thanks! I got it now!
I managed to get the download links for the HHpred databases, so I can use SCOPe and ECOD now. Regardless, my only concern is that the PDB contains some precursors, so we shouldn't expect every match to be a single domain (unless you remove precursors and polyproteins from the database beforehand).
In the Wiki it is stated:
I'm not sure if the PDB is a good choice for multi-domain proteins though, as it contains some unprocessed polyproteins that will usually have lower E-values than each individual domain (e.g., https://www.rcsb.org/structure/2IJD).
Also, is there any specific reason why Pfam is less suitable for determining domain boundaries? I've been using it together with SCOP and got good results.