list ncRNAs in pdb structure views

pombase / website

PomBase website v2


list ncRNAs in pdb structure views #2024

Open ValWood opened 1 year ago

ValWood commented 1 year ago

We list the proteins present in a structure, but not the ncRNAs. For example, this structure has U2 snRNA. I presume these have RNAcentral IDs, so we might be able to get these IDs too from a mapping?

kimrutherford commented 1 year ago

In the data file it just has a "molecule name" for the RNAs. For your example it has:

U5 snRNA
U6 snRNA
RNA (5'-R(P*GP*UP*AP*UP*GP*UP*AP*U)-3')
RNA (5'-R(P*UP*UP*UP*AP*UP*AP*CP*UP*AP*AP*CP*AP*C)-3')
U2 snRNA

ValWood commented 1 year ago

I have mailed PDB and asked if they can include RNAcentral IDs. @blakesweeney @afg1

kimrutherford commented 1 year ago

Hi @blakesweeney and @afg1

At the moment the data table we use is downloaded from the PDBe Advanced search after querying for genus "Schizosaccharomyces" and selecting all datatypes: https://github.com/pombase/pombase-chado/wiki/Updating-the-PDB-data-file

blakesweeney commented 1 year ago

If you would just like the Rfam connection then you can fetch that using the file we generate for PDBe each week. There is a file with the match between Rfam families and pdb chains at:

https://ftp.ebi.ac.uk/pub/databases/Rfam/.preview/pdb_full_region.txt.gz

I'm happy to explain what the file means if needed. The RNAcentral mappings are currently only updated at release time and present here:

https://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/id_mapping/database_mappings/pdb.tsv

We are working on updating some RNAcentral data weekly, but it is not yet publicly available.

kimrutherford commented 1 year ago

Thanks very much Blake. I think pdb.tsv is what we need because we have the URS IDs in PomBase. We'll give it a go.

ValWood commented 1 year ago

Actually it isn't the Rfam connection we need, it's the actual RNAcentral ID, which we would then use to identify the equivalent PomBase entity.
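A minimal sketch of how the pdb.tsv mapping could be used for this, in Python. It assumes the file follows the usual RNAcentral id_mapping layout (tab-separated: URS ID, database, external ID, NCBI taxon ID, RNA type, gene name), that the external ID has the form `<PDB entry>_<chain>`, and that PomBase stores URS IDs with a `_4896` taxon suffix. None of that is confirmed in this thread, and the function name, PDB entry and URS IDs below are made up for illustration.

```python
from collections import defaultdict


def urs_ids_by_pdb_entry(lines, taxon_id="4896"):
    """Group RNAcentral URS IDs by PDB entry ID for a single taxon.

    Assumed column layout (not confirmed in this thread):
    URS ID, database, external ID, NCBI taxon ID, RNA type, gene name.
    """
    by_entry = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        urs, _database, external_id, taxon = fields[:4]
        if taxon != taxon_id:
            continue  # keep only the requested organism
        # Assumption: external_id looks like "<PDB entry>_<chain>"
        pdb_id = external_id.split("_", 1)[0].lower()
        # Assumption: the PomBase side uses taxon-suffixed URS IDs
        by_entry[pdb_id].add(f"{urs}_{taxon}")
    return dict(by_entry)


# Made-up rows in the assumed format; these are not real identifiers.
sample = [
    "URS0000000001\tPDB\t9ZZZ_N\t4896\tsnRNA\t\n",
    "URS0000000002\tPDB\t9ZZZ_P\t4896\tsnRNA\t\n",
    "URS0000000003\tPDB\t9YYY_A\t9606\trRNA\t\n",
]
mapping = urs_ids_by_pdb_entry(sample)
```

For the real file, the same function could be given an open file handle for a downloaded copy of pdb.tsv. The resulting URS IDs per PDB entry would then be looked up against the URS IDs already held in PomBase to find the matching ncRNA genes.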