related-sciences / ensembl-genes

Extract the Ensembl genes catalog to simple tables

Ensembl release 109 seq_region table needs repair #21

Closed dhimmel closed 1 year ago

dhimmel commented 1 year ago

When running ensembl_genes datasets --release=109, I'm getting the following error:

DatabaseError: (mysql.connector.errors.DatabaseError) 1194 (HY000): Table 'seq_region' is marked as crashed and should be repaired

This error occurred when connecting to mysql+mysqlconnector://anonymous@ensembldb.ensembl.org:3306/homo_sapiens_core_109_38. See the query causing the error below:

```sql
SELECT
  gene.stable_id AS ensembl_gene_id,
  gene.version AS ensembl_gene_version,
  -- gene symbol methods https://github.com/cogent3/ensembldb3/issues/7
  -- Release 104 retired clone-based gene symbols,
  -- leading to ensembl genes without a symbol. Fill with the stable ID,
  -- as per https://www.ensembl.info/2021/03/15/retirement-of-clone-based-gene-names/
  COALESCE(xref.display_label, gene.stable_id) AS gene_symbol,
  external_db.db_name AS gene_symbol_source_db,
  xref.dbprimary_acc AS gene_symbol_source_id,
  gene.biotype AS gene_biotype,
  gene.description AS gene_description,
  gene.source AS ensembl_source,
  gene.created_date AS ensembl_created_date,
  gene.modified_date AS ensembl_modified_date,
  coord_system.version AS coord_system_version,
  coord_system.name AS coord_system,
  -- get chromosome: refs internal Related Sciences issue 606.
  CASE
    WHEN coord_system.name = "chromosome"
    THEN COALESCE(exc_seq_region.name, seq_region.name)
  END AS chromosome,
  assembly_exception.exc_type AS seq_region_exc_type,
  seq_region.name AS seq_region,
  gene.seq_region_start AS seq_region_start,
  gene.seq_region_end AS seq_region_end,
  gene.seq_region_strand AS seq_region_strand,
  assembly_exception.exc_seq_region_id IS NULL AS primary_assembly
FROM gene
LEFT JOIN xref
  ON xref.xref_id = gene.display_xref_id
LEFT JOIN external_db
  ON xref.external_db_id = external_db.external_db_id
LEFT JOIN seq_region
  ON gene.seq_region_id = seq_region.seq_region_id
LEFT JOIN coord_system
  ON seq_region.coord_system_id = coord_system.coord_system_id
LEFT JOIN assembly_exception
  ON seq_region.seq_region_id = assembly_exception.seq_region_id
  -- keep exc_type in (PATCH_FIX, PATCH_NOVEL, HAP)
  -- refs internal Related Sciences issue 606.
  AND NOT assembly_exception.exc_type <=> "PAR"
LEFT JOIN seq_region AS exc_seq_region
  ON assembly_exception.exc_seq_region_id = exc_seq_region.seq_region_id
WHERE
  -- all genes were current when query was written, ensure this is always the case
  gene.is_current AND
  -- refs internal Related Sciences issue 289.
  gene.biotype != "LRG_gene"
ORDER BY ensembl_gene_id
```

I believe this is an upstream issue entirely out of our hands, but wanted to document and report it.

jgtate commented 1 year ago

I can confirm that this was an issue with Ensembl, @dhimmel, caused by a background process that was running on the MySQL server. We're not entirely sure whether the process was responsible for actually crashing the table or if it was just giving that appearance, but we're chasing it down with our DBAs. As of right now the seq_region table seems to be fixed and usable. Please let us know via the Ensembl website if you still see problems though.

dhimmel commented 1 year ago

Awesome! Thanks for the info @jgtate. Confirming that we're no longer getting this error so the table is healthy.

Sounds like if we see this error on other tables in the future, it might be worth waiting a bit for it to resolve automatically if it's due to an ongoing background process.

jgtate commented 1 year ago

We'll look at moving this step so it doesn't happen at this point in the release process – we've not seen this behaviour before but it's something we should be able to avoid by not running it too close to the release. As a rule of thumb, however, things can be a bit rocky on release day itself. If you see issues like this it's worth waiting 24 hours if you can, then trying again. If it's still broken at that point by all means let us know!
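For anyone scripting around this, the "wait, then try again" advice could be sketched roughly as below. This is a minimal sketch, not part of ensembl_genes: the helper names `is_crashed_table_error` and `retry_on_crashed_table`, the default of three attempts, and the `run_query` callable are all hypothetical. It assumes the exception either carries the MySQL error code as `errno` (as mysql.connector errors do) or wraps the driver exception in `orig` (as SQLAlchemy does).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# MySQL server error 1194: "Table '...' is marked as crashed and should be repaired"
ER_CRASHED_ON_USAGE = 1194


def is_crashed_table_error(exc: BaseException) -> bool:
    """Return True if exc looks like MySQL error 1194 (hypothetical helper)."""
    # SQLAlchemy exposes the underlying driver exception as `.orig`;
    # mysql.connector exceptions carry the server error code as `.errno`.
    driver_exc = getattr(exc, "orig", None) or exc
    return getattr(driver_exc, "errno", None) == ER_CRASHED_ON_USAGE


def retry_on_crashed_table(
    run_query: Callable[[], T],
    attempts: int = 3,  # assumption: arbitrary default, not from the thread
    wait_seconds: float = 24 * 60 * 60,  # "worth waiting 24 hours"
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query, waiting and retrying only on error 1194."""
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            # Re-raise unrelated errors immediately, and give up after the
            # final attempt so the problem can be reported upstream.
            if not is_crashed_table_error(exc) or attempt == attempts:
                raise
            sleep(wait_seconds)
    raise ValueError("attempts must be at least 1")
```

Here `run_query` would be whatever zero-argument function executes the query above. If the error persists after the final attempt, the exception propagates, which is the point at which reporting it to Ensembl makes sense.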