tripal / t4d8

This is a temporary repository for Drupal 9 upgrade for Tripal v4. It is meant to house issues related to the upgrade.
GNU General Public License v2.0
1 stars 10 forks source link

Tripal Importer and OBO Loader #230

Closed spficklin closed 2 years ago

spficklin commented 2 years ago

Issue #109

This PR contains fully functional TripalImporter class and the OBO Loader. I picked up from @risharde on Friday form his test/loader_compiled_with_gff_upgrade branch. He indicated this was the most up-to-date branch for the loader. I picked this up because we are in need of being able to load ontology terms to make fields.

Work has gone into the tv4g4-4-tripalImporterReviewed and I'm not sure what is fully in that branch but I assume it overlaps with this one. As @risharde was working on the test/loader_compiled_with_gff_upgrade that's where I started from. I suspect this PR will now conflict with that tv4g4-4-tripalImporterReviewed and it cannot be merged. But I think some of it was already in the test/loader_compiled_with_gff_upgrade so we may be okay. @risharde thoughts?

What's Changed?

This PR contains the following updates to the tv4g4-4-tripalImporterReviewed branch.

  1. First I renamed the branch to match the naming conventions. It's now tv4g4-109-OBO_Loader
  2. I merged the most up-to-date 9.x-4.x branch. This includes the IDSpace, Vocabulary, ChadoStorage and updated ChadoPrepare Task.
  3. I removed all updates from that branch that did not specifically deal with ontologies or importers. That way this PR should only have changes the importers and related to getting ontologies into Chado. It was just small things that I think we should catch again when we do those other parts.
  4. I converted the TripalImporter from a stand-alone class (in the tripal/includes directory) to a proper plugin. The importer Plugin is here: tripal/src/TripalImporter. It has the propert Annotion, Interfaces and PluginManager directories.
  5. The static member variables in the old class that set the way the importers work (e.g. button_text, valid file extensions, etc.) are now part of the TripalImporter plugin annotations and are no longer static variables. Anyone who creates a new importer will need to use the annotations to set those values.
  6. I added a TripalImporterBase class (also found in tripal/src/TripalImporter). All importers should extend this class.
  7. I added a ChadoImporterBase class (found in tripal_chado/src/TripalImporter). All Chado-specific importers should extend this class.
  8. In Tripal v3 there was code in the TripalImporter class that was querying Chado to get the analysis. That was bad, as it shouldn't have been querying Chado in the Tripal core module. It no longer is the case. There is an abstract function in the interface for providing the analysis selection form elements and the ChadoImporterBase class provides that for all Chado-specific importers.
  9. In order to pick up all of the most recent updates that were made to the OBO loader in the Tripal v3 version (after we branched to start Tripal v4) I copied fresh the OBO loader from Tripal v3 and rewrote it for Tripal v4. This means it's different from what @risharde had started on. I was worried if I tried to spot check changes I would miss something (sorry @risharde for redoing that work).
  10. Because the ChadoRecord class is deprecated, I replaced all calls to ChadoRecord that were in the OBO loader and the TripalImporter class with Tripal DBX calls.
  11. I updated all of the terminal printing and messing to use the TripalLogger.
  12. I updated the forms to use the Drupal 9 way.
  13. I included all of the other Chado-specific loaders (e.g. GFF3 Loader) just for testing purposes. I wanted to make sure they all showed up in the list of Importers at Administration >> Tripal >> Data Loaders but they are not properly coded up. I would recommend we do the same as with the OBO loader and copy those over from Tripal v3 to pick up any recent changes rather than try to fix what's there.
  14. I updated the ChadoPrepare Task to add default vocabularies and call the OBO Loader for them. This is the only change to the step for preparing Chado.

How to Test

Edit: Originally the other loaders (e.g. GFF, Newick) were part of this PR but I have removed them and will create separate issues and branches and add the code for those into separate branches.

There are no functional tests for this PR. But, you can test it by

  1. Update to this branch
  2. Uninstall tripal and tripal_chado modules or start from a brand new installation. This will install the tripal_cv_obo table.
  3. Install the Chado database if it's not there.
  4. Testing the importer via the Prepare Chado step.
    • Run the "Prepare Chado" step. This should install the ontologies. You will see output on the screen indicating which ontology is being installed. This should include the Sequence Ontology which takes less that 5 minutes on my machine.
    • Check the Chado database. You should see entries in the cv, db, dbxref, cvterm, cvterm_dbxref, cvterm_relationship, cvtermsynonym and cvtermprop tables.
    • Check the tripal_cv_obo table you should see several ontologies listed.
  5. Testing the install via the online interface.
    • Optionally. clean out the cvterm and dbxref tables if you want to make sure terms are re-added.
    • Go to the Administration >> Tripal >> Data Loaders page. You should see all of our normal loaders listed.
    • Click the OBO Vocabulary Loader.
    • Select any vocabulary in the list, and click the Import OBO File button at the bottom. It should give you a command-line command to run the job. Run it.
    • If you deleted records in the cvterm and dbxref tables then recheck all the tables to make sure the terms were added.
    • Test the other functions of the online form by editing an existing OBO or deleting one (deleting an OBO is a new function added in this PR).
  6. Testing a larger ontology. If you want to test loading a larger ontology, choose the Gene Ontology and load that one. It does not get loaded during installation. It takes about 15 minutes to load on my machine.
spficklin commented 2 years ago

On second thought. I'm going to remove the other Importers and create new branches, one for each one and create issues for them. I'm thinking f anyone pulls the 9.x-4.x branch for testing or development they should only have working stuff.

risharde commented 2 years ago

@spficklin hoping this isn't a major issue, testing the prepare step on this branch shows the following errors. The steps after "Loading ontologies" don't show errors.

image

@spficklin further resolved the above warnings!

  1. Prepare step seemed successful after checking the following:

[x] Check the Chado database. You should see entries in the cv, db, dbxref, cvterm, cvterm_dbxref, cvterm_relationship, cvtermsynonym and cvtermprop tables. [x] Check the tripal_cv_obo table you should see several ontologies listed. image

  1. UI Test was successful - Gene Ontology load took approximately 12 minutes on my development environment which is great!

UI Check: Optionally. clean out the cvterm and dbxref tables if you want to make sure terms are re-added. Go to the Administration >> Tripal >> Data Loaders page. You should see all of our normal loaders listed. Click the OBO Vocabulary Loader. Select any vocabulary in the list, and click the Import OBO File button at the bottom. It should give you a command-line command to run the job. Run it. If you deleted records in the cvterm and dbxref tables then recheck all the tables to make sure the terms were added. Test the other functions of the online form by editing an existing OBO or deleting one (deleting an OBO is a new function added in this PR).

@laceysanderson awaiting your review!

spficklin commented 2 years ago

Thanks @risharde for the quick testing. I fixed the issues you noticed with the warnings.

laceysanderson commented 2 years ago

For my environment I created a randomly named chado schema install. There was no "chado" named schema. I use this approach to test that no code makes any schema assumptions.

✅ The prepare step worked perfectly! No errors when I ran the job and all the tables matched my expectations. ❌ Next I tried the interface for importing an OBO and noticed right away that there is no way to indicate the schema... As expected, it failed immediately because it was unable to find the cv table.

2022-07-04 17:56:22
Tripal Job Launcher
Running as user 'drupaladmin'
-------------------
2022-07-04 17:56:22: Job ID 2.
2022-07-04 17:56:22: Calling: tripal_run_importer(5)
Running 'OBO Vocabulary Loader' importer
 [notice] Running 'OBO Vocabulary Loader' importer
NOTE: Loading of this file is performed using a database transaction. If it fails or is terminated prematurely then all insertions and updates are rolled back and will not be found in the database
 [notice] NOTE: Loading of this file is performed using a database transaction. If it fails or is terminated prematurely then all insertions and updates are rolled back and will not be found in the database
ERROR: SQLSTATE[42P01]: Undefined table: 7 ERROR:  relation "cv" does not exist
LINE 1: SELECT * FROM "cv" CV
                      ^: SELECT * FROM "cv" CV; Array
(
)

 [error]  ERROR: SQLSTATE[42P01]: Undefined table: 7 ERROR:  relation "cv" does not exist
LINE 1: SELECT * FROM "cv" CV
                      ^: SELECT * FROM "cv" CV; Array
(
)

[site http://default] [TRIPAL] ERROR: SQLSTATE[42P01]: Undefined table: 7 ERROR:  relation "cv" does not existLINE 1: SELECT * FROM "cv" CV                      ^: SELECT * FROM "cv" CV; Array()

This is an easy fix though so I can jump in and adjust it since @spficklin is on holidays :-)

risharde commented 2 years ago

@laceysanderson should I retest this one? Seems you made some changes but I didn't see any further comments here, please advise when you get a chance, thanks!

laceysanderson commented 2 years ago

Thanks for checking in @risharde. No it's not quite ready to retest. I'm running into some issues with multiple schema that I need to resolve first -hopefully this week! I'll let you know when it's ready for retest :-)

risharde commented 2 years ago

Thanks @laceysanderson, no problem, tag me when you're ready of course!

On Mon, Jul 18, 2022 at 11:23 AM Lacey-Anne Sanderson < @.***> wrote:

Thanks for checking in @risharde https://github.com/risharde. No it's not quite ready to retest. I'm running into some issues with multiple schema that I need to resolve first -hopefully this week! I'll let you know when it's ready for retest :-)

— Reply to this email directly, view it on GitHub https://github.com/tripal/t4d8/pull/230#issuecomment-1187636917, or unsubscribe https://github.com/notifications/unsubscribe-auth/AICMFI4HF5I4BPIIXGTGAQLVUVZGNANCNFSM5Z5E5EIQ . You are receiving this because you were mentioned.Message ID: @.***>

laceysanderson commented 2 years ago

Ok, here is a summary of where I am at: These are my specific commits

More detail on the current problem:

The prepare step was failing at the point of trying to save the new NCIT:Subgroup term to it's idspace. Specifically, $ncit_idspace->saveTerm($ncit__subgroup); failed with the following error:

2022-07-18 17:17:54: Calling: tripal_chado_prepare_chado(chadobooboo)
 [notice] Creating Tripal Materialized Views and Custom Tables...
 [notice] Loading ontologies...
 [error]  SQLSTATE[23502]: Not null violation: 7 ERROR:  null value in column "db_id" violates not-null constraint
DETAIL:  Failing row contains (3, null, C25693, , null).: INSERT INTO "chadobooboo"."dbxref" ("db_id", "accession") VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1) RETURNING dbxref_id 
 [error]  Message: Job execution failed: Failed to complete schema integration task.
SQLSTATE[23502]: Not null violation: 7 ERROR:  null value in column "db_id" 
violates not-null constraint
DETAIL:  Failing row contains (3, null, C25693, , null).: INSERT INTO 
"chadobooboo"."dbxref" ("db_id", "accession") VALUES 
(:db_insert_placeholder_0, :db_insert_placeholder_1) RETURNING dbxref_id

As you can see it is using the correct schema ('chadobooboo') for the insert but does not have the right parameter list (4 parameters including nulls where there were only 2 placeholders).

When I looked into the ChadoIdSpace->saveTerm() method, it was assuming it could get the db/cv but those were returning null. I checked my chado database and as expected those were not inserted into chado yet.

This made me go back to the prepare class and look a bit higher where the vocab should have been created: $ncit_vocab = $this->getVocabulary('ncit');. The ChadoPreparer->getVocabulary() method uses the TripalCollectionPluginManager->load/createCollection methods. My next step would be to check these further and see if they are failing to create the cv/db in the chado database for non-default chado schema.

spficklin commented 2 years ago

Okay, the issue pointed out in @laceysanderson 's most recent comment has been fixed. The problem was that the plugin manager for ID spaces/vocabularies does not call the create function if it already knows about the IDSpace or Vocabulary. The create function adds a record to Chado. So, on the prepare of the first Chado schema it calls create but on the prepare of the second Chado schema it doesn't call create because it already knows about it.

I fixed it by adding a recordExists function to the interface for the collections. The plugin manager will then call that function and if the record doesn't exist (either on the create, load or delete) then it will act accordingly.

@laceysanderson it is ready for you to check that the issue goes away for you.

spficklin commented 2 years ago

This is a huge addition!! I'm excited to see this merged. Thanks @laceysanderson for the review!