tripal / tripal

The Tripal package is a suite of Drupal modules for creating biological (genomic, genetic, breeding) websites. Visit the Tripal homepage at http://tripal.info for documentation, support, and other information. The Drupal project page is at http://drupal.org/project/tripal.
GNU General Public License v2.0

Gene entity Display : Issue with Tripal Layout #1810

Open SalihaZ opened 6 months ago

SalihaZ commented 6 months ago

Discussion or Question

Hi, I'm new to Drupal 10 and Tripal 4. I've recently tested Tripal Layout for our laboratory project website.

It worked perfectly for displaying organism and analysis pages, but I ran into issues displaying some of the gene information.

In fact, the gene entities on the first page do not display any information, while the display works well for those on the following pages.

Screenshot from 2024-03-13 11-45-14

When I edited the information of the affected genes, I noticed that mandatory fields like "Organism", "Unique Name" and "Database Reference Annotations" were not filled in.

Screenshot from 2024-03-13 11-46-11

I filled in the missing fields but encountered an error (Exception: Cannot update record in the Chado "feature" table due to unset conditions), and the gene title disappeared.

Here are some screenshots that show the display after filling in the mandatory fields:

Screenshot from 2024-03-13 11-26-26, Screenshot from 2024-03-13 12-20-24, Screenshot from 2024-03-13 12-20-53

Could you please help me understand where this issue could be coming from and how to resolve it?

Thank you in advance for your help.

Saliha

dsenalik commented 6 months ago

There could easily be a bug we need to fix. Can you let us know

If we can reproduce the error it will be much easier to find out what is wrong. On Drupal 10.1 and PHP 8.2 with the alpha2 release of Tripal I was able to manually enter and save a gene. palmgene1

SalihaZ commented 6 months ago

Thanks for your reply.

I'm using Drupal 10, PHP 8 and the latest alpha3 version of Tripal, which includes Tripal Layout, and I've followed Stephen's recommendations (https://github.com/tripal/tripal/pull/1792). The FASTA and GFF3 files were loaded using the built-in data loaders, and the Bootstrap 5 theme has been applied.

dsenalik commented 6 months ago

Okay, it may be in the loader then. Can you supply a couple of genes from the FASTA and GFF3 files for us to test? But it may also be in the Tripal Layout module. For some reason I can't turn that on... 2024-03-13_tripal_layout_uncheckable. Oh, it depends on field_group. This is fixed in merged PR #1814.

dsenalik commented 6 months ago

drupalver = 10.1, phpver = 8.2

This message has been edited to document how to install the field_group and field_group_table modules:
https://www.drupal.org/project/field_group
https://www.drupal.org/project/field_group_table

cd /var/www/drupal
composer require 'drupal/field_group:^3.4'
composer require 'drupal/field_group_table:^1.1'
SalihaZ commented 6 months ago

Here is a sample of the GFF and FASTA files I used: sample.zip

I didn't encounter any issues installing the field_group and field_group_table modules. Screenshot from 2024-03-13 17-23-06

dsenalik commented 6 months ago

My testing procedure

This happens because the record was not first deleted by Chado storage in tripal_chado/src/TripalStorage/ChadoRecords.php, function deleteRecords(). 2024-03-13_no_pkey_alias_value

This in turn happens because, during the load, gene_sequence_coordinates is not present in the request to loadValues(), even though it was present earlier during validation. 2024-03-13_no_gene_sequence_coordinates
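To illustrate why a missing primary-key value leads to this exception, here is a minimal, hypothetical sketch in plain PHP. It is not Tripal's actual ChadoRecords code: the function name `buildUpdateConditions()` is invented for illustration, and it assumes an update is keyed on the table's primary key (`feature_id` for the Chado feature table). If the load never populated that value, there is nothing safe to put in the WHERE clause, so the update is refused.

```php
<?php
// Hypothetical illustration only: this is NOT the code in
// tripal_chado/src/TripalStorage/ChadoRecords.php.

/**
 * Builds the WHERE conditions used to update one Chado record.
 *
 * @param string $table
 *   Chado table name, e.g. "feature".
 * @param array $record
 *   Column => value pairs that were loaded for this record.
 * @param string $pkey
 *   Primary key column, e.g. "feature_id".
 *
 * @return array
 *   Conditions keyed by column name.
 *
 * @throws Exception
 *   If the primary key value was never loaded.
 */
function buildUpdateConditions(string $table, array $record, string $pkey): array {
  // If the property holding the primary key was skipped during the load,
  // there is nothing safe to put in the WHERE clause, so refuse to update
  // rather than risk matching the wrong rows.
  if (!isset($record[$pkey]) || $record[$pkey] === '') {
    throw new Exception(
      "Cannot update record in the Chado \"$table\" table due to unset conditions."
    );
  }
  return [$pkey => $record[$pkey]];
}

// A record whose primary key was loaded yields a usable condition.
print_r(buildUpdateConditions('feature', ['feature_id' => 42, 'name' => 'gene1'], 'feature_id'));

// A record loaded without its primary key triggers the exception.
try {
  buildUpdateConditions('feature', ['name' => 'gene1'], 'feature_id');
}
catch (Exception $e) {
  echo $e->getMessage(), PHP_EOL;
}
```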

Update: This has been fixed in core Tripal by merged PR #1861

SalihaZ commented 6 months ago

Thank you very much for your explanations.

Now I understand why I got the exception "Cannot update record in the Chado "feature" table due to unset conditions. Record: Array": exception_tripal_lyout.txt

I've tested the update and it now works for the first two genes, but not for the remaining genes after filling in the missing fields :( I got this error: Screenshot from 2024-03-14 09-31-14

I will soon load other GFF and FASTA files, and I will test whether the gene entity information is displayed automatically or whether I'll have to fill in the missing fields manually.

SalihaZ commented 6 months ago

I have another question: I am planning to load other entity types soon. Could Tripal Layout support entities such as mRNA?

dsenalik commented 6 months ago

There is a problem, I think, in the widget ChadoSequenceCoordinatesWidgetDefault; hopefully I can fix this soon. I am new to Tripal Layout, but in theory it should support mRNA the same way as Gene, because they are both based on the Chado feature table.

Update: This has been fixed in core Tripal by merged PR #1861

dsenalik commented 6 months ago

I want to document some additional problems I am finding:

  1. ChadoSequenceCoordinatesDefault field should be renamed to ChadoSequenceCoordinatesTypeDefault to follow the naming rules. But to do this an update function would be needed as this breaks an existing site.
  2. ChadoSequenceCoordinatesDefault doesn't load all of the values from the featureloc table: rank, is_fmax_partial, residue_info, locgroup
  3. Trying to add those reveals that some columns do not have CV terms defined, e.g. is_fmax_partial
  4. And the ones that do define terms may use non-existent terms, e.g. local:fmin does not exist in the CV term table!
  5. ChadoSequenceCoordinatesWidgetDefault doesn't pass through the other columns from the featureloc table
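Items 3 and 4 amount to two separate checks on each featureloc column: whether a term is mapped at all, and whether the mapped term actually exists in the CV term table. A minimal, hypothetical sketch of that audit in plain PHP follows; `auditColumnTerms()`, `$termMap` and `$knownTerms` are illustrative stand-ins for the field's property definitions and the site's cvterm records, not Tripal APIs.

```php
<?php
// Hypothetical audit: the term map and known-term list stand in for the
// field's property definitions and the site's cvterm records.

/**
 * Splits columns into those with no mapped term and those whose mapped
 * term does not exist in the CV term table.
 *
 * @param string[] $columns
 *   Chado column names to check.
 * @param array $termMap
 *   Column => "IDSPACE:accession" (hypothetical mapping).
 * @param string[] $knownTerms
 *   "IDSPACE:accession" strings that exist in cvterm (hypothetical).
 *
 * @return array
 *   ['unmapped' => string[], 'missing' => string[]]
 */
function auditColumnTerms(array $columns, array $termMap, array $knownTerms): array {
  $report = ['unmapped' => [], 'missing' => []];
  foreach ($columns as $column) {
    if (!isset($termMap[$column])) {
      // Item 3: no term is defined for this column.
      $report['unmapped'][] = $column;
    }
    elseif (!in_array($termMap[$column], $knownTerms, TRUE)) {
      // Item 4: a term is defined, but it is not in the CV term table.
      $report['missing'][] = $column;
    }
  }
  return $report;
}

// The two cases reported above: fmin maps to local:fmin, which does not
// exist, and is_fmax_partial has no term at all.
print_r(auditColumnTerms(
  ['fmin', 'is_fmax_partial'],
  ['fmin' => 'local:fmin'],
  []
));
```

In practice `$columns` would cover every featureloc column the field stores, and `$knownTerms` would come from the site's cvterm data.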

Update: This has been fixed in core Tripal by merged PR #1861

dsenalik commented 6 months ago

I have a suggestion for a temporary workaround, just remove the sequence coordinates field from the Gene (or mRNA) content types for now, and see if the other problems persist. This is looking harder to fix than I thought. You can always add the field back later.

SalihaZ commented 6 months ago

OK. Do you mean these fields? image

image

laceysanderson commented 6 months ago

I agree with @dsenalik that it's the coordinate field. You should only need to remove the Sequence Coordinates field. This is best done on the Manage fields page (admin/structure/bio_data/manage/gene/fields) by clicking the drop-down beside Edit to find Delete.

Screenshot 2024-03-14 at 10 34 31 AM

Tripal Layout does handle all content types, even custom ones; we just haven't made templates for the other content types yet, but they should be coming soon! In the meantime, you can actually set up the layout using field groups on their own, as all Tripal Layout does is automate the process.

SalihaZ commented 6 months ago

Thanks @laceysanderson for your reply.

Actually, the Sequence Coordinates field doesn't appear on the Manage fields page, so I didn't have to remove it. It isn't in the list of existing fields either.

image

laceysanderson commented 6 months ago

Oooh, interesting. I guess that makes sense as the coordinates field was added after the most recent release. However, that means that the issue that Doug is seeing and your issue may be different 🙈

I would update your site to the development version of Tripal, delete the gene content type and then re-import it using the Tripal > Page Structure > Import Content Types and checking "Genomic". That will give you all the most recent field versions for the gene content type without deleting any of your imported data in chado. You may need to delete the genes listed under Tripal > Content first. Again this will not delete anything you imported in chado.

dsenalik commented 6 months ago

Notes on the issue of missing terms on Chado columns: the following don't have terms defined (I think it may be time to make this a separate issue).

dsenalik commented 6 months ago

I built a new docker on the 4.x branch just now, imported genomic content field types, and deleted the sequence coordinates field. I am not getting any errors. The first gene as imported: 20240314_beforeedit

Then I edit, and save without changing anything: 20240314_afteredit

The only differences are that some of the linker fields (which are all empty) and the synonym field (also empty) no longer show up.

SalihaZ commented 6 months ago

Hi, thanks @laceysanderson and @dsenalik for your suggestions.

After a git pull on tripal, I have removed the gene content type and re-imported it but I still don't see the sequence_coordinate field.

The display is worse: Tripal Layout no longer displays any information, and this is the case for all genes (1514 pages).

Screenshot from 2024-03-19 14-54-55

The mandatory fields are empty when I edit a gene.

image

dsenalik commented 6 months ago

This is just a guess, but if you didn't already, can you try to re-publish the gene content type? at /admin/content/bio_data/publish

SalihaZ commented 6 months ago

I already did it and it didn't change anything :(

laceysanderson commented 6 months ago

This is very weird 🤔 I would recommend deleting all the current gene pages, running the updates for your site ([sitename]/update.php) and then republishing the genes. The empty pages you're seeing imply that the connection between Drupal and Chado for these pages got messed up somehow, which I am not able to reproduce on recent versions.

If you have a large number of genes, this can be done using drush php:cli and the following code:

// Load all Tripal Content Entities of type gene.
$entities = \Drupal::entityTypeManager()->getStorage('tripal_entity')->loadByProperties(['type' => 'gene']);
// Loop through each one and delete it. This does not delete anything from chado.
foreach ($entities as $entity) {
  $entity->delete();
}

Alternatively, another option is to start a fresh site, which I only mention on the off chance you are in the early phase of site development where this is not a big deal.

SalihaZ commented 6 months ago

I've tried to remove all the gene pages following your suggestion @laceysanderson but I've got these warnings:

Screenshot from 2024-03-20 12-19-28

I can't understand the reason. Do you know where this could be coming from?

laceysanderson commented 6 months ago

I'm not able to reproduce that at all, both in a site with multiple genes and in one with none 🤔 It almost implies there's a serious issue with your site.

What do you get when you run drush core-status?

pdtouch commented 6 months ago

In my installation, on the Tripal > Page Structure > Gene > Manage Form Display page, I found that Sequence Coordinates was Disabled. I wonder if this is related to this issue.

SalihaZ commented 6 months ago

Hi @laceysanderson, this is what I get when running drush core-status. It seems OK, I guess.

Screenshot from 2024-03-22 09-54-40

SalihaZ commented 6 months ago

I've started a fresh site. Now I can see the sequence coordinates field, but the issue is that after loading the FASTA and GFF files I couldn't publish them.

image

image

The job displays a running status, but nothing happens.

image

laceysanderson commented 4 months ago

Hi @SalihaZ it would be worth trying again now with the most recent changes to core. There have been a number of recent PRs merged that would impact and hopefully fix what you were seeing here.

SalihaZ commented 4 months ago

Hi @laceysanderson, I went back to the first site I started and tried to remove the gene pages with the code you provided, using drush php:cli. It executed without error; however, the pages are deleted in batches. Is this code supposed to delete everything at once? I have to delete more than 1300 pages! Thanks in advance!

$entities = \Drupal::entityTypeManager()->getStorage('tripal_entity')->loadByProperties(['type' => 'gene']);
foreach ($entities as $entity) {
  $entity->delete();
}
laceysanderson commented 4 months ago

It is supposed to delete everything at once, but maybe the Drupal API has a limit on how many entities it will load, to prevent timeouts and memory overflow issues 🤔 Maybe edit it to use a do-while loop so it keeps going until there are no more entities to find?

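Something like:

do {
  // Load the gene entities that still exist.
  $entities = \Drupal::entityTypeManager()->getStorage('tripal_entity')->loadByProperties(['type' => 'gene']);
  // Delete each one. This does not delete anything from Chado.
  foreach ($entities as $entity) {
    $entity->delete();
  }
  // Stop once a load returns no entities.
} while (!empty($entities));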
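One caveat with the do-while version: if some entity keeps coming back from the load (for example, because its deletion silently fails), the loop never ends. The sketch below factors the same idea into a plain-PHP helper with a round limit. The name `deleteUntilEmpty()` and its callable parameters are hypothetical, and the demo uses an in-memory array in place of Drupal storage so it runs on its own.

```php
<?php
/**
 * Loads and deletes items in rounds until a load comes back empty.
 *
 * @param callable $loadBatch
 *   Returns an array of items that still need deleting.
 * @param callable $deleteOne
 *   Deletes a single item.
 * @param int $maxRounds
 *   Safety cap so an item that never goes away cannot loop forever.
 *
 * @return int
 *   Number of times $deleteOne was called.
 */
function deleteUntilEmpty(callable $loadBatch, callable $deleteOne, int $maxRounds = 1000): int {
  $deleted = 0;
  $rounds = 0;
  do {
    $batch = $loadBatch();
    foreach ($batch as $item) {
      $deleteOne($item);
      $deleted++;
    }
    $rounds++;
  } while (!empty($batch) && $rounds < $maxRounds);
  return $deleted;
}

// Stand-alone demo: an in-memory "storage" that returns at most 500 items per load.
$storage = range(1, 1300);
$loader = function () use (&$storage) {
  return array_slice($storage, 0, 500);
};
$deleter = function ($item) use (&$storage) {
  $storage = array_values(array_diff($storage, [$item]));
};
echo deleteUntilEmpty($loader, $deleter), PHP_EOL; // prints 1300

// In drush php:cli the callables could wrap the Drupal calls instead, e.g.:
// $geneStorage = \Drupal::entityTypeManager()->getStorage('tripal_entity');
// deleteUntilEmpty(
//   fn() => $geneStorage->loadByProperties(['type' => 'gene']),
//   fn($entity) => $entity->delete()
// );
```

Passing the load and delete steps in as callables keeps the loop logic testable without a Drupal site, while the commented usage shows how it would plug into the snippet above.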
SalihaZ commented 4 months ago

Hi @laceysanderson, I managed to delete all the pages. However, on this instance I still cannot publish the genes. I also noticed that the sequence coordinates field is missing, although I've deleted and re-imported the Genomic collection of content types.

Screenshot from 2024-05-07 16-15-19

Is there a way to modify the database so that I can delete the Tripal content types (organism, analysis, genes) without affecting the other content? Actually, I have made good progress on the design aspect of the site (slideshow, etc.).

laceysanderson commented 4 months ago

Are you sure you are on the most recent version of Tripal on this site? Specifically, the ideal workflow is:

  1. Use the code above to delete the specific content pages that are giving you issues. If an entire content type is giving you issues, then delete the content type as well by going to Tripal > Page Structure and choosing Delete in the drop-down beside that content type. Remember that you are deleting both the individual gene pages and the entire concept of a gene from Tripal before you re-import things.
  2. Pull the most recent changes to Tripal core for this site.
  3. Run the updates for the Drupal site, if there are any.
  4. Re-import the content types (note: this will only affect the deleted content types; any that are still present will be skipped). Confirm that all the fields you expect are now there. If not, confirm that Tripal core was actually updated; you may need to delete the content type.
  5. Since all of the data imported into Chado (i.e. records created by the GFF3 importer) is still there after deleting the content pages, you do not need to re-import it; just publish at this point.

Actually, I have made good progress on the design aspect of the site (slideshow, etc.).

This is great to hear!

SalihaZ commented 4 months ago

Thanks a lot @laceysanderson for this very detailed response. I followed the steps to the letter, but I still don't see the Sequence Coordinates field for the gene type. I tried to add it manually, but when I click Continue I get this error. When I run Tripal publish, only the analysis and the organism are displayed, not the genes. Maybe this is related to the missing Sequence Coordinates field?

[Mon May 13 10:13:36.108949 2024] [php:notice] [pid 3179:tid 140441956886272] [client 195.221.174.124:58840] Uncaught PHP Exception Exception: "Cannot create a StorageProperty object for entity type "tripal_entity" field type "chado_sequence_coordinates_default", key "is_fmin_partial" as accession for the property term is not recognized: "local:is_fmin_partial"" at /opt/www/palm-genome-hub.southgreen.fr/web/modules/tripal/tripal/src/TripalStorage/StoragePropertyBase.php line 77, referer: http://dev-palm-genome-hub.southgreen.fr/admin/structure/bio_data/manage/gene/fields/add-field

Screenshot from 2024-05-13 09-53-40, Screenshot from 2024-05-13 09-54-07, Screenshot from 2024-05-13 09-52-52

dsenalik commented 4 months ago

That missing term was added in the most recent update, specifically update 10404.

Can you try running drush updatedb and report if it runs any updates? If it does, then please try again.

If not, confirm that this update function is in tripal_chado/tripal_chado.install at line 475, to be sure you are up to date.
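Drupal update hooks follow the naming pattern `MODULE_update_N()`, so update 10404 should appear in tripal_chado.install as `tripal_chado_update_10404()`. The plain-PHP sketch below checks for that function without relying on a line number, which shifts between versions. The helper name `hasUpdateHook()` is hypothetical, and the file path is an assumption based on the module location shown in the error log above. From the shell, `drush updatedb:status` also lists any updates that are still pending.

```php
<?php
/**
 * Checks whether .install source code defines a given Drupal update hook.
 *
 * Drupal update hooks are named MODULE_update_N(), so update 10404 of
 * tripal_chado is expected to be tripal_chado_update_10404().
 */
function hasUpdateHook(string $installSource, string $module, int $number): bool {
  // Match "function <module>_update_<number>(" exactly, so that e.g.
  // ..._update_104041 does not count as ..._update_10404.
  $pattern = '/\bfunction\s+' . preg_quote($module, '/') . '_update_' . $number . '\s*\(/';
  return preg_match($pattern, $installSource) === 1;
}

// Assumed location relative to the Drupal project root; adjust for your site.
$path = 'web/modules/tripal/tripal/tripal_chado/tripal_chado.install';
if (is_readable($path)) {
  $present = hasUpdateHook(file_get_contents($path), 'tripal_chado', 10404);
  echo $present ? "Update 10404 is present.\n" : "Update 10404 is missing; pull the latest Tripal.\n";
}
else {
  echo "Could not read $path\n";
}
```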

SalihaZ commented 4 months ago

Thanks for your reply @dsenalik.

  1. I deleted the gene content type.
  2. I ran the updates. The update seems to be OK: image, Screenshot from 2024-05-14 10-00-52
  3. I imported the gene content type again. I still don't get the sequence coordinates field, and I get the same error when trying to add it:

Uncaught PHP Exception Exception: "Cannot create a StorageProperty object for entity type "tripal_entity" field type "chado_sequence_coordinates_default", key "is_fmin_partial" as accession for the property term is not recognized: "local:is_fmin_partial"" at /opt/www/palm-genome-hub.southgreen.fr/web/modules/tripal/tripal/src/TripalStorage/StoragePropertyBase.php line 77, referer: http://dev-palm-genome-hub.southgreen.fr/admin/structure/bio_data/manage/gene/fields/add-field