Closed martacds closed 5 years ago
Hello @marcsilvaitqb
Did you try selecting a sequence type for your genes? Your example Gene page suggests that you loaded your sequences as gene.
Thanks!
Hi @almasaeed2010
When uploading the genomic fasta file I selected region as the sequence type because my file is composed of scaffolding sequences and in a previous upload that is what was used. But now for the expression data I selected gene as the sequence type, considering that that is the content of my matrix files.
Should they match?
Thank you!
I believe they should match. Selecting a sequence type in the expression loader only helps the loader identify the gene and does not alter the type.
Ok, thank you! I'm going to try it with region instead and will update you on the result.
I tried uploading the expression data with the sequence type region (the same as for the fasta file) and it still shows the same error: ERROR: The feature, LOC111983025, found in the expression file was not found in the Chado database. Please ensure that the feature has been loaded into the database and that the feature name is both unique and correct.
Tried again with the different matrix files and name/unique name. Still the same error.
One more thing to check: which option did you select for Name Match Type?
Also if you have access to the command line, can you run this?
drush sqlc
Then run this query:
select cvt.name, f.name, f.uniquename from chado.feature f
inner join chado.cvterm cvt on cvt.cvterm_id = f.type_id
where f.name = 'LOC111983025' or f.uniquename = 'LOC111983025';
One more thing to check, which option did you select for Name Match Type?
I have tried with both name and unique name.
Then run this query:
This was the result
Ok, so according to the results from the query, you need to select Name for Name Match Type and use gene for Sequence Type. Did you try that combo?
If that still doesn't work, please share your matrix file so we can further look into it.
Yes, I have used that combo.
My matrix files look like this (opened in Excel): With geneXXXX
With LOCXXX
From what I can tell, you need to specify Name for Name Match Type when using the LOCXXX file and Uniquename when using the geneXXX file. Both files should use gene as the Sequence Type.
I looked at the code and those are the only 2 conditions that must match. Please verify that this is what you did. If it still doesn't work, I'll need to be able to download both the Fasta file and the matrix files to look into it.
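The two conditions described above can be sketched in a few lines of Python. This is only an illustration of the matching rule, not the module's actual code; the `Feature` tuple, the `find_feature` helper, and the `LOC0001`/`gene0001` identifiers are hypothetical stand-ins for a row of chado.feature joined to chado.cvterm.

```python
from typing import NamedTuple, Optional, Sequence


class Feature(NamedTuple):
    type_name: str   # cvterm name, e.g. "gene" or "region"
    name: str        # chado.feature.name
    uniquename: str  # chado.feature.uniquename


# Hypothetical row: a gene whose name is a LOC id and whose uniquename is a gene id.
FEATURES = [Feature("gene", "LOC0001", "gene0001")]


def find_feature(
    features: Sequence[Feature],
    identifier: str,
    sequence_type: str,
    name_match_type: str,
) -> Optional[Feature]:
    """Return the feature whose type AND chosen name column both match."""
    if name_match_type not in ("name", "uniquename"):
        raise ValueError("name_match_type must be 'name' or 'uniquename'")
    for feature in features:
        if (
            feature.type_name == sequence_type
            and getattr(feature, name_match_type) == identifier
        ):
            return feature
    return None
```

Under these assumptions, a LOC identifier is only found with Name, a gene identifier is only found with Uniquename, and either lookup fails if the Sequence Type is region.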
Yes, I have tried both of those combinations.
The fasta file is here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/906/115/GCF_002906115.1_CorkOak1.0/GCF_002906115.1_CorkOak1.0_genomic.fna.gz
The gff here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/906/115/GCF_002906115.1_CorkOak1.0/GCF_002906115.1_CorkOak1.0_genomic.gff.gz
And the matrix file with geneXXX: countsERR490.txt
Thanks! I'll try to look into those later today and get back to you.
Thank you so much!
Hi @almasaeed2010 Did you get a chance to check the files?
Thank you
Hello @marcsilvaitqb
Sorry it's taking a little longer to debug this issue. The files are large and are taking a while to load into my site. I'll try to trim them down then try again.
Thanks
Oh ok, no problem!
I tried trimming the files into a single feature each to make it fast and it worked for me as you can see below:
It's likely that there is a mismatch in the options somewhere. I would try again making sure of all of the following:
- geneXXXX for the counts
- gene as the sequence type
- Uniquename for the name type
So far the only thing we have not verified is the organism, so could you run this query?
select o.genus, o.species, cvt.name, f.name, f.uniquename from chado.feature f
inner join chado.cvterm cvt on cvt.cvterm_id = f.type_id
inner join chado.organism o on o.organism_id = f.organism_id
where f.name = 'LOC111983025' or f.uniquename = 'LOC111983025';
I have run the query and the output is: Quercus suber gene LOC111983025 gene27663
Thanks for running the query! I hope it works this time when selecting all the parameters
So I just ran the whole thing to the end for the first time (I had always cancelled halfway through), and at the end it shows the following error:
SQLSTATE[23503]: Foreign key violation: 7 ERROR: insert or update on table "element" violates foreign key constraint "element_feature_id_fkey" DETAIL: Key (feature_id)=(0) is not present in table "feature". [site http://default] [TRIPAL ERROR] [TRIPAL_JOB] SQLSTATE[23503]: Foreign key violation: 7 ERROR: insert or update on table "element" violates foreign key constraint "element_feature_id_fkey"DETAIL: Key (feature_id)=(0) is not present in table "feature".
Is this relevant?
(I have also tried again that combination and again the same error of feature not found)
I've uploaded a new fix to address the error you've encountered. I also adjusted the code that checks if a feature is available. Could you please update the module and try again?
Thanks for your patience!
Sorry for the basic question but I'm not the one that installed the module, so I'm a bit lost. How do I update it?
No problem!
You can navigate to the module directory from Drupal's root installation directory: cd sites/all/modules/tripal_analysis_expression
Then run git pull && drush updatedb
Thank you so much!
Will update and re-try, and then update you on the outcome.
Hi, This is what shows up now:
Options: geneXXXX, organism: quercus suber, sequence type: gene, file type: matrix, name match type: unique name
Is gene4 the only feature showing this error?
No. I just created a mini version of the matrix file with only gene 3 and gene 4, and they both show the error.
I think I found the error this time! Could you please update the code and try again? It looks like it was checking name no matter what you chose for name match type.
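To make that kind of bug concrete, here is a minimal Python sketch of the pattern. It is not the module's code (the module is written in PHP); the function names and the `LOC0004` value are hypothetical, and the row only stands in for a feature whose uniquename is gene4.

```python
# Hypothetical feature row; only the uniquename "gene4" comes from this thread.
ROWS = [{"name": "LOC0004", "uniquename": "gene4"}]


def lookup_ignoring_choice(rows, identifier, name_match_type):
    """Bug pattern: the match type is accepted but the name column is always used."""
    return [row for row in rows if row["name"] == identifier]


def lookup_honoring_choice(rows, identifier, name_match_type):
    """Intended behavior: compare against the column the user selected."""
    if name_match_type not in ("name", "uniquename"):
        raise ValueError("name_match_type must be 'name' or 'uniquename'")
    return [row for row in rows if row[name_match_type] == identifier]
```

With the first function, looking up gene4 with the unique name option returns nothing, which would surface as a "feature not found" error even though the feature exists.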
I am currently running it to the end with the full matrix file. So far two things show up:
but they don't show up for all genes; it is skipping a few.
As soon as the loader finishes I'll update again.
Can you provide your system information? Just to make sure we are using the same APIs.
- Tripal version
- Drupal version
- PHP version
For Drupal and PHP you can visit domain.org/admin/reports/status. For Tripal, the version should be on the modules page.
Drupal 7.65, PHP 7.2.17-0ubuntu0.18.04.1, Tripal 7.x-3.1
I think that the Tripal Warnings might be due to the fact that some features can be recognized as pseudogenes. After this finishes, I will create the content pseudogenes and see if the same message appears.
EDIT: Adding the Tripal Content Type "pseudogene" did not fix the previous messages.
Since the warning is not showing up for all features, let's check these 2 particular genes to make sure they have the right info in the database:
select o.genus, o.species, cvt.name, f.name, f.uniquename from chado.feature f
inner join chado.cvterm cvt on cvt.cvterm_id = f.type_id
inner join chado.organism o on o.organism_id = f.organism_id
where f.name in ('gene837', 'gene846') or f.uniquename in ('gene837', 'gene846');
This is what shows up:
But I have added the tripal content type pseudogene:
Ok, this makes sense. Since you specified the Sequence Type to be gene, the importer will only look for features that have the type gene and not pseudogene. The loader expects a new matrix file for each type of feature. I didn't design this module so I am not entirely sure why this restriction is required, but I can look into it on Wednesday when I have a meeting with the other developers.
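If one matrix file per feature type is needed, the split can be scripted. The following is a rough Python sketch, assuming a tab-delimited matrix with one header row and the feature identifier in the first column; `split_matrix_by_type` and the `feature_types` mapping are hypothetical names, and the mapping itself would have to come from a query like the one above.

```python
import csv
import io
from collections import defaultdict


def split_matrix_by_type(matrix_text, feature_types, delimiter="\t"):
    """Split matrix rows by feature type, repeating the header in each part.

    feature_types maps a feature identifier to its type name; identifiers
    missing from the mapping are collected under "unknown".
    """
    rows = [r for r in csv.reader(io.StringIO(matrix_text), delimiter=delimiter) if r]
    header, body = rows[0], rows[1:]

    grouped = defaultdict(list)
    for row in body:
        grouped[feature_types.get(row[0], "unknown")].append(row)

    parts = {}
    for type_name, type_rows in grouped.items():
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(type_rows)
        parts[type_name] = buffer.getvalue()
    return parts
```

Each value in the returned dictionary could then be saved to its own file and loaded with the matching Sequence Type.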
So in theory, if I upload the exact same matrix file but then select pseudogene as the sequence type, the data will be loaded to those features and output an error for the gene data?
I was working on the assumption that the unique name was recognized regardless so I didn't bother dividing the features.
I'm only interested in the genes right now, so I have all I need for now. Thank you so much for your time and patience!
Your theory is correct.
And anytime!
Since this issue resulted in fixing a bug, I'll add you as a contributor for bug reporting. Thanks!
@all-contributors add @marcsilvaitqb for bug
@almasaeed2010
I could not determine your intention.
Basic usage: @all-contributors please add @jakebolam for code, doc and infra
For other usages see the documentation
Does it want please? 😅
@all-contributors please add @marcsilvaitqb for bug
@almasaeed2010
I've put up a pull request to add @marcsilvaitqb! :tada:
Hi! I have uploaded the FASTA files and the gff necessary and I have published both the genes and mRNA. They show up correctly as Tripal content
This is an example of one of the genes:
I have uploaded the BioSample xmls and am now trying to upload the expression data as a matrix file but it is giving me the following error:
ERROR: The feature, LOC111983044, found in the expression file was not found in the Chado database. Please ensure that the feature has been loaded into the database and that the feature name is both unique and correct.
I have tried with both options as name and unique name. I have also tried two different matrix files where the features are loaded as either LOCXXX or geneXXXX.
What am I missing?
Thank you in advance!