bradfordcondon closed this issue 6 years ago
Looks like we want to delete the old one since it has less info. How do we go about doing that? Is there anything we need to check before deleting an analysis (like whether there are features associated with it)?
We can archive the old one if it's truly a different assembly version. Before doing so, we should ensure there are no features associated with it.
Similarly, red oak has 3 transcriptomes. On the organism page the labels are correct:
https://www.hardwoodgenomics.org/organism/Quercus/rubra?tripal_pane=group_transcriptome
however in the Tripal view:
Why aren't the titles [archived]?
If you edit the content, it's [archived].
I should update the instructions to say to include [archived] at the start of a name if it's an older one...
I think it's clear this analysis is the correct one: https://www.hardwoodgenomics.org/Genome-assembly/2209485?tripal_pane=group_downloads
This one: https://www.hardwoodgenomics.org/Genome-assembly/1963051?tripal_pane=group_downloads
has dead links, etc.
Are any features linked to it? If not, then we can just delete and/or archive.
select count(*) from chado.analysisfeature where analysis_id = 125;
220340
select count(*) from chado.analysisfeature where analysis_id = 150;
64992
select count(*) from chado.analysisfeature where analysis_id = 191;
0
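The three counts above could also come from one grouped query. Here's a minimal sketch of that, run against a toy in-memory SQLite stand-in rather than the real PostgreSQL database; the rows are made up, and only the table and column names come from the queries above. The LEFT JOIN from chado.analysis is there so an analysis with no features (like 191) still shows up with a count of 0.

```python
import sqlite3

# Toy stand-in for the Chado schema; the real database is PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS chado")
conn.execute("CREATE TABLE chado.analysis (analysis_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE chado.analysisfeature (analysis_id INTEGER, feature_id INTEGER)"
)
conn.executemany("INSERT INTO chado.analysis VALUES (?)", [(125,), (150,), (191,)])
# Made-up links: three features on 125, two on 150, none on 191.
conn.executemany(
    "INSERT INTO chado.analysisfeature VALUES (?, ?)",
    [(125, 1), (125, 2), (125, 3), (150, 4), (150, 5)],
)

COUNT_SQL = """
SELECT a.analysis_id, COUNT(af.feature_id) AS n_features
FROM chado.analysis a
LEFT JOIN chado.analysisfeature af ON af.analysis_id = a.analysis_id
WHERE a.analysis_id IN (125, 150, 191)
GROUP BY a.analysis_id
ORDER BY a.analysis_id
"""
counts = conn.execute(COUNT_SQL).fetchall()
print(counts)
```

The SQL in COUNT_SQL should be usable as-is in psql, but that is untested here.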
The analysis records are 125, 150, and 191. 125 and 150 look almost identical in the analysis table.
select * from chado_bio_data_21;
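 mapping_id | entity_id | record_id | nid
------------+-----------+-----------+-----
          1 |   1962952 |        17 |
          2 |   1962953 |        51 |
          3 |   1962958 |        50 |
          4 |   1963052 |       151 |
          5 |   1963053 |       125 |
          6 |   1963056 |       146 |
          7 |   1963051 |       150 |
          8 |   1919890 |       152 |
          9 |   1963058 |       154 |
         10 |   2161592 |       157 |
         11 |   2209433 |       161 |
         12 |   2209485 |       191 |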
Note all three are genome assemblies.
125 and 150 are identical, but 150 is formatted better. Why does 125 have any features at all?
ALL THREE are bioproject 291087. This means that the best and correct course of action is to transfer all features to analysis 191 and delete analyses 125 and 150.
Tables with foreign keys to analysis that we need to worry about:
quantification, project_analysis, phylotree, nd_experiment_analysis, analysisprop, analysisfeature, analysis_relationship, analysis_pub, analysis_dbxref, analysis_cvterm.
I think only analysisfeature is of relevance.
checking like so:
select count(*) from chado.quantification where analysis_id in(125, 150, 191);
Only analysisprop and analysisfeature have rows for these analyses.
select count(*) from chado.analysisprop where analysis_id in(125, 150, 191);
9
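Rather than running that count by hand for each table, a small loop can do it. Below is a rough sketch against a toy SQLite stand-in with made-up rows. The table names are the ones listed above; the column names are assumptions (in particular, that analysis_relationship uses subject_id/object_id instead of analysis_id) and should be checked against the installed Chado version. count_references is a hypothetical helper, not something from Tripal.

```python
import sqlite3

# Table -> column(s) that reference chado.analysis. Column names are
# assumptions to verify against the installed Chado schema.
FK_COLUMNS = {
    "quantification": ["analysis_id"],
    "project_analysis": ["analysis_id"],
    "phylotree": ["analysis_id"],
    "nd_experiment_analysis": ["analysis_id"],
    "analysisprop": ["analysis_id"],
    "analysisfeature": ["analysis_id"],
    "analysis_relationship": ["subject_id", "object_id"],
    "analysis_pub": ["analysis_id"],
    "analysis_dbxref": ["analysis_id"],
    "analysis_cvterm": ["analysis_id"],
}


def count_references(conn, analysis_ids):
    """Return {table: row count} for rows pointing at any of analysis_ids."""
    marks = ", ".join("?" for _ in analysis_ids)
    counts = {}
    for table, columns in FK_COLUMNS.items():
        where = " OR ".join(f"{col} IN ({marks})" for col in columns)
        params = list(analysis_ids) * len(columns)
        sql = f"SELECT COUNT(*) FROM chado.{table} WHERE {where}"
        counts[table] = conn.execute(sql, params).fetchone()[0]
    return counts


# Toy in-memory stand-in so the sketch runs without the real database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS chado")
for table, columns in FK_COLUMNS.items():
    cols = ", ".join(f"{col} INTEGER" for col in columns)
    conn.execute(f"CREATE TABLE chado.{table} ({cols})")
# Made-up rows: two props and three feature links for the analyses of interest,
# plus one prop on an unrelated analysis that must not be counted.
conn.executemany("INSERT INTO chado.analysisprop VALUES (?)", [(125,), (150,), (17,)])
conn.executemany(
    "INSERT INTO chado.analysisfeature VALUES (?)", [(125,), (125,), (150,)]
)

counts = count_references(conn, [125, 150, 191])
nonzero = {table: n for table, n in counts.items() if n}
print(nonzero)
```

Any table that shows up in nonzero is one that needs handling before the analysis rows can go.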
Here are the props:
hardwoods_06112018=> select * from chado.analysisprop ap INNER JOIN chado.cvterm cvt ON cvt.cvterm_id = ap.type_id where analysis_id in(125, 150, 191);
analysisprop_id | analysis_id | type_id | value | rank | cvterm_id | cv_id | name | definition | dbxref_id | is_obsolete | is_relationshiptype
4448 | 125 | 2005 | Juglans_regia_01182017 | 0 | 2005 | 6 | analysis_unigene_name | The name for a unigene. | 2482 | 0 | 0
4449 | 125 | 2006 | | 0 | 2006 | 6 | analysis_unigene_num_contigs | The number of contigs in the unigene assembly | 2483 | 0 | 0
4450 | 125 | 2009 | | 0 | 2009 | 6 | analysis_unigene_num_reads | The number of reads, after filtering, used as input for the assembly | 2486 | 0 | 0
4451 | 125 | 2010 | | 0 | 2010 | 6 | analysis_unigene_avg_length | The average contig length | 2487 | 0 | 0
4452 | 125 | 2008 | | 0 | 2008 | 6 | analysis_unigene_num_clusters | The number of clusters in the unigene assembly | 2485 | 0 | 0
4453 | 125 | 2007 | | 0 | 2007 | 6 | analysis_unigene_num_singlets | The number of singlets remaining in the unigene assembly | 2484 | 0 | 0
4489 | 125 | 2063 | tripal_analysis_unigene | 0 | 2063 | 16 | Analysis Type | The type of analysis was performed. | 2540 | 0 | 0
4525 | 125 | 29 | genome_assembly | 0 | 29 | 15 | analysis_type | The type of analysis was performed. This value is automatically set by each Tripal Analysis module and should be equal to the module name (e.g. tripal_analysis_blast, tripal_analysis_go). | 29 | 0 | 0
4530 | 150 | 29 | genome_assembly | 0 | 29 | 15 | analysis_type | The type of analysis was performed. This value is automatically set by each Tripal Analysis module and should be equal to the module name (e.g. tripal_analysis_blast, tripal_analysis_go). | 29 | 0 | 0
(9 rows)
Something is very wrong.
Analysis 125 has 16852 genes and 16852 mRNAs. Analysis 150 has 32496 mRNAs and 32496 polypeptides. 125 also has the supercontigs: 186636 of them.
select analysis_id, type_id, cvt.name, count(type_id) from chado.analysisfeature inner join chado.feature on feature.feature_id = analysisfeature.feature_id INNER JOIN chado.cvterm cvt ON cvt.cvterm_id = feature.type_id where analysis_id in (125, 150, 191) group by cvt.name, analysis_id, type_id;
 analysis_id | type_id | name        | count
-------------+---------+-------------+--------
         125 |     215 | gene        |  16852
         125 |     145 | mRNA        |  16852
         150 |     145 | mRNA        |  32496
         150 |     236 | polypeptide |  32496
         125 |     290 | supercontig | 186636
Matt has re-annotated the IPS. They are in /var/www/html/sites/default/files/IPS_aug_17_2018
What about blast? I'm guessing we didn't do that.
Yeah we are gonna need blast too. Since ACF is gonna be down, is it possible to use Staton server? Or is that also going through prepping the new drives?
Let's start loading the ips files at least. Did we already delete the old analyses?
no... we can delete the two older analyses, keep the one with no records, and reload pointing to that one.
So: delete the two analysis entities below and their Chado records:
analysis_id - 125 ---> https://hardwoodgenomics.org/bio_data/1963053
It's really easy: delete the two analyses I said to delete in the issue, delete all features, then reload, associating with the third, not-deleted analysis.
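For the Chado side of that plan, here is a hedged sketch of the delete order, run against a toy SQLite stand-in with made-up rows. It is not the procedure that was actually run on the site: real Chado features carry more dependent rows than this, and the Tripal entities published for them have to be handled separately. The point is only that features are removed while their analysisfeature links still exist to find them, and that everything happens in one transaction.

```python
import sqlite3

OLD_IDS = (125, 150)  # analyses to remove; 191 is kept

# Toy stand-in for the Chado schema; the real database is PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS chado")
conn.execute("CREATE TABLE chado.analysis (analysis_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE chado.feature (feature_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE chado.analysisfeature (analysis_id INTEGER, feature_id INTEGER)"
)
conn.executemany("INSERT INTO chado.analysis VALUES (?)", [(125,), (150,), (191,)])
conn.executemany("INSERT INTO chado.feature VALUES (?)", [(1,), (2,), (3,), (4,)])
# Made-up links: features 1-3 belong to the old analyses, feature 4 to neither.
conn.executemany(
    "INSERT INTO chado.analysisfeature VALUES (?, ?)",
    [(125, 1), (125, 2), (150, 3)],
)
conn.commit()

with conn:  # one transaction: commits on success, rolls back on error
    # 1. Features first, while the links that identify them still exist.
    conn.execute(
        "DELETE FROM chado.feature WHERE feature_id IN ("
        "SELECT feature_id FROM chado.analysisfeature WHERE analysis_id IN (?, ?))",
        OLD_IDS,
    )
    # 2. Then the links themselves.
    conn.execute("DELETE FROM chado.analysisfeature WHERE analysis_id IN (?, ?)", OLD_IDS)
    # 3. Finally the two old analysis records.
    conn.execute("DELETE FROM chado.analysis WHERE analysis_id IN (?, ?)", OLD_IDS)

remaining_analyses = [r[0] for r in conn.execute("SELECT analysis_id FROM chado.analysis")]
remaining_features = [
    r[0] for r in conn.execute("SELECT feature_id FROM chado.feature ORDER BY feature_id")
]
remaining_links = conn.execute("SELECT COUNT(*) FROM chado.analysisfeature").fetchone()[0]
print(remaining_analyses, remaining_features, remaining_links)
```

After this, only analysis 191 is left, ready for the reload to point at.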
Steps:
Oh, I need to somehow find and delete the old mRNA entities, since Tripal doesn't provide an unpublish method yet. We need to figure out a way to identify them now, since the actual features are deleted from Chado, so there is no association to an organism. I think a simple LIKE query can return those entities, but since we also want to trigger the delete hooks so they are removed from whatever index uses them, I am gonna write a little script to do it.
Ok, I am adding a new, very simple but super useful feature to Tripal Alchemist to clear orphaned entities, i.e. entities that have no associated Chado records. This feature, if good enough, can then be ported over to the main Tripal repo.
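For reference, finding orphans comes down to an anti-join between a bundle's mapping table and its Chado base table. Here's a minimal sketch against a toy SQLite stand-in; it is not the Tripal Alchemist implementation. The mapping table name chado_bio_data_99 is hypothetical (the bundle number for mRNA will differ), its columns follow the chado_bio_data_21 output earlier in the thread, and all rows are made up.

```python
import sqlite3

# Toy stand-in; the real site uses PostgreSQL with Chado in its own schema.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS chado")
conn.execute("CREATE TABLE chado.feature (feature_id INTEGER PRIMARY KEY)")
# Hypothetical mapping table for the mRNA bundle.
conn.execute(
    "CREATE TABLE chado_bio_data_99 ("
    "mapping_id INTEGER PRIMARY KEY, entity_id INTEGER, record_id INTEGER)"
)
# Made-up data: features 10 and 11 still exist, 12 and 13 were deleted.
conn.executemany("INSERT INTO chado.feature VALUES (?)", [(10,), (11,)])
conn.executemany(
    "INSERT INTO chado_bio_data_99 (entity_id, record_id) VALUES (?, ?)",
    [(5001, 10), (5002, 11), (5003, 12), (5004, 13)],
)

ORPHAN_SQL = """
SELECT m.entity_id
FROM chado_bio_data_99 m
LEFT JOIN chado.feature f ON f.feature_id = m.record_id
WHERE f.feature_id IS NULL
ORDER BY m.entity_id
"""
orphan_entity_ids = [row[0] for row in conn.execute(ORPHAN_SQL)]
print(orphan_entity_ids)
```

The resulting entity IDs would then be deleted through Drupal's entity API rather than with raw SQL, so that the delete hooks still fire.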
Publishing mRNAs job: https://hardwoodgenomics.org/admin/tripal/tripal_jobs/view/443269
This issue on tripal_manage_analyses will fix the fields pointing to both deleted analyses: https://github.com/statonlab/tripal_manage_analyses/issues/42
So, checking the files in /var/www/html/sites/default/files/IPS_aug_17_2018, it looks like those are files for the entire site. We only want those for J. regia. Have those been added to the server?
Ok IPR XMLs are now at: sites/default/files/sequences/englishWalnut01182017/IPR
The XML files have the names abbreviated from Juglans_regia_01182017_WALNUT_00003300-RA_mRNA to WALNUT_00003300-RA. I'll run sed on them to add the missing part. This will keep JBrowse happy too.
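The same rename can be sketched in Python in case the sed one-liner gets fiddly. This is an illustration only, not the command that was run: the prefix and suffix come from the example name above, while the pattern (IDs of the form WALNUT_ plus digits plus -R and one capital letter) and the sample line are assumptions. expand_names is a hypothetical helper.

```python
import re

PREFIX = "Juglans_regia_01182017_"
SUFFIX = "_mRNA"
# \b on both sides keeps already-expanded names from matching again, because
# "_" counts as a word character on either side of the short name.
SHORT_NAME = re.compile(r"\bWALNUT_\d+-R[A-Z]\b")


def expand_names(text):
    """Expand abbreviated IDs in text to the full feature names."""
    return SHORT_NAME.sub(lambda m: f"{PREFIX}{m.group(0)}{SUFFIX}", text)


sample = 'id="WALNUT_00003300-RA"'  # made-up sample line, not real file content
expanded = expand_names(sample)
print(expanded)
# Running it a second time leaves the text unchanged.
expanded_twice = expand_names(expanded)
```

Being safe to re-run matters here: if the job dies halfway through the directory, it can simply be started again over all the files.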
IPR is done and correctly linked:
So now we are only missing the BLAST for this. Let's create a separate issue for that, though, with a more relevant title.
The main issue of having 3 analyses is completed!
Bug description
https://www.hardwoodgenomics.org/organism/Juglans/regia?tripal_pane=group_reference_genome
Not a bug necessarily, but certainly a casualty of #345