pombase / pombase-chado

PomBase code for accessing Chado
MIT License

exists_during annotation (convert this output to a log file) #905

Closed ValWood closed 2 years ago

ValWood commented 4 years ago

I will probably look at the non-phase ones later but will need a reduced list

exists_during.txt

ValWood commented 4 years ago

@kimrutherford

please could you redo me the query to get the list above (all exists_during extensions)

but this time, exclude any which are in this branch of GO? GO:0022403 cell cycle phase

kimrutherford commented 4 years ago

Did I create that file (exists_during.txt) as part of an issue? Do you have the issue number?

ValWood commented 4 years ago

I think I made that file from the GO GAF...

kimrutherford commented 4 years ago

Ah, OK. I was going to re-use my previous query if there was one.

No problem, I'll have a go at a Chado query for it.

kimrutherford commented 4 years ago

Does this have what you need?: exists_during_annotations.txt

Note to self, here's the query:

select f.uniquename, t.name
  from feature_cvterm fc join feature f on f.feature_id = fc.feature_id
  join cvterm t on t.cvterm_id = fc.cvterm_id
 where t.name like '%[exists_during]%' order by t.name;

ValWood commented 4 years ago

Perfect. It does not omit the ones in the cell cycle phase branch, but because I can see the terms I can easily remove those manually in a couple of minutes.

ValWood commented 4 years ago
kimrutherford commented 4 years ago

Would it help to know which curation session each annotation comes from?

ValWood commented 4 years ago

Yes, but I was going to ask if these could be migrated automatically. Is that possible/easy? It will take a long time to change these. Most will need to be looked at, or it will be as quick to do manually.

ValWood commented 4 years ago

If so I will open a chado ticket to deal with this.

kimrutherford commented 4 years ago

Yes but I was going to ask if these could be migrated automatically.

Migrated to a different extension? We can do that automatically

If so I will open a chado ticket to deal with this.

It's a Canto issue really.

ValWood commented 4 years ago

automatic migration moved to https://github.com/pombase/canto/issues/2141

ValWood commented 4 years ago

If you could rerun this query providing the session links.

ValWood commented 4 years ago

Is it possible to rerun this query providing the session links?

ValWood commented 4 years ago

and excluding any descendants of "GO:0022403 cell cycle phase". The ones I need to fix are a subset of this reduced list.
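
A rough sketch of what "providing the session links" could look like, run here against a toy in-memory SQLite copy of a few Chado tables rather than the real database. It assumes the curation session key is stored as a feature_cvtermprop row whose type term is named 'canto_session'; that name, the gene names, the term names and the session key 'abc123' are all illustrative assumptions, not values taken from PomBase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table feature (feature_id integer primary key, uniquename text);
create table cvterm (cvterm_id integer primary key, name text);
create table feature_cvterm (feature_cvterm_id integer primary key,
                             feature_id integer, cvterm_id integer);
create table feature_cvtermprop (feature_cvtermprop_id integer primary key,
                                 feature_cvterm_id integer, type_id integer,
                                 value text);

-- Hypothetical toy data; 'canto_session' as the prop type is an assumption.
insert into feature values (1, 'geneA'), (2, 'geneB'), (3, 'geneC');
insert into cvterm values
  (1, 'canto_session'),
  (20, 'term A [exists_during] mitotic M phase'),
  (21, 'term B [exists_during] cellular response to heat'),
  (22, 'term C');
insert into feature_cvterm values (1, 1, 20), (2, 2, 21), (3, 3, 22);
insert into feature_cvtermprop values (1, 1, 1, 'abc123');
""")

# Same filter as the original query, plus a left join so that annotations
# without a session prop are still listed (with None as the session).
rows = conn.execute("""
    select f.uniquename, t.name, sp.value
      from feature_cvterm fc
      join feature f on f.feature_id = fc.feature_id
      join cvterm t on t.cvterm_id = fc.cvterm_id
      left join feature_cvtermprop sp
             on sp.feature_cvterm_id = fc.feature_cvterm_id
            and sp.type_id = (select cvterm_id from cvterm
                               where name = 'canto_session')
     where t.name like '%[exists_during]%'
     order by t.name
""").fetchall()

for row in rows:
    print(row)
```

The left join is a deliberate choice: an inner join would silently drop any annotation that did not come from a curation session.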

ValWood commented 4 years ago

I can fix these now

@kimrutherford don't do this yet. I will fix these few first.

ValWood commented 3 years ago

I think I have done most of what needed doing here. @kimrutherford could you run the script above again, excluding any descendants of "GO:0022403 cell cycle phase", so I can see what is left?

No hurry.

kimrutherford commented 3 years ago

so I can see what is left.

There are quite a few: exists_during_terms_2.txt


Note to self:

select distinct f.uniquename, t.name
  from feature_cvterm fc join feature f on f.feature_id = fc.feature_id
  join cvterm t on t.cvterm_id = fc.cvterm_id
  join cvterm_relationship ext_rel on ext_rel.subject_id = t.cvterm_id
  join cvterm ext_rel_type on ext_rel.type_id = ext_rel_type.cvterm_id
  join cvterm obj on ext_rel.object_id = obj.cvterm_id
 where ext_rel_type.name = 'exists_during'
   and obj.cvterm_id not in
       (select subject_id from cvtermpath p
            join cvterm o on p.object_id = o.cvterm_id
            join cvterm pt on p.type_id = pt.cvterm_id
            where pt.name = 'is_a' and o.name = 'cell cycle phase')
order by t.name;
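
To make the exclusion step easier to follow, here is a minimal sketch that runs the same query shape against a toy in-memory SQLite database. Only the table and column names come from the query above; the genes, terms and closure row are made-up illustrative data, and the real cvtermpath table has more columns than shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table feature (feature_id integer primary key, uniquename text);
create table cvterm (cvterm_id integer primary key, name text);
create table feature_cvterm (feature_cvterm_id integer primary key,
                             feature_id integer, cvterm_id integer);
create table cvterm_relationship (subject_id integer, type_id integer,
                                  object_id integer);
create table cvtermpath (subject_id integer, object_id integer, type_id integer);

-- Hypothetical toy data.
insert into feature values (1, 'geneA'), (2, 'geneB');
insert into cvterm values
  (1, 'is_a'), (2, 'exists_during'),
  (10, 'cell cycle phase'), (11, 'mitotic M phase'),
  (12, 'cellular response to heat'),
  (20, 'term A [exists_during] mitotic M phase'),
  (21, 'term B [exists_during] cellular response to heat');
insert into feature_cvterm values (1, 1, 20), (2, 2, 21);
-- Each extension term points at its phase/process via exists_during.
insert into cvterm_relationship values (20, 2, 11), (21, 2, 12);
-- Closure row: 'mitotic M phase' is_a 'cell cycle phase'.
insert into cvtermpath values (11, 10, 1);
""")

def annotations_outside_branch(conn, root_name):
    """List exists_during annotations whose target is not below root_name."""
    return conn.execute("""
        select distinct f.uniquename, t.name
          from feature_cvterm fc
          join feature f on f.feature_id = fc.feature_id
          join cvterm t on t.cvterm_id = fc.cvterm_id
          join cvterm_relationship ext_rel on ext_rel.subject_id = t.cvterm_id
          join cvterm ext_rel_type on ext_rel.type_id = ext_rel_type.cvterm_id
          join cvterm obj on ext_rel.object_id = obj.cvterm_id
         where ext_rel_type.name = 'exists_during'
           and obj.cvterm_id not in
               (select p.subject_id from cvtermpath p
                  join cvterm o on p.object_id = o.cvterm_id
                  join cvterm pt on p.type_id = pt.cvterm_id
                 where pt.name = 'is_a' and o.name = ?)
         order by t.name
    """, (root_name,)).fetchall()

remaining = annotations_outside_branch(conn, "cell cycle phase")
print(remaining)
```

In this toy data only the geneB annotation is left, because its target is not a descendant of 'cell cycle phase'. An extension pointing directly at the root term would only be excluded if the closure table also held a row linking the root to itself, which this sketch does not assume.
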

ValWood commented 3 years ago

I think the original script also excluded "response to" terms. So, eyeballing it, it looks like most are now "phases" or "response to".

ValWood commented 3 years ago

To fix: change these to phases or responses. This will be more consistent. For example the first one has

Having all as phases would be better. I might leave a few (for instance I allowed phase transitions, since although they are not in the phase branch they describe time points when used as extensions)

kimrutherford commented 3 years ago

The results from the SQL I included in the comment (https://github.com/pombase/curation/issues/2535#issuecomment-558481167) include "response to" terms. Would you like me to exclude them?

ValWood commented 3 years ago

No it's OK, I have my list now ;) Assigned back to me.

kimrutherford commented 2 years ago

I've added that. Look out for a log file ending in .exists_during_extensions_not_in_cell_cycle_phase

ValWood commented 2 years ago

both queries should not include:

GO:0044848 biological phase (which includes cell cycle phase and "single-celled organism vegetative growth phase") AND "GO:0050896 response to stimulus"

kimrutherford commented 2 years ago

How's this?:

exists_during.txt


Note to self:

select distinct f.uniquename, t.name
  from feature_cvterm fc join feature f on f.feature_id = fc.feature_id
  join cvterm t on t.cvterm_id = fc.cvterm_id
  join cvterm_relationship ext_rel on ext_rel.subject_id = t.cvterm_id
  join cvterm ext_rel_type on ext_rel.type_id = ext_rel_type.cvterm_id
  join cvterm obj on ext_rel.object_id = obj.cvterm_id
 where ext_rel_type.name = 'exists_during'
   and obj.cvterm_id not in
       (select subject_id from cvtermpath p
            join cvterm o on p.object_id = o.cvterm_id
            join cvterm pt on p.type_id = pt.cvterm_id
            where pt.name = 'is_a' and (o.name = 'biological phase' or o.name = 'response to stimulus'))
order by t.name;
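
Since the request named the branches by GO ID, one possible variation is to pick the excluded roots by accession instead of by term name. The sketch below tries that on a toy in-memory SQLite database. It assumes the usual Chado layout in which cvterm.dbxref_id points at a dbxref row holding the numeric part of the ID under a db named 'GO'; the rows themselves are illustrative and the helper name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table db (db_id integer primary key, name text);
create table dbxref (dbxref_id integer primary key, db_id integer,
                     accession text);
create table cvterm (cvterm_id integer primary key, name text,
                     dbxref_id integer);
create table cvtermpath (subject_id integer, object_id integer, type_id integer);

-- Hypothetical toy data.
insert into db values (1, 'GO'), (2, 'internal');
insert into dbxref values
  (1, 1, '0044848'), (2, 1, '0050896'), (3, 1, '0022403'),
  (4, 1, '0034605'), (5, 1, '0006412'), (6, 2, 'is_a');
insert into cvterm values
  (1, 'biological phase', 1), (2, 'response to stimulus', 2),
  (3, 'cell cycle phase', 3), (4, 'cellular response to heat', 4),
  (5, 'translation', 5), (6, 'is_a', 6);
-- Closure rows: phase below 'biological phase', response below
-- 'response to stimulus'.
insert into cvtermpath values (3, 1, 6), (4, 2, 6);
""")

def excluded_term_names(conn, go_ids):
    """Names of terms that are is_a descendants of any of the given GO IDs."""
    accessions = [go_id.split(":", 1)[1] for go_id in go_ids]
    placeholders = ",".join("?" for _ in accessions)
    rows = conn.execute(f"""
        select distinct s.name
          from cvtermpath p
          join cvterm s on p.subject_id = s.cvterm_id
          join cvterm o on p.object_id = o.cvterm_id
          join dbxref x on o.dbxref_id = x.dbxref_id
          join db on x.db_id = db.db_id
          join cvterm pt on p.type_id = pt.cvterm_id
         where pt.name = 'is_a' and db.name = 'GO'
           and x.accession in ({placeholders})
         order by s.name
    """, accessions).fetchall()
    return [row[0] for row in rows]

excluded = excluded_term_names(conn, ["GO:0044848", "GO:0050896"])
print(excluded)
```

Matching on the accession keeps the filter working if a term is renamed, and adding a further branch later only means appending another ID to the list.
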

ValWood commented 2 years ago

This looks correct. You can close this. There will be more filters in the future but I will open a new ticket.

kimrutherford commented 2 years ago

I'll close it after I've arranged for it to go into a log file.

ValWood commented 2 years ago

I see the log(s).

Currently the 'phases' are filtered, but not the "response to" terms

kimrutherford commented 2 years ago

This should be fixed now.

ValWood commented 2 years ago

These logs look good. I may add additional filters (new ticket). For now I need to fix quite a few to make them either 'part_of' or 'phase' terms, so I will do this first.