Closed Maxim-Karpov closed 1 year ago
Thank you, this is a great suggestion. I am trying to understand this better and see how we can do this in an unambiguous way.
What should happen if there are multiple replacements? Are the replacements semantically the same, or are they split from the old terms, in which case the semantic meaning changes?
Perhaps all of the available replacements could be substituted into the analysis. As far as I've seen, the replacements tend to be very similar to their obsolete categories. For example, the obsolete GO term "cleavage involved in rRNA processing GO:0000469" is replaced by "rRNA processing GO:0006364".
This seems to be more complicated than I thought, as there are also consider tags. Furthermore, some term replacements can be crude simplifications/abstractions of the originals, e.g. "obsolete chaperonin ATPase activity GO:0003763" is replaced by "ATP hydrolysis activity GO:0016887". Given that only 1 replacement term is ever available per obsolete id, it is arguably justifiable to simply replace them in the analysis.
Here's the code to extract all ids, replacements, and considerations for all obsolete entries from the obo file FYI (credit: @iquasere):
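awk 'BEGIN {print "id\treplaced_by\tconsider"}
# On each new stanza ([Term] or [Typedef]), print the previous one if it was obsolete, then reset.
/^\[/{if(is_obsolete) print id"\t"replaced_by"\t"consider; is_obsolete=id=replaced_by=consider=""; next}
/^id:/{id=$2}
/^is_obsolete: true/{is_obsolete=1}
# Multiple replaced_by or consider tags are joined with ";".
/^replaced_by:/{replaced_by=replaced_by ? replaced_by";"$2 : $2}
/^consider:/{consider=consider ? consider";"$2 : $2}
# Print the last stanza too, which the per-stanza rule alone would miss.
END {if(is_obsolete) print id"\t"replaced_by"\t"consider}' go-basic.obo > go-basic.tsv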
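For anyone who would rather do the same extraction in Python, here is a minimal sketch that does not depend on goatools. The function name `parse_obsolete` and the `SAMPLE` text are made up for illustration: the first and third stanzas mirror the two examples quoted in this thread, and the `GO:9999999` stanza is hypothetical, included only to exercise the consider tag.

```python
def parse_obsolete(lines):
    """Return {obsolete_id: {"replaced_by": [...], "consider": [...]}}."""
    result = {}
    term = None  # fields of the [Term] stanza being read, or None outside one

    def close(stanza):
        # Record the finished stanza only if it was marked obsolete.
        if stanza and stanza["is_obsolete"] and stanza["id"]:
            result[stanza["id"]] = {
                "replaced_by": stanza["replaced_by"],
                "consider": stanza["consider"],
            }

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # Any stanza header ([Term], [Typedef], ...) ends the previous stanza.
            close(term)
            if line == "[Term]":
                term = {"id": None, "is_obsolete": False,
                        "replaced_by": [], "consider": []}
            else:
                term = None
        elif term is not None and ": " in line:
            key, value = line.split(": ", 1)
            value = value.split(" ! ")[0].strip()  # drop trailing "! name" comments
            if key == "id":
                term["id"] = value
            elif key == "is_obsolete":
                term["is_obsolete"] = (value == "true")
            elif key in ("replaced_by", "consider"):
                term[key].append(value)
    close(term)  # the last stanza has no following header
    return result


# Hand-written excerpt in OBO style, not copied from go-basic.obo.
SAMPLE = """\
[Term]
id: GO:0000469
name: obsolete cleavage involved in rRNA processing
is_obsolete: true
replaced_by: GO:0006364

[Term]
id: GO:0006364
name: rRNA processing

[Term]
id: GO:0003763
name: obsolete chaperonin ATPase activity
is_obsolete: true
replaced_by: GO:0016887

[Term]
id: GO:9999999
name: obsolete hypothetical term (illustration only)
is_obsolete: true
consider: GO:0006364
consider: GO:0016887
"""

table = parse_obsolete(SAMPLE.splitlines())
for go_id, subs in table.items():
    print(go_id, ";".join(subs["replaced_by"]), ";".join(subs["consider"]), sep="\t")
```

To run it on the real file, pass an open file handle instead of the sample, e.g. `parse_obsolete(open("go-basic.obo"))`.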
@Maxim-Karpov
Thank you for the deep dive on this.
I'll attempt a fix this weekend, perhaps bringing in both replaced_by and consider.
@Maxim-Karpov
I just pushed a commit adding an --obsolete option to find_enrichment.py.
--obsolete {keep,replace,skip}
Strategy for handling obsolete GO terms (default: skip)
The replace strategy updates the obsolete GO term with terms suggested in replaced_by and consider. Please note that the default behavior stays the same, which is to skip the obsolete terms.
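To make the three strategies concrete, here is a rough sketch of how they could act on a gene-to-GO association table. This is not the goatools implementation: the function `apply_obsolete_strategy`, the gene names, and the exact meaning of keep (leave the obsolete ID in place) are assumptions made for illustration. Only the two obsolete/replacement pairs come from this thread.

```python
def apply_obsolete_strategy(assoc, obsolete, strategy="skip"):
    """Return a new association dict with obsolete GO ids handled.

    assoc:    {gene: set of GO ids}
    obsolete: {obsolete GO id: set of substitute ids (replaced_by plus consider)}
    strategy: "keep", "replace", or "skip"
    """
    if strategy not in {"keep", "replace", "skip"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    result = {}
    for gene, go_ids in assoc.items():
        handled = set()
        for go_id in go_ids:
            if go_id not in obsolete:
                handled.add(go_id)               # current term, always kept
            elif strategy == "keep":
                handled.add(go_id)               # assumed: leave the obsolete id as is
            elif strategy == "replace":
                handled.update(obsolete[go_id])  # swap in the suggested terms
            # "skip": drop the obsolete id
        result[gene] = handled
    return result


# Hypothetical genes; GO:0008150 (biological_process) stands in for a current term.
assoc = {
    "geneA": {"GO:0000469", "GO:0008150"},
    "geneB": {"GO:0003763"},
}
obsolete = {
    "GO:0000469": {"GO:0006364"},
    "GO:0003763": {"GO:0016887"},
}

for strategy in ("keep", "replace", "skip"):
    print(strategy, apply_obsolete_strategy(assoc, obsolete, strategy))
```

Note that under skip a gene annotated only with obsolete terms ends up with an empty set, which is why replace can change the counts that go into the enrichment test.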
Thank you again for the great idea - and please let me know if there's an issue.
Hello @tanghaibao, I ran into the same problem mentioned in this issue! I have downloaded the latest version of goatools, and find_enrichment.py -h did not show me the --obsolete option. Do you have any idea why?
@dieunelderilus
Did you try updating goatools? pip install -U goatools
Also, the latest version as of today:
python -c "import goatools; print(goatools.__version__)"
1.4.11
should have the --obsolete option:
find_enrichment.py -h | grep obsolete
--obsolete {keep,replace,skip}
Strategy for handling obsolete GO terms (default:
Hello, I've realised that there may be a bug in the enrichment tool when an entry in the obo file is marked obsolete. For example, the following entries produce warnings:
WARNING: GO:0000469 NOT FOUND IN DAG
WARNING: GO:0006379 NOT FOUND IN DAG
WARNING: GO:0010862 NOT FOUND IN DAG
WARNING: GO:0014065 NOT FOUND IN DAG
WARNING: GO:0014066 NOT FOUND IN DAG
WARNING: GO:0016307 NOT FOUND IN DAG
WARNING: GO:0030579 NOT FOUND IN DAG
WARNING: GO:0031532 NOT FOUND IN DAG
WARNING: GO:0035551 NOT FOUND IN DAG
WARNING: GO:0042779 NOT FOUND IN DAG
WARNING: GO:0043046 NOT FOUND IN DAG
WARNING: GO:0043629 NOT FOUND IN DAG
WARNING: GO:0043631 NOT FOUND IN DAG
WARNING: GO:0047690 NOT FOUND IN DAG
WARNING: GO:0048017 NOT FOUND IN DAG
WARNING: GO:0061088 NOT FOUND IN DAG
WARNING: GO:0070084 NOT FOUND IN DAG
WARNING: GO:0090502 NOT FOUND IN DAG
WARNING: GO:0098789 NOT FOUND IN DAG
WARNING: GO:0102176 NOT FOUND IN DAG
WARNING: GO:0102756 NOT FOUND IN DAG
WARNING: GO:1903204 NOT FOUND IN DAG
These have been replaced by a different GO term, but goatools treats them as absent. It would be nice if the program replaced these terms for the user (if a replacement is present), or counted them regardless of their obsolete status (with an option for this).
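To check which of the warned IDs actually have a replacement, the IDs first need to be pulled out of the log. A small sketch, assuming the warnings have exactly the "WARNING: GO:... NOT FOUND IN DAG" shape shown above; the function name `missing_ids` is made up for illustration.

```python
import re

# Matches one warning of the form shown above and captures the GO id.
WARNING_RE = re.compile(r"WARNING: (GO:\d{7}) NOT FOUND IN DAG")


def missing_ids(log_text):
    """Return the sorted, de-duplicated GO ids reported as not found."""
    return sorted(set(WARNING_RE.findall(log_text)))


# Works whether the warnings arrive on one line or one per line.
log = (
    "WARNING: GO:0000469 NOT FOUND IN DAG WARNING: GO:0006379 NOT FOUND IN DAG\n"
    "WARNING: GO:1903204 NOT FOUND IN DAG\n"
    "WARNING: GO:0000469 NOT FOUND IN DAG\n"
)
print(missing_ids(log))
```

The resulting list can then be looked up against a table of obsolete terms and their replaced_by and consider tags.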